Newsletter Updates for June 2014

Been quite an interesting month: NYC, SJ/SF, bookended by Hadoop Summit and Spark Summit, with Foo Camp in the midst… much learned, and many excellent introductions.

If you haven’t seen it, this is a gem: Seeing Spaces by Bret Victor, as an evolution of the “Maker Spaces” concept. Another top recommend is A Short History of and Introduction to Deep Learning by John Kaufhold. Money quote: “Learn, don’t engineer feature representations.” Check this review by Mary Galvin at Data Community DC.

For another great source of inspired writings, follow the Matthew Hunt posts on LinkedIn. In this episode of delightfully unexpected connections, Matthew leads us on a path among Pink Floyd, moon cheese, gnome-like cretins, and unlikely heroes for a tale of two Burkes.

   Just Enough Math

The video for Just Enough Math has been on sale for the past month. O’Reilly has a preview video on YouTube, if you’d like to check out a sample. Meanwhile…

I need your help: this Just Enough Math project would greatly benefit from your reviews. Even if you don’t purchase the full video, check the preview and the free sections. We’re eager to hear your feedback, and especially your reviews!

Here’s the thing: on the one hand, if you’re the kind of person who enjoys reading math papers as a fond pastime, this material is probably not for you. There are plenty of other videos in the world, and so many brain teasers, so little time. On the other hand, if you find that math papers tend to be almost entirely devoid of context (which, frankly, many are) and you took math through Algebra 2, and you enjoy seeing some examples, learning some history, etc., then you’ll probably benefit from this video.

There are quite a number of great resources at O’Reilly and other publishers for those who want a deep-dive in any particular area of advanced math applied for Big Data … and the point of the Just Enough Math project is to serve almost like a “hyperlink document” (e.g., old school web pages circa early 1990s) for those other books, videos, websites, etc., along with providing history and case study examples as context.

We’ll be presenting a tutorial based on Just Enough Math at OSCON. Plus, there’s a super-secret discount code for 20% off registration: PACOID

In the Bay Area, we’ve recently launched a Just Enough Math: Machine Learning for Execs and Entrepreneurs meetup. Looking forward to more events through that. Submitted as evidence, check out “How Not to Be Wrong”: What the literary world can learn from math by Laura Miller in Salon.


Some interesting insights about Apache Mesos surfaced in the recent 2014 community survey. And at this point, the list of firms adopting Mesos no longer fits in my browser window. To find out more, check out MesosCon, scheduled for August in Chicago. I’m looking forward to talks from John Wilkes and several other experts, and meanwhile will present on Apache Spark running on Mesos. In related news, I recently gave a talk at the Mesos NYC Meetup sponsored by the kind folks at Shutterstock. If you’re in the area, check out an intro Mesos talk on 7/17 by Joe Stein at Bloomberg.

✽ ✽ ✽

On a recent camping adventure in Sebastopol, I was grateful to learn about lots of new technologies. One of the more interesting finds was Unbounded Robotics, and I enjoyed a chat with Melonee Wise, CEO. These actually are the droids you’re looking for. Meanwhile, O’Reilly Media is looking for editors, especially in the Data practice area. Got Edit? Join the team!

morning walk in Ceres Community Garden, w/ O'Reilly Media in bkgd

In terms of other interesting technologies… I’ve been hearing memes rumbling about “Big Data is a myth” or “Where are the IoT apps?” Here, that’s where. The part of Nokia that didn’t sell off to Microsoft is handling some of the most interesting fusion of data exhaust that I’ve seen. Case in point, check out Jams, game theory, and equations: the science of traffic for a view of really big data analyzed in real-time. If you’ve attempted to drive anywhere in, say, DC or Austin or Silicon Valley anytime recently during commute times … this is a problem. Money quote: “Then we start to look at the car’s sensors. We start to know the weather before the weather authorities do, because we can see which cars have their windscreen wipers or their headlights switched on.” Orders of magnitude larger than your favorite social network or ad exchange.

Meanwhile, my favorite IoT app so far is clearly this: sharks tweet as they approach the shore of Western Australia. Would be great to see more technology applications like that!

   Minecraft camp

Speaking of Foo and other camps, I’ve got two kiddos currently in iD Tech Camps, learning Minecraft and Scratch, respectively. These courses tour around the US and are highly recommended. We could learn much from their teaching approach, to benefit professional workshops for adults as well.

To follow up on the Minecraft + Quantum theme from previous posts, here’s a good video of Seth Lloyd explaining Quantum Machine Learning. Why does this seem to call back to the Real Genius movie?

   Ag + Data

O’Reilly Strata recently carried a story about how Farm data could be worth billions, related to the Ag + Data post on O’Reilly Radar. Much is happening in Ag data and other consumers of remote sensing products – particularly with respect to recent changes in satellite regulations. However, my favorite recent Ag story is about the Purdue Improved Cowpea Storage (PICS) bag. Brilliant work.

Overall, much of the interesting Ag+Data tech seems to be coming from (or through) Chile… and a new phrase has emerged: Chilecon Valley.

   Friends in the News

Congratulations go out to Robby Garner, competing with the JFRED Chat Server in Turing2014: 60th anniversary year of Alan Turing’s untimely death. Many years ago, Robby and I worked on a primordial version of JFRED, which played “customer service agent” for the FringeWare online bookstore. Circa 1998 we ran the bots on BBC “Tomorrow’s World” for a live televised Turing Test, which is some of the most fun I've ever had in network engineering. More recently, Hubot-based chatbots are being deployed for devops and other engineering teams, such as the Shep chatbot used by engineering at O’Reilly.

Also, check out the new Big Data Analytics Beyond Hadoop: Real-Time Applications with Storm, Spark, and More Hadoop Alternatives by Vijay Srinivas Agneeswaran. This is a deep-dive into design patterns and frameworks for large-scale analytics beyond Hadoop.

Got to meet lots of people interested in using Spark at the recent Hadoop Summit in San Jose. One of the Community Choice Awards at the conference went to “Demo: Building a Unified Data Pipeline in Apache Spark” by Aaron Davidson from Databricks. Eager to see the slides published for that. Also at Hadoop Summit, Xiangrui Meng gave an excellent talk about MLlib: the tech roadmap and integrations, with special emphasis on how to leverage sparsity in your data.
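
For anyone wondering what "leveraging sparsity" means in practice, here is a minimal sketch in plain Python. It is not MLlib code and not taken from the talk; the helper names `to_sparse` and `sparse_dot` and the sample vectors are made up for illustration. The idea is simply to store only the non-zero entries, so that storage and arithmetic scale with the number of non-zeros rather than the full vector length.

```python
from typing import Dict, List

# Hypothetical representation for illustration: index -> non-zero value.
SparseVector = Dict[int, float]


def to_sparse(dense: List[float]) -> SparseVector:
    """Keep only the non-zero entries of a dense vector."""
    return {i: v for i, v in enumerate(dense) if v != 0.0}


def sparse_dot(a: SparseVector, b: SparseVector) -> float:
    """Dot product that only visits entries of the smaller vector."""
    if len(a) > len(b):
        a, b = b, a
    return sum(v * b[i] for i, v in a.items() if i in b)


# Made-up sample data: mostly zeros, as in many feature vectors.
x = to_sparse([0.0, 3.0, 0.0, 0.0, 2.0, 0.0])
y = to_sparse([1.0, 4.0, 0.0, 0.0, 0.0, 5.0])

print(x)                 # {1: 3.0, 4: 2.0}
print(sparse_dot(x, y))  # 12.0
```

With six entries the savings are trivial, but the same approach applies when vectors have millions of dimensions and only a handful of non-zeros.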

Meanwhile, friends at Zementis have recently released PMML support for Python, with a project called Py2PMML. In particular, there’s integration for scikit-learn. I wonder how long before PySpark + MLlib joins that list?

   Joaquim on the Moon

As many of you know, given enough beers I become fond of talking about dropping large complex arrays of sophisticated equipment into the polar dark craters on the Moon. In a recent convo over drinks with people who calculate the costs of such an operation for a living, we surfaced an interesting price tag for that kind of venture: approximately $15B. In terms of how much the US spends on the Department of Defense, that’s about 8 days’ worth. Think about it. Who wants a term sheet? Meanwhile, the subject got me thinking of Kubrick films, particularly the 2001: A Space Odyssey set production, an engineering feat in itself.

That's the update for now. See you in PDX and ATX, with Chicago and San Diego on the event horizon!


Newsletter Updates for May 2014

Been quite an interesting past month or so: DC, Austin, SF, Ann Arbor, Atlanta, Seattle… with hopefully much learned from those travels, plus many excellent events and introductions.

Meanwhile, I learned much from this gem, Therbligs for data science: A nuts and bolts framework for accelerating data work, by Abe Gong. Looking forward to seeing more about Therbligs from Abe. Definitely tune in to Welcome to Intelligence Matters, a new series by O’Reilly exploring current issues in AI, with Beau Cronin as lead correspondent. Another recommended gem is Genomics Crash Course for Data Engineers by Allen Day – that's at the intersection of Genomics and Big Data, for which I have seen an uptick recently.

Just Enough Math

Allen and I have been working to complete our new O’Reilly book, Just Enough Math. The video is in post-production now, and the book is halfway through second drafts – we are closing in! Some of that material will be previewed in the upcoming workshop Machine Learning for Managers.
O’Reilly will host a free one-hour webcast, Computational Thinking, Just Enough Math on Wed, Jun 4, 10:00am–11:00am (Pacific). Please join me there. The webcast will help publicize a tutorial based on Just Enough Math at OSCON in Portland on Sun, 20 Jul, 9:00am-noon. As a special offer, use the code PACOID to get a 20% discount on OSCON registration. Our tutorial will preview a very new thing at O’Reilly: converting book+video content into interactive tutorials using Docker + IPython Notebook + Vagrant + Git for a cloud-based next-generation content platform.

Speaking of Docker, one of the more interesting start-ups that I have run across recently is Resin, using Docker and Git to containerize+push apps on IoT devices running embedded Linux. Brilliant work.

UCB Initiation Ritual: cousins circa 1968, near Atascadero

In other news, I am thrilled to announce a partnership with Databricks, where I’ve been working to help develop an instructional program that introduces Apache Spark. As you can see in the photo above, the ceremonial ritual for teaming up with UC Berkeley is a bit arduous, but well worth it. Yes, you heard correctly … a Stanford alum saying “Go Bears!”

Our first course in the series is Databricks Hands-on Intro to Apache Spark, an introduction for developers working in Python, Java, and Scala. We have several of these workshops scheduled.
Spark is approaching the 1.0 release at Apache, with new support for SQL. Overall, one of the best presentations that I’ve seen recently about it was Spark at Twitter by Sriram Krishnan, Engineering Manager for Data Platform at Twitter.

The agenda was posted recently for Spark Summit 2014, in SF on 30 Jun - 1 Jul. As another special offer, use the code Paco2014 to get a 15% discount on Spark Summit registration. Highly recommended, and I hope to see you there.

Mesos Updates

Speaking of BDAS and the Berkeley Stack… there have been lots of developments in the Apache Mesos world. One of the best talks ever about Mesos was Improving Resource Efficiency with Apache Mesos by Christina Delimitrou, a case study about Quasar usage at Twitter. Also check out Mesos Elastically Scalable Operations, Simplified by Niklas Nielsen and Adam Bordelon, presented recently at ApacheCon 2014.

The other big news is that #MesosCon, the first Mesos conference, will be held in Chicago on Aug 21. Definitely see you there! Companies interested in sponsoring the conference – please inquire.
I’ve created a new workshop called Cluster Compute App Integrations about building end-to-end apps for Big Data. The workshop leverages Mesos based on the https://elastic.mesosphere.io/ service in the cloud, along with Spark, KNIME, etc. Hint: this involves teams competing, and it is turning out to be quite a popular course. We have upcoming dates lined up.

Agriculture + Data

Did you know that agriculture provides a livelihood for 40% of the world’s population? Or that agriculture consumes 70% of the world’s freshwater in aggregate? That figure is expected to reach 89% by 2050. Or have you heard that Havana grows 75% of its own food based on urban agriculture?
Last month I wrote an O’Reilly Strata article, Ag+Data, about those topics and more. The article introduces a whitepaper, Agriculture + Data: Outlook 2Q14, that we recently published at The Data Guild to explore these issues in greater depth. Many thanks to Bill Worzel, Brad Martin, and others who helped on that!

Evolutionary Algorithms

Recently I gave a keynote talk at the Genetic Programming in Theory and Practice conference, which is hosted each year at U Michigan by The Center for the Study of Complex Systems. They are the experts in GP; I was merely there to add a few perspectives about machine learning and Big Data. What a wonderful conference. Got to speak at length with Lee Spector at UMass Amherst and Hampshire College. Lee and his grad students have been working with a Clojure-based language called Push, in which evolutionary programs are expressed.

What kinds of optimization problems respond to evolutionary pressure? Definitely not the kinds that one typically finds solved by machine learning. That is where GP approaches come in. In general, there was a lot of discussion about symbolic regression as a general rubric, plus some exceptionally interesting work on the use of Pareto optimal fronts for model archives (which I’ll be adding to my ML bag o’ tricks). In particular, great work from Theresa Kotanchek and Mark Kotanchek at Evolved Analytics. Their software effectively leverages Pareto optimality to select exemplars when models diverge, which I find to be a fascinating alternative to what other disciplines might attempt to resolve through sampling. Brilliant work.
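
To make the Pareto front idea concrete, here is a minimal sketch in Python. It is only an illustration of the general concept, not how Evolved Analytics or any GP package implements it. The `Model` type, the two objectives (complexity and error, both lower-is-better), and the archive values are all made-up assumptions.

```python
from typing import List, NamedTuple


class Model(NamedTuple):
    """Hypothetical archive entry; both objectives are lower-is-better."""
    name: str
    complexity: float
    error: float


def dominates(a: Model, b: Model) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    no_worse = a.complexity <= b.complexity and a.error <= b.error
    strictly_better = a.complexity < b.complexity or a.error < b.error
    return no_worse and strictly_better


def pareto_front(models: List[Model]) -> List[Model]:
    """Keep only the models that no other model in the archive dominates."""
    return [m for m in models if not any(dominates(o, m) for o in models)]


# Made-up archive of candidate models.
archive = [
    Model("m1", 3, 0.40),
    Model("m2", 5, 0.25),
    Model("m3", 6, 0.30),  # dominated by m2: more complex and less accurate
    Model("m4", 9, 0.10),
    Model("m5", 9, 0.20),  # dominated by m4: same complexity, higher error
]

front = pareto_front(archive)
print([m.name for m in front])  # ['m1', 'm2', 'm4']
```

The models left on the front each represent a different trade-off between simplicity and accuracy, so none of them can be discarded without giving something up. This pairwise check is quadratic in the archive size, which is fine for a sketch but would want a smarter algorithm for large archives.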

Also got to talk with Bill Tozier, author of Answer Factories: The Engineering of Useful Surprises, and viewed some astounding work in HeuristicLab, an interactive framework from HEAL. Think: evolutionary IDE. Another excellent tip was to check out Modeling global temperature changes with genetic programming by Karolina Stanislawska, Krzysztof Krawiec, Zbigniew Kundzewicz.

✽ ✽ ✽

Haven’t gotten to mention Atlanta yet, but I really appreciated meeting many wonderful folks there. You’ll be hearing more about upcoming Atlanta plans soon! Also, there are workshops and meetup talks planned now for: NYC, SV/SF, Austin, Chicago. Next up after my current week in Seattle comes Hadoop Summit, on 3–5 Jun in San Jose. Hope to see you there!

-alaVoid Distribution

Misc. Inspiration

In closing… Those who have known me for, well, for the past 20-odd years or so will be familiar with the following: a 21st century artist named William Barker, previously acclaimed for Schwa Corporation, has a new endeavor called -AlaVoid Distribution. Definitely check out his new shop on Etsy.


Connected Devices Fellowship - O'Reilly Solid conf

I'm advising Amplify Partners and they've launched a Connected Devices Fellowship that includes conference registration, airfare, and accommodations to attend the new O’Reilly Solid conference on May 21-22 in SF.

The fellowship is designed for engineers, students, researchers, et al., who are passionate about infrastructure for IoT and connected devices.

This is an amazing new conference coming up, and an excellent opportunity. If you know anyone who'd be interested, please pass it along! The deadline is coming up quickly: applications are due April 11.