Newsletter Updates for June 2014

Been quite an interesting month: NYC, SJ/SF, bookended by Hadoop Summit and Spark Summit, with Foo Camp in the midst… much learned, and many excellent introductions.

If you haven’t seen it, this is a gem: Seeing Spaces by Bret Victor, as an evolution of the “Maker Spaces” concept. Another top recommendation is A Short History of and Introduction to Deep Learning by John Kaufhold. Money quote: “Learn, don’t engineer feature representations.” Check this review by Mary Galvin at Data Community DC.

For another great source of inspired writings, follow the Matthew Hunt posts on LinkedIn. In this episode of delightfully unexpected connections, Matthew leads us on a path among Pink Floyd, moon cheese, gnome-like cretins, and unlikely heroes for a tale of two Burkes.

   Just Enough Math

The video for Just Enough Math has been on sale for the past month. O’Reilly has a preview video on YouTube, if you’d like to check out a sample. Meanwhile…

I need your help: this Just Enough Math project would greatly benefit from your reviews. Even if you don’t purchase the full video, check the preview and the free sections. We’re eager to hear your feedback, and especially your reviews!

Here’s the thing: on the one hand, if you’re the kind of person who enjoys reading math papers as a fond pastime, this material is probably not for you. There are plenty of other videos in the world, and so many brain teasers, so little time. On the other hand, if you find that math papers tend to be almost entirely devoid of context (which, frankly, many are) and you took math through Algebra 2, and you enjoy seeing some examples, learning some history, etc., then you’ll probably benefit from this video.

There are quite a number of great resources at O’Reilly and other publishers for those who want a deep dive into any particular area of advanced math applied to Big Data … and the point of the Just Enough Math project is to serve almost like a “hyperlink document” (e.g., old school web pages circa early 1990s) for those other books, videos, websites, etc., along with providing history and case study examples as context.

We’ll be presenting a tutorial based on Just Enough Math at OSCON. Plus, there’s a super-secret discount code for 20% off registration: PACOID

In the Bay Area, we’ve recently launched a Just Enough Math: Machine Learning for Execs and Entrepreneurs meetup. Looking forward to more events through that. Submitted as evidence, check out “How Not to Be Wrong”: What the literary world can learn from math by Laura Miller in Salon.


Some interesting insights about Apache Mesos surfaced in the recent 2014 community survey. And at this point, the list of firms adopting Mesos no longer fits in my browser window. To find out more, check out MesosCon, scheduled for August in Chicago. I’m looking forward to talks from John Wilkes and several other experts, and meanwhile I will present on Apache Spark running on Mesos. In related news, I recently gave a talk at the Mesos NYC Meetup sponsored by the kind folks at Shutterstock. If you’re in the area, check out an intro Mesos talk on 7/17 by Joe Stein at Bloomberg.

✽ ✽ ✽

On a recent camping adventure in Sebastopol, I was grateful to learn about lots of new technologies. One of the more interesting finds was Unbounded Robotics, and I enjoyed a chat with Melonee Wise, CEO. These actually are the droids you’re looking for. Meanwhile, O’Reilly Media is looking for editors, especially in the Data practice area. Got Edit? Join the team!

Morning walk in Ceres Community Garden, with O'Reilly Media in the background

In terms of other interesting technologies… I’ve been hearing memes rumbling about “Big Data is a myth” or “Where are the IoT apps?” Here, that’s where. The part of Nokia that didn’t sell off to Microsoft is handling some of the most interesting fusion of data exhaust that I’ve seen. Case in point, check out Jams, game theory, and equations: the science of traffic for a view of really big data analyzed in real time. If you’ve attempted to drive anywhere in, say, DC or Austin or Silicon Valley anytime recently during commute times … this is a problem. Money quote: “Then we start to look at the car’s sensors. We start to know the weather before the weather authorities do, because we can see which cars have their windscreen wipers or their headlights switched on.” Orders of magnitude larger than your favorite social network or ad exchange.

Meanwhile, my favorite IoT app so far is clearly this: sharks tweet as they approach the shore of Western Australia. Would be great to see more technology applications like that!

   Minecraft camp

Speaking of Foo and other camps, I’ve got two kiddos currently in iD Tech Camps, learning Minecraft and Scratch, respectively. These courses tour around the US and are highly recommended. We could learn much from their teaching approach, to benefit professional workshops for adults as well.

To follow up on the Minecraft + Quantum theme from previous posts, here’s a good video of Seth Lloyd explaining Quantum Machine Learning. Why does this seem to call back to the Real Genius movie?

   Ag + Data

O’Reilly Strata recently carried a story about how Farm data could be worth billions, related to the Ag + Data post on O’Reilly Radar. Much is happening in Ag data and other consumers of remote sensing products – particularly with respect to recent changes in satellite regulations. However, my favorite recent Ag story is about the Purdue Improved Cowpea Storage (PICS) bag. Brilliant work.

Overall, much of the interesting Ag+Data tech seems to be coming from (or through) Chile… and a new phrase has emerged: Chilecon Valley.

   Friends in the News

Congratulations go out to Robby Garner, competing with the JFRED Chat Server in Turing2014: 60th anniversary year of Alan Turing’s untimely death. Many years ago, Robby and I worked on a primordial version of JFRED that played “customer service agent” for the FringeWare online bookstore. Circa 1998 we ran the bots on BBC “Tomorrow’s World” for a live televised Turing Test, which is some of the most fun I've ever had in network engineering. More recently, Hubot-based chatbots are being deployed for devops and other engineering teams, such as the Shep chatbot used by engineering at O’Reilly.

Also, check out the new Big Data Analytics Beyond Hadoop: Real-Time Applications with Storm, Spark, and More Hadoop Alternatives by Vijay Srinivas Agneeswaran. This is a deep-dive into design patterns and frameworks for large-scale analytics beyond Hadoop.

Got to meet lots of people interested in using Spark at the recent Hadoop Summit in San Jose. One of the Community Choice Awards at the conference went to “Demo: Building a Unified Data Pipeline in Apache Spark” by Aaron Davidson from Databricks. Eager to see the slides published for that. Also at Hadoop Summit, Xiangrui Meng gave an excellent talk about MLlib: the tech roadmap and integrations, with particular emphasis on how to leverage sparsity in your data.

Meanwhile, friends at Zementis have recently released PMML support for Python, with a project called Py2PMML. In particular, there’s integration for scikit-learn. I wonder how long before PySpark + MLlib joins that list?

   Joaquim on the Moon

As many of you know, given enough beers I become fond of talking about dropping large complex arrays of sophisticated equipment into the polar dark craters on the Moon. In a recent convo over drinks with people who calculate the costs of such an operation for a living, we surfaced an interesting price tag for that kind of venture: approximately $15B. In terms of how much the US spends on the Department of Defense, that’s about 8 days’ worth. Think about it. Who wants a term sheet? Meanwhile, the subject got me thinking of Kubrick films, particularly the 2001: A Space Odyssey set production, an engineering feat in itself.

That's the update for now. See you in PDX and ATX, with Chicago and San Diego on the event horizon!