For another great article, check out
Including Men in the Conversation About Women by
Scarlett Sieber. Among my biggest peeves about Silicon Valley are the “brogrammer” lopsided demographics, and the
gender bias, which is quite real and nearly epidemic. Our data science teams have generally been quite mixed, so why can’t engineering teams in general leave the 19th century behind, let alone stop being so hostile? Not naming names, but two of the SV firms where I’ve worked in the past five years are both well known and well poised for harassment lawsuits. Taking a stand against that nonsense as an engineering manager is a great way to catch hell, which I’ve gladly engaged in before. Another related pet peeve: one of those same firms was actively pressuring its engineering interns to quit university degree programs. As behavior for an engineering manager, I find that highly unethical. Some of those engaged in these practices know quite well who I’m talking about.
Spark Summit
The big, BIG news last month was … (wait for it) …
Spark Summit. All of the
speaker videos have been posted – those are probably the single-best resource for learning about
Apache Spark. Of course, the big surprise at the conf was the announcement of
Databricks Cloud. If you missed the conf, you can watch Ali Ghodsi’s
spectacular demo which kicks in at about the 14:40 time marker.
Spark Summit keynote practice, T-15 hours
One surprise learning from the conf was that
one product line from SAP generates more annual revenue than all of the other Big Data vendors (HW, Cloudera, etc.) combined. Other pleasant surprises included:
Flambo, a Clojure DSL for Spark; and
Thunder, for large-scale neural data analysis, which shows some excellent integration of PySpark, SciPy, scikit-learn, etc.
Our
training sessions at Spark Summit set some kind of new records. In particular, check out the advanced material for great lectures there. Those who attended the conf received a free ebook preview for the upcoming
Learning Spark: Lightning-Fast Big Data Analytics by Holden Karau, Andy Konwinski, Patrick Wendell, Matei Zaharia; O’Reilly Media (2014).
OSCON
Blood Orange IPA
During the conf,
Andy Orem did a video interview where we discussed perspectives and current projects: Ag+Data, Industrial Internet, sketch algorithms, Apache Spark, etc. Andy was the very first editor I worked with at O’Reilly Media,
ten years ago. He’s a much better interviewer than I am an interviewee, so I enjoyed learning much through our work together. Also fun to work again with the amazing video team.
"With great power comes some data, plus wrinkled shirts"
The tutorial for
Just Enough Math had 50+ people attending, and we got to evaluate an intermediate stage of a new tutorial software platform. For that, I needed to get a bunch of USB drives from Amazon, but the order/delivery
#failed. At the last minute our 10 y.o. daughter and I made an emergency run to Fry’s Electronics (she was eager to observe ground zero for nerdliness) … but the only 4GB flash drives that they had left in stock were
Marvel Universe comix characters. Arriving back home, our 9 y.o. daughter was aghast that adults would be receiving comix figures in a lecture :)
The
Data Workflows for Machine Learning talk received lots of great responses – as did earlier versions during meetups in Seattle and SF. It became one of the “top-shared” slide decks featured on the SlideShare home page. Perhaps that needs to be turned into a mini-book?
new book kiosk
As my last-o’-the-day book signing was winding down, after almost everyone had left the convention center for “nearby locations of beer taps”, a friend mentioned “Hey, look there’s another pile of books – these look different.” So a few lucky latecomers got signed copies of the galley drafts for our new book Just Enough Math, which probably still won’t be released for months – this rev is quite rough :) Oddly enough, the first person to read it looked up and said, “Where are the other O’Reilly books about math?” Indeed.
Sketchy Things
Speaking of
Just Enough Math, we’ve put up a companion site for the video+book+tutorial at
http://justenoughmath.com/ to provide additional resources and related links:
- set up a Python programming environment on your laptop
- code+data files for examples in the video+book
- “gists” that show expected results for the examples
- links to external resources that get referenced
- recommended books and videos for further study
- monthly newsletter sign-up
The tutorial at OSCON previewed a new chapter recently added about sketch algorithms, following from notes at an excellent Foo Camp session led by
Avi Bryant. I will be focussing on
Spark Streaming use cases for Strata EU in Barcelona this fall, particularly where approximation techniques (
think: examples of monoids in action) can leverage both
Spark and
Cassandra. If you have examples to share of Spark Streaming production use cases in general, I’m eager to build case studies to publish in
Radar. Meanwhile, for a great resource about sketch algorithms, check out the archives of the
AK Data Science Summit – Streaming and Sketching from last summer.
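To make the "monoids in action" remark a bit more concrete, here is a minimal sketch in Python of one well-known sketch algorithm, a Count-Min sketch, written so that merging behaves like a monoid: the empty sketch is the identity and `merge` is associative. This is not code from the tutorial or from Spark. The class name, the `sketch_batch` helper, the table sizes, and the MD5-based hashing are all assumptions chosen for illustration. The idea is that each micro-batch in a stream can be summarized independently and the summaries combined in any grouping.

```python
import hashlib
from functools import reduce


class CountMinSketch:
    """Approximate frequency counter (hypothetical minimal version for illustration)."""

    def __init__(self, width=1000, depth=5, table=None):
        self.width = width
        self.depth = depth
        # An all-zero table acts as the monoid identity.
        self.table = table if table is not None else [[0] * width for _ in range(depth)]

    def _index(self, item, row):
        # Salt the hash with the row number to get one hash function per row.
        digest = hashlib.md5(f"{row}:{item}".encode("utf-8")).hexdigest()
        return int(digest, 16) % self.width

    def add(self, item, count=1):
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count
        return self

    def estimate(self, item):
        # Collisions can only inflate a cell, so the minimum never undercounts.
        return min(self.table[row][self._index(item, row)] for row in range(self.depth))

    def merge(self, other):
        # Monoid "plus": element-wise addition of two same-shaped tables.
        if (self.width, self.depth) != (other.width, other.depth):
            raise ValueError("sketches must have the same shape to merge")
        merged = [[a + b for a, b in zip(r1, r2)]
                  for r1, r2 in zip(self.table, other.table)]
        return CountMinSketch(self.width, self.depth, merged)


def sketch_batch(items, width=1000, depth=5):
    """Summarize one micro-batch of items as a sketch."""
    sketch = CountMinSketch(width, depth)
    for item in items:
        sketch.add(item)
    return sketch


# Three pretend micro-batches, one of them empty.
batches = [["spark", "cassandra", "spark"], ["spark", "monoid"], []]
total = reduce(lambda a, b: a.merge(b),
               (sketch_batch(b) for b in batches),
               CountMinSketch())
print(total.estimate("spark"))  # an upper-bound estimate of the true count (3)
```

Because the merge is associative and has an identity, the per-batch sketches could be combined on different workers and in any order, which is the property that makes this style of approximation a natural fit for streaming.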
Card-Carrying Green
A friend recently brought up the topic of navigating questions about extinction and climate change for preschoolers… I’m getting those too; however, in my experience the questions become much better formulated after an additional 5–6 years or so. As a parent, as a human, it kills me to see all the ginormous FUD spewing from the political lobbies for the coal industry, fracking, Monsanto, GM, etc. How about giving ample air time and consideration for some points from the other side?
First off, I’ve mentioned it before but it bears repeating:
The Land Institute is a phenomenally excellent resource for understanding some of the insanity and pure tragedy of contemporary agricultural practices, particularly when it comes to monocultures, annuals, hybrids, let alone unnecessary tillage. To paraphrase Wes Jackson, “The plow share has destroyed more options for future generations than the sword.” On a related note, I’ll also point to an
excellent article by Michael Pollan, as a foreword to
Grass, Soil, Hope: A Journey through Carbon Country by Courtney White. Moreover, check out
The Solutions Project. That latter site has more substance than perhaps its web-design polish indicates: it’s about the work by
Mark Jacobson, et al., on how to power the planet via renewables now while mitigating hurricane damages, etc. One would think that the reinsurance revenues alone would justify a significant investment. In any case, these three links point to the fact that any emerging “dialog of despair” about global warming, etc., is purely FUD. Much can and will be done.
Phylo, the trading card game
I’m particularly grateful to be associated with O’Reilly Media, which provided OSCON attendees with a nice treat in their schwag bags:
Phylo, a trading card game. Its gameplay emphasizes endangered species, climate change, food chains, and other environmental pressures. “Phylo is a project that began as a reaction to the following nugget of information: Kids know more about Pokemon creatures than they do about real creatures. We think there’s something wrong with that. Apparently, so do many others.”
In a related development, check out
Nerds Without Borders: “We are looking for all sorts of people to help: Engineers, Scientists, Writers, Artists, Dreamers, Activists, Organizers, Fundraisers, Financiers, etc…” Starting with use of IoT sensors and cell phone networks to
protect sea turtle hatchlings. Good stuff.
Looking Ahead
Another fun follow-up from Foo Camp and OSCON: getting to talk with
Scott Jenson about his work on
The Physical Web at Google. Check out his preso,
Why Mobile Apps Must Die. The big idea is a kind of “micro-DNS” for low-cost digital tagging of physical items that can be accessed by mobile devices. No app installs required.
In other news,
Trafodion was recently released as open source by HP. The name is based on the Welsh word for “transaction”. If you recall
Tandem Computers and
NonStop, this product line has a long history of tech innovations – for highly reliable, highly optimized real-time SQL at scale. My uncle retired from Tandem, and lately I’ve spent time with the Trafodion team and am quite impressed. This release brings an interesting new level of Enterprise robustness to real-time transactions+analysis atop Linux+Hadoop. One to watch.
Another to watch closely is
The Distributed Developer Stack Field Guide by Andrew Odewahn, Courtney Nash, Mike Loukides, et al. This is a GitHub-based book from O’Reilly. If you see any points in there that need editing, embellishing, etc., then two words: pull request, for the win.
Flashbacks
I’ll close with a look back to a
1990 Documentary about Cyberpunk. That provides a good summary of what we were up to in the early 1990s with
Mondo 2000,
bOING-bOING,
FringeWare,
WiReD, The WELL, Turkey City, etc. Tim’s monologue around 15:30-ff is hilarious – both because of his ever-optimistic “There will be mass democracy in the streets” miss, and how much it contrasts with just about every other major point coming true within 25 years. Warning: gratuitous
F242 clips, throughout. Time marker 27:11 shows what I was doing as a
vendor at many, many raves… Meanwhile, check out a recent
bOING-bOING article
Alien Autopsy: William Barker on Schwa, two decades later for some of the more astute counterpoint about what was really going on, then and now.
That's the update for now. See you in Chicago with San Diego on the event horizon!