2015-07-15

Four Quick Links for July

For those who live in the Pacific Northwest – or may be heading in that direction soon – I'll be doing a few talks in Portland and Seattle next week:

Portland, Tue Jul 21, 13:30–17:00

The Apache Spark tutorial at OSCON presents a hands-on introduction to Spark, with deep dives into important components, including SparkR and the Data Sources API. The half-day tutorial considers examples from several Huawei case studies – production Spark deployments at scale for Telco use cases. I’ll be teaching this along with Haichuan Wang, Jacky Li, and Vimal Das Kammath V from Huawei.

Also: my newly updated video, Introduction to Apache Spark, will be featured as the Video of the Week during OSCON. It includes new material on DataFrames.

Portland, Thu Jul 23, 10:40–11:20

Microservices, containers, and machine learning provides a deep dive into a project called Exsto that we’re using to explore the structure and dynamics of open source developer communities. It incorporates natural language processing, graph algorithms, etc., and leverages DataFrames and GraphX in Spark. We’ll explore the Apache Spark developer community as a case study.

Seattle, Fri Jul 24, 18:30–21:30

Eleven Almost-Truisms About Data will be a keynote at a launch party for the new GalvanizeU program in Seattle. Almost a dozen almost-truisms about Data that almost everyone should consider carefully as they embark on a journey into Data Science. There are a number of preconceptions about working with data at scale where the realities beg to differ. This talk estimates that number to be at least eleven, though probably much larger. Let’s consider some of the less-intuitive directions in which this field is heading, along with likely consequences and corollaries – especially for those who are just now beginning to study the technologies, the processes, and the people involved.

Seattle, Sun Jul 26, 14:15–14:55

NLP and text analytics at scale with PySpark and notebooks at PyData Seattle will go into more detail about the PySpark components of the Exsto pipeline. I’m also super excited about the keynote by Lorena Barba. We’re leveraging Project Jupyter for O’Reilly Learning and I’m really looking forward to talking with lots of people who are working on Jupyter for Education.

See you in Portland and Seattle!