We’ve all heard the saying that a data scientist is a cross between a statistician, domain expert, and machine learning hacker, but in today’s landscape, that falls short. A good data scientist needs to be all of the above and also a great designer.
Great points made there. Thanks to a teaching fellowship as a grad student many years ago, I got to stick around a couple extra terms and take a Design Communications program. That program evolved substantially, and of course there’s now Stanford d-school too. I'm grateful for those experiences and (taking a cue from Kevin Dalias) perhaps some formal exposure to design helped shape my later career path. Most certainly we emphasize design thinking at The Data Guild among our core values.
This stereotype of the lone male computer whiz is self-perpetuating, and it keeps the computer field overwhelmingly male. Not only do hiring managers tend to favor male applicants, but women are less likely to pursue careers in a field where they feel they won’t fit in … as late as the 1960s many people perceived computer programming as a natural career choice for savvy young women.
O'Reilly Video Studio, Sebastopol – meerkat crossing
Just Enough Math
Got to spend much of last week in Sebastopol working with the remarkably talented folks at O’Reilly Video. Allen Day and I have been busy writing our new book, Just Enough Math – and are now developing a video and much more to go along with that. Elevator pitch: advanced math for business people, to understand how to leverage OSS frameworks for Big Data. We pick up from these prerequisites: High School Algebra 2, some Python, plus the experience in business to recognize why you need to leverage data. Let's explore how.
For example, you've probably heard much about graph query engines and perhaps read about graph use cases … how much graph theory did you get exposed to in school? Given that so many people stop at calculus, graph theory is perhaps rare among b-school topics. Would you feel comfortable working from a use case – in the sense of an HBR-styled business framework – leveraging large-scale graphs to build a high-ROI app? We think the answer is “Yes.”
Each morsel of advanced math gets introduced through a clear business use case, some historical context, lots of illustrations, and small snippets of Python code that you can cut&paste. In addition to integrating text + video + code, we are leveraging an instructional rubric called Computational Thinking. Stay tuned! Meanwhile, we’re scheduled for a Just Enough Math tutorial at OSCON in Portland the week of July 20th.
Compelling Projects
While I’m out teaching workshops around the country, I enjoy many opportunities to meet amazing people and hear about their projects. I’d like to highlight in particular Mike West in Austin, and his recent post People Analytics Junto (public community):
We are a loosely connected group pressing forward, for the benefit of humanity, on the following topics: 1.) Exploring innovative ways data can be used to solve people related problems or make better people related decisions in organizations. 2.) Seeking understanding of organizations, and people in organizations, through Behavioral Science - Sociology, Labor Economics, Social Psychology, Psychology & Operations Research… 3.) Applying Big Data, Machine Learning, Artificial Intelligence, and related disciplines to Human Resources.
I am fascinated by use of Big Data and machine learning for HR. Having worked closely with HR in several organizations, having hired lots of people into Data Science and Engineering roles… I struggle to point to instances when we really leveraged data much – other than calculating salary targets for new hires or attrition rates.
The point is not to automate HR. Rather, the point is that most organizations spend most of their revenue on people (which makes sense), so why not invest in data insights there?
Do You Need or Want to become a Data Scientist?
KDnuggets recently ran Part 3 of 3 in my email Q&A interview ... aimed at candid career advice to people wanting to move into Data Scientist roles. Arguably a bit over the top, and in reaction to being exasperated by "Read this and begin calling yourself a Data Scientist" puff pieces. Many thanks to Anmol, Gregory, et al., @KDnuggets. This came after part 1 and part 2 about Apache Mesos following Strata. I'd like to publish some snippets here – not about what I said, but about what people began discussing.
A comment by Data Science London:
++1 “Product Mgmt. in SV is almost antithetically opposed to effective use of data” … Data Products nuke mgmt layers
“Actual work in Data Science entails having to speak truth to power (not fun, but the essence of the role)”
Several criticisms were much appreciated, and hopefully that helped spur more dialog...
Daniel Tunkelang:
@paix120 @BecomingDataSci I agree that @pacoid is overcompensating a bit to counter the proliferation of be-a-data-scientist-quick programs.
Data Science Renee:
.@dtunkelang @paix120 @pacoid hm could be. Just seems to take an unnecessarily discouraging tone.
Followed soon after by great perspectives which are recommended reading.
Plus some wise words stated much more succinctly than I could...
Gregory Primosch:
a #datascientist is not a magical unicorn http://goo.gl/ytHzT4
Andrew Musselman:
@GPrimosch @pacoid I started saying instead of hiring unicorns you should hire horses and narwhals
All of that discourse was illuminating to see. Well said, much better than I did. And, as Charlie Greenbacker pointed out:
It’s also a great rebuttal to all the articles claiming “data science” will soon be automated. #WishfulThinking
Indeed. Not to be overly flippant, but my hunch is that Data Scientist roles will become fully automated at about the same time as HR professionals and BoD meetings.
Meanwhile, IMHO some of the wisest words on this subject come from Nick Kolegraff, Dir Data Science @Rackspace:
Do you need a data scientist?
Open Source Updates
Other recommended conferences with recent announcements:
Backyard in bloom – site of a new Google campus
Upcoming Events
Lots of plans to be out on the road during April/May this year. I hope to get to talk with you there! Here’s a summary of upcoming meetups and workshops, including new material. We will have drinkups plus office hours in most of these cities – probably adding more meetup talks too:
- San Francisco
- Washington, DC
  - Hands-on Intro to Data Science
    Mon, Apr 14, 8:30am–4:30pm (Eastern)
    MicroTek, 1101 Vermont Ave NW #700, Washington, DC 20005
  - Hands-on Intro to Machine Learning
    Tue, Apr 15, 8:30am–4:30pm (Eastern)
    MicroTek, 1101 Vermont Ave NW #700, Washington, DC 20005
  - Deep Dive on Apache Mesos
    Tue, Apr 15, 6:30pm (Eastern)
    AddThis, 1595 Spring Hill Rd #300, Vienna, VA 22182
- Austin
  - Hands-on Intro to Machine Learning
    Thu, Apr 24, 8:30am–4:30pm (Central)
    AT&T Conf Center, 1900 University Ave, Austin, TX 78705
  - Cluster Compute App Integrations
    Fri, Apr 25, 8:30am–4:30pm (Central)
    AT&T Conf Center, 1900 University Ave, Austin, TX 78705
- Atlanta
  - Big Data Week ATL (keynote talk)
    Sat, May 10, noon–4:00pm (Eastern)
    GA Tech Research Institute Conf Center, 250 14th St NW, Atlanta, GA 30361
  - Hands-on Intro to Data Science
    Mon, May 12, 8:30am–4:30pm (Eastern)
    MicroTek, Northpark Building 400 #194, 1000 Abernathy Rd, Atlanta, GA 30328
  - Hands-on Intro to Machine Learning
    Tue, May 13, 8:30am–4:30pm (Eastern)
    MicroTek, Northpark Building 400 #194, 1000 Abernathy Rd, Atlanta, GA 30328
Misc. Inspiration
Speaking of Big Data apps, here’s a good one: simulations show that in the context of hurricanes Sandy, Isaac, and Katrina, wind farms disrupt the outer rotation winds so much that the storms do not even have enough energy to destroy the turbines – let alone cause their usual damage after landfall. That would effectively reduce category 5 storms to category 2 on the Saffir–Simpson scale. Similarly, storm surge decreased substantially in simulations – up to 79% for Katrina.
Consider that estimates for protective seawalls run in the $10-40B range, per city installation… seems like reinsurance companies could start underwriting turbine farms to cut their costs massively, not to mention generating electricity. I've heard Prof. Mark Jacobson present, and his work in general is highly recommended.
I'll leave you with one of the more interesting kinds of archival data that I’ve seen in a long while… Years by Bartholomäus Traubeck, a record player that plays tree rings.
That's the update for now. See you in DC, Austin, Ann Arbor, and Atlanta, with Seattle and NYC on the event horizon!