Cascading
I've got a series of blog posts going as a "gentle introduction" to Cascading, over at "Cascading for the Impatient". The series includes sample code in a GitHub repo and progresses from a distributed file copy, to yet another Word Count, to a full TF-IDF implementation in MapReduce, complete with TDD features.
Related to this, see also some recent content:
- "Intro to Data Science for Enterprise Big Data" (which made the front page of SlideShare)
- "Cascading for the Impatient" lightning talk slides
- "Multitool" (Bash scripting for MapReduce -- recently updated)
- "Sample Recommender" for stock picks in Twitter tweets
- "City of Palo Alto Open Data"
I will also be speaking at some upcoming conferences, all listed on my Lanyrd profile:
- Splunk conference in Las Vegas, 9/13
- Cloud Con Expo in SF, 10/3
- ACM Data Mining, 10/13
- SpringOne in DC, 10/17
- DC Data Science, 10/17
...and for more info about Cascading, please join us at the new Cascading Meetup group.