2008-07-05

a methodology for cloud computing architecture - part 3

[Ed: the diagram shown below has been updated; see previous diagram and the discussion update.]

One more iteration on that diagram.

This time we'll represent some structured approaches running atop a Hadoop cluster. Parallel R being one that addresses our department's needs the most directly...


FWIW, I'm more interested in what can be accomplished with batch jobs and non-relational database approaches -- on Linux-based commodity hardware. Probably an occupational hazard for working with very large data sets and fast turn-around time requirements, where fault-tolerance seems to be what cripples very large-scale relational approaches, in practice. In other words, if your solution to handling very large data requires an SQL query, I might not take the call. At least not for analytics.

Having said that, we still have a ton of transaction / web-based front-end work, which was written for LAMP and still fits relational better.

No comments: