2008-07-08

a methodology for cloud computing architecture - part 5

Diagram, redux.


Errata: fixed issues regarding 3Tera and CohesiveFT, both of which deserved better positioning in the stack than my cloudy thoughts had heretofore imagined.

Again, many thanks to vendors who are contacting me, and kudos in general to discussions on the "cloud-computing" group.

One thing jumps out from this diagram now: items 6 and 12 are placeholders. What vendors or projects might emerge to cover them? For example, might some enterprising soul rewrite portions of the Scalr project so that it can handle APIs for other vendors?

2008-07-07

a methodology for cloud computing architecture - part 4

[Ed: the diagram shown below has been updated; see previous diagram and the discussion update.]

The diagram won't go away -


This time around, I'd like to take a few notes from both the Globus VWS and the GoGrid / Flexiscale discussions on the "cloud-computing" email list...

Suppose you just happened to have a bunch of hardware, and wanted to leverage that as a grid – in other words, as "disposable infrastructure" to use the phrasing from 3Tera, or perhaps "managed virtualization". From what I've read in the respective web sites, there are a few ways to approach that. 3Tera has AppLogic, and also there is the Eucalyptus project which exposes a layer looking like EC2.

At a bit higher level of abstraction, there are offerings from CohesiveFT and Enomalism which provide similar "grid" atop your metal, but also manage your images for your cloud-based apps on EC2. In other words, those allow for portability of cloud-based apps – whether on your own metal or on metal leased from AWS.

Slicing this stack diagonally in another direction, there are RightScale and Scalr which manage scaling your cloud-based app on EC2 – and monitor it. The former in particular provides excellent features for scripting and monitoring apps. Flexiscale provides similar features for apps run on their cloud (in UK – hope I'm getting those facts straight).

Traveling up the stack on the batch side, Globus provides open source software to run jobs on your own metal, or on EC2 – or on scientific grids, if you qualify.

Meanwhile, GigaSpaces has announced joint projects with both RightScale (for EC2) and CohesiveFT (for your own metal + EC2) to run spaces for SBA – in other words, virtualized middleware. That appears to take the integration of cloud portability and cloud management to another level.

The diagram still needs work along the Hadoop / HDFS portion of the stack – which was its original intention. Obviously there is no distributed file system when talking about Globus, Condor, etc.

Overall, it's quite nice to see this evolution among vendors and product offerings. Many thanks to the vendors who've been taking me on guided tours of their wares.

2008-07-05

a methodology for cloud computing architecture - part 3

[Ed: the diagram shown below has been updated; see previous diagram and the discussion update.]

One more iteration on that diagram.

This time we'll represent some structured approaches running atop a Hadoop cluster. Parallel R being one that addresses our department's needs the most directly...


FWIW, I'm more interested in what can be accomplished with batch jobs and non-relational database approaches -- on Linux-based commodity hardware. Probably an occupational hazard for working with very large data sets and fast turn-around time requirements, where fault-tolerance seems to be what cripples very large-scale relational approaches, in practice. In other words, if your solution to handling very large data requires an SQL query, I might not take the call. At least not for analytics.

Having said that, we still have a ton of transaction / web-based front-end work, which was written for LAMP and still fits relational better.