2006-11-05

aws rocks the free world

Amazon Web Services (AWS) is perhaps one of the most fundamental and far-reaching changes in the computer industry ... at least since SecondLife recently hit the cover of BusinessWeek magazine. In fact, this news will be the cover story of BusinessWeek next week.

I knew about a friend working on this project a couple of years ago, and helped a little (on the sly) with documentation. So when it came time to provide offsite backup services for HeadCase, we thought it might be good to check out AWS. Price-performance comparisons showed a 90% reduction in cost over going to an ISP or other backup service, plus the Amazon approach ("S3") had better reliability and other uses, such as tiered storage.

Overall their "cloud" of grid/utility computing services allows a small technology startup to begin playing with REALLY big scale services - it fits almost perfectly for those of us who share a perspective of REST + SOA. With "S3" as a storage grid, "EC2" as a compute grid (Linux images, Java API), "SQS" as a transaction message queue (somewhat reminiscent of IBM's MQ), and MTurk for the "human computation" and crowdsourcing, this business strategy precipitates a fundamental shift in how to plan for IT infrastructure, how to manage QA resources, etc.

As an engineering manager, I would expect to pay a lot to get access to that kind of service - especially given Amazon's remarkable quality of service. Instead, I pay pennies on the dollar compared with a hosted service - and moreover I don't have to staff up my operations, since Amazon handles that as "outsourcing". And thereby our firm inherits that quality of service, since the Amazon AWS services handle the public-facing aspects of our SOA.

The bottom line here: the fundamental issue with cost of scaling Internet infrastructure is not the processors (thank you very much, Dell or HP) but instead - as IBM knows oh so well - the utilities involved, such as power. Amazon is building out data centers in the Pacific Northwest, located near cheap, plentiful hydroelectric power generation. This has a significant Green effect, optimizing the electrical power generation and usage by collocation, then exporting the "refined" use as data center services.

This is almost as major a change in the industry as when DNS was invented, or HTTP/1.0 became accepted, or Skype launched. Wall Street is tending to comment that Amazon is not following a core strategy, without clearly understanding that Amazon has just put several big players on notice for substantive business model disruptions: notably IBM, Microsoft, Google, EMC, HP, BMC, (and for what it's possibly still worth) Sun. As the article in BusinessWeek does mention, this is clearly a good strategy for launching a tech startup with vastly reduced capital and substantially enhanced quality and scalability. In other words, in the eyes of VCs, that implies better opportunities to leverage capital.

Here is an example pattern of usage:

  1. Design your data model to be stored in the Amazon S3 storage grid. For example, we have requirements for running a Java JSR 170 content management system, which fits well with S3 capabilities.
  2. Prepare images for your application servers (Java, PHP, whatever) to run on Amazon EC2.
  3. Point the client side of your applications, such as Ajax requests, at the Amazon SQS message queue.
  4. Allocate your server images in the EC2 cloud to pull requests off your SQS queue.
  5. Your middle tier processes requests within EC2, persisting data out to your S3 storage.
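
The flow in steps 3 through 5 can be sketched roughly as below. This is only an illustration, not real AWS client code: `request_queue` is an in-memory stand-in for the SQS queue, `storage` is a plain dict standing in for an S3 bucket, and the names `submit_request` and `process_pending`, the `content/` key prefix, and the upper-casing "work" are all made up for the example.

```python
import json
import queue

# Hypothetical in-memory stand-ins, NOT the real AWS APIs.
request_queue = queue.Queue()   # plays the role of the SQS message queue
storage = {}                    # plays the role of an S3 bucket: key -> bytes


def submit_request(doc_id, body):
    """Client side (step 3): enqueue a request as a JSON message."""
    request_queue.put(json.dumps({"doc_id": doc_id, "body": body}))


def process_pending():
    """Worker on a server image (steps 4-5): drain the queue, persist results.

    Returns the number of messages handled.
    """
    handled = 0
    while True:
        try:
            message = request_queue.get_nowait()
        except queue.Empty:
            return handled
        request = json.loads(message)
        key = "content/" + request["doc_id"]
        # Placeholder for the middle-tier work: upper-case the body.
        storage[key] = request["body"].upper().encode("utf-8")
        handled += 1
```

The point of the shape is that the client never talks to a server image directly. It only drops messages on the queue, so worker images can be added or removed without the client noticing, and a reporting or QA worker could consume from its own queue in the same way.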

You can also run "back office" tasks such as reporting or data mining based on the same pattern - without disrupting customer services.

I've read about people beginning to use this kind of pattern to set up their QA environment for regression testing or load testing - again without disrupting other operations or requiring costly server + network replication.

It would seem to fit quite well with, say, JavaServer Faces used for an Ajax UI... based on a tech stack that used Seam, clustered JBoss servers, Hibernate for the persistence layer, and clustered MySQL underneath that.

1 comment:

Paco Nathan said...

ixnay on the SQS, at least for now. we gave it a good look, poked around a bit, and it needs a bit more work.

but EC2 and S3 are awesome.