Outlook for Azure – scattered clouds but generally sunny
October 14, 2009
That is my forecast, anyway. Over the last year, I have had the opportunity to talk to a number of customers, user group members and conference attendees about cloud computing in general and the Windows Azure Platform in particular. During these conversations, I have come across a number of concerns and questions about Azure pricing, performance, security and so on. Clearly, Windows Azure has a long way to go before it becomes a mature cloud platform. However, some broad-brush statements I have come across lately, questioning Azure’s pricing, performance and applicability, deserve some clarification. In this post, I have attempted to capture some of these concerns and offer my humble thoughts on why we need to take a holistic approach when evaluating the Windows Azure Platform.
Azure – A Platform as a Service offering
Let me begin by stating that not all cloud offerings are alike. In their paper “Toward a Unified Ontology of Cloud Computing,” Lamia Youseff et al. provide a detailed model for understanding the different classes of cloud providers. The following diagram depicts their proposed classification of cloud service providers, including Platform as a Service (PaaS), Infrastructure as a Service (IaaS) and Software as a Service (SaaS).
[Figure: proposed classification of cloud service providers (SaaS, PaaS, IaaS). Screen clipping from http://www.cs.ucsb.edu/~lyouseff/CCOntology/CloudOntologyPres.pdf]
Microsoft Azure is a “Platform as a Service” offering, which is very different from what Amazon offers via EC2 (commonly classified as “Infrastructure as a Service”) or even from Google’s App Engine (GAE), commonly classified as a specialized “Platform as a Service” offering. As a result of this difference in approach, the lifecycle of a Windows Azure application is different as well. A developer develops the code in the development fabric running locally and then simply publishes it to the cloud, along with a description of the desired infrastructure model. Based on the uploaded code and the infrastructure model, Azure orchestrates the provisioning of the computing, storage and networking resources in a manner that is fault tolerant and allows for rolling upgrades.

It is clear that Microsoft has had to innovate at 100 mph to make this concept a reality. I encourage you to read a recently released Microsoft Research paper on Helios, which describes an OS abstraction across a number of heterogeneous CPUs. Make no mistake: Azure is the only cloud offering that is trying to create a “Cloud OS” abstraction around a fabric of Windows machines.

This is not to suggest that EC2 or GAE are not useful. Far from it. If one is looking for an on-demand virtual machine provider, EC2 is a great choice. After all, Amazon is the king of retail, and as its executives like to say with a grin, “we will gladly match the lowest price offered by our competitors.” On the other hand, if you have a Java- or Python-based web application that fits GAE’s predefined application structure and framework, GAE would be a great choice. GAE is already successful, with over 45,000 applications by the last count. In the end, different requirements will call for different types of cloud offerings. The key characteristics of the Azure approach can be summarized as follows:
1) Windows Azure manages applications, not just servers
2) Tell it what you want, and it will automate the details
3) Model-driven automation
4) Platform ensures service isolation
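The “tell it what you want” idea can be illustrated with a toy reconciliation step: you declare a desired model (role name mapped to instance count), and a controller works out the actions needed to move the current state toward it. This is a hypothetical Python sketch of the general declarative pattern only, not the Azure fabric controller’s actual algorithm or API; the role names and the `reconcile` function are illustrative assumptions.

```python
from typing import Dict, List, Tuple

# Hypothetical service model: role name -> desired number of instances.
# (Illustrative only; it does not mirror Azure's real model file format.)
DesiredModel = Dict[str, int]
CurrentState = Dict[str, int]


def reconcile(desired: DesiredModel, current: CurrentState) -> List[Tuple[str, str, int]]:
    """Return (action, role, count) steps that move `current` toward `desired`.

    The caller states *what* it wants; this function works out *how* to get there.
    Roles missing from `desired` are treated as wanting zero instances.
    """
    actions: List[Tuple[str, str, int]] = []
    for role in sorted(set(desired) | set(current)):
        want = desired.get(role, 0)
        have = current.get(role, 0)
        if want > have:
            actions.append(("start", role, want - have))
        elif want < have:
            actions.append(("stop", role, have - want))
    return actions


if __name__ == "__main__":
    desired = {"WebRole": 2, "WorkerRole": 1}
    current = {"WebRole": 1, "LegacyRole": 1}
    for step in reconcile(desired, current):
        print(step)
```

Running it prints a stop step for `LegacyRole` and start steps for `WebRole` and `WorkerRole`. The point of the design is that the developer only edits the desired model; computing and applying the difference is the platform’s job.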
Now that we have an understanding of the different flavors of cloud computing, let us circle back to where I started and address some of the broad-brush statements I alluded to earlier.
Concern #1: “The Car Analogy – Azure Pricing model is fundamentally broken”
“Azure is like a car that has its engine running constantly, even when it is not being driven.”
The author of the above analogy is trying to contrast Azure pricing with GAE where you only pay for the time you use the service.
“Azure is like a car that has its hood sealed shut. One cannot look under the hood, let alone reconfigure it.”
The author of the above analogy is trying to highlight the fact that one cannot directly access the virtualized instance that is running the Azure-based application.
I think each of these analogies conveys only a half-truth about Azure. A better analogy for Azure is this:
“Azure is like a car you rent from ZipCar (an hourly car rental company).”
The price of a ZipCar rental includes gas and insurance costs. Of course, you pay for the hour whether you use the car or not (ZipCar has to spend money to honor the reservation you have made). But at any time, you can simply return the car and incur no additional charges. Would anyone care to look under the hood of a ZipCar? I certainly would not. That would defeat the very reason I rent from ZipCar in the first place: once I pay for the hourly rental, I don’t have to worry about insurance or fuel. If something goes wrong, I can simply wave my ZipCar card and get into the next available car. And if, for any reason, my co-passengers want to go on a little excursion, they can get additional cars on short notice at the same fixed price and give them up when they are done.
So how can GAE offer a model that requires customers to pay only for the time their application is used, when EC2 and Azure cannot? The answer lies in a trade-off: the more control you want, the lower the economy of scale. Azure and EC2 give you progressively more control (in that order) and, as a result, cannot take advantage of economy of scale to the same degree as GAE. GAE, Azure and EC2 each offer a multitenant platform, but the degree to which each tenant can configure its individual setup differs. I would not be surprised if an ISV came out with an Azure-based cloud offering that is constrained in some way (for example, one that only allows customers to submit their HttpHandlers) but, in turn, offers a usage-based pricing model.
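To make the trade-off concrete, the sketch below contrasts allocation-based billing (every hour an instance is reserved is billed, as with Azure compute) with a usage-based model (only busy hours are billed, as in the GAE comparison). The $0.12/hour rate comes from the cost table later in this post; the $0.30 “busy hour” rate and the 4-hours-a-day usage profile are hypothetical numbers chosen for illustration, not any provider’s actual prices.

```python
# Hypothetical billing comparison for one 30-day month.
DAYS_PER_MONTH = 30
HOURS_PER_MONTH = DAYS_PER_MONTH * 24


def allocation_cost(instances: int, rate_per_hour: float) -> float:
    """Allocation-based: every reserved instance-hour is billed, busy or idle."""
    return instances * HOURS_PER_MONTH * rate_per_hour


def usage_cost(busy_hours_per_day: float, rate_per_busy_hour: float) -> float:
    """Usage-based: only hours in which the application does work are billed."""
    return busy_hours_per_day * DAYS_PER_MONTH * rate_per_busy_hour


def break_even_busy_hours(instances: int, rate_per_hour: float,
                          rate_per_busy_hour: float) -> float:
    """Busy hours per day at which both models cost the same."""
    return allocation_cost(instances, rate_per_hour) / (DAYS_PER_MONTH * rate_per_busy_hour)


if __name__ == "__main__":
    RESERVED_RATE = 0.12  # $/instance-hour, from the cost table in this post
    METERED_RATE = 0.30   # $/busy hour: an assumed, hypothetical premium rate
    print(f"allocation-based: ${allocation_cost(1, RESERVED_RATE):.2f}")
    print(f"usage-based (4 busy h/day): ${usage_cost(4, METERED_RATE):.2f}")
    print(f"break-even: {break_even_busy_hours(1, RESERVED_RATE, METERED_RATE):.1f} busy h/day")
```

Under these assumed numbers, a single reserved instance costs $86.40 a month, and the usage-based model costs $36.00 for an application that is busy 4 hours a day. The reserved model only becomes cheaper once the application is busy for more than about 9.6 hours a day. A provider can only offer the metered model profitably if it can pack many lightly used tenants onto shared capacity, which is the economy-of-scale argument above.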
There are two other pricing related concerns that I have come across.
“I can host 2 blogs and a wiki for the price of a single Azure Web role.”
Hosting a blog or wiki is not likely to be the target application for an Azure Web role. If you want to host a blog or wiki, the Business Productivity Online Suite (BPOS) offering will be more cost effective.
“My hoster allows me to create multiple virtual directories (vdirs) inside a single VM. With Azure, I need multiple Web roles to achieve the same setup.”
First off, it is not appropriate to compare a vdir on a VM where you have complete access to the machine (and in turn assume all the responsibility for administering it) with a vdir offered by a web role; a web role offers fault tolerance, monitoring, load balancing and other SLA guarantees. While these differences are important, in the end it is really about raising the level of abstraction. Readers will recall that when IIS 6 came out with the out-of-process model as the default, there were some concerns that provisioning a distinct process for each web application would be expensive (as opposed to the in-process or pooled process model supported by IIS 5.0). But the isolation and fault tolerance offered by the out-of-process model quickly sidelined any concerns about the cost of provisioning additional processes.
Concern #2: “Azure does not scale dynamically”
This concern is based on the fact that one has to specify the exact number of Azure role instances to provision. In the current CTP, there is no way to specify a range for the number of role instances. In other words, one cannot let Azure automatically decide, based on the load on the application, when to ramp the number of running instances up or down.
Frankly, this is not as big a deal as it sounds. One can always write a small piece of code that spawns new instances (or, conversely, shuts down unused ones) based on health alerts. Also note that the underlying fabric does indeed support specifying a range (refer to the PDC 08 session “Under the Hood: Inside the Windows Azure Hosting” and fast forward to minute 67:14). I think it is only a matter of time before Microsoft exposes this capability to end users.
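The “small piece of code” could be as simple as a rule that maps a load signal to a target instance count clamped to a range. The sketch below assumes a hypothetical average-CPU metric and hypothetical thresholds and limits. In a real deployment, the resulting count would still have to be applied through whatever management interface the platform exposes, which is not shown here.

```python
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    """Hypothetical policy; all numbers are illustrative assumptions."""
    min_instances: int = 2           # keep at least two instances for the uptime SLA
    max_instances: int = 10          # cost ceiling
    scale_out_above: float = 0.75    # average CPU fraction that triggers scale-out
    scale_in_below: float = 0.25     # average CPU fraction that triggers scale-in


def target_instances(current: int, avg_cpu: float, policy: ScalingPolicy) -> int:
    """Return the desired instance count for the next interval.

    Adds one instance when the role is hot, removes one when it is idle,
    and always stays within [min_instances, max_instances].
    """
    if avg_cpu > policy.scale_out_above:
        proposed = current + 1
    elif avg_cpu < policy.scale_in_below:
        proposed = current - 1
    else:
        proposed = current
    return max(policy.min_instances, min(policy.max_instances, proposed))


if __name__ == "__main__":
    policy = ScalingPolicy()
    for current, cpu in [(2, 0.90), (3, 0.50), (3, 0.10), (2, 0.05)]:
        print(current, cpu, "->", target_instances(current, cpu, policy))
```

The gap between the two thresholds acts as hysteresis, so a load hovering around a single cutoff does not make the instance count flap up and down. Stepping by one instance per interval is a deliberately conservative choice for a sketch.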
Concern #3: “Azure makes sense only for the large enterprises”
“The massive scalability, fault tolerance and high availability are well suited for large enterprises. Most of the applications we build as a small/medium business don’t need these capabilities”.
Let us consider a small/medium-sized web retailer. Based on the available Azure pricing, it would typically cost about $360/month to run a small/medium website (see the cost breakdown below), all without any upfront investment in staging, production or disaster recovery hardware and software.
| Item | Assumption | Monthly cost |
|---|---|---|
| Compute | 2 web role instances (to get the 99.95% uptime SLA) | $172.80 (30 × 24 × $0.12 × 2) |
| Storage (SQL) | 10 GB | $99.99 |
| Storage transactions | 1 million transactions | $1.00 |
| Messages | 1 million transactions | $1.50 |
| Bandwidth in | 50 GB | $10.00 |
| Bandwidth out | 100 GB | $75.00 |
| Total | | $360.29 / month |
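The monthly total can be checked with a few lines of arithmetic. The sketch below uses only the line items from the table as given, with no other prices assumed. It also works out how much downtime a 99.95% uptime SLA permits over the same 30-day month.

```python
# Line items from the cost table (monthly, USD); no other prices are assumed.
HOURS_PER_MONTH = 30 * 24
WEB_ROLE_RATE = 0.12  # $ per instance-hour, as used in the table

line_items = {
    "compute (2 web role instances)": HOURS_PER_MONTH * WEB_ROLE_RATE * 2,
    "storage (SQL, 10 GB)": 99.99,
    "storage transactions (1M)": 1.00,
    "messages (1M)": 1.50,
    "bandwidth in (50 GB)": 10.00,
    "bandwidth out (100 GB)": 75.00,
}

total = round(sum(line_items.values()), 2)

# Downtime allowed by a 99.95% uptime SLA over the same 30-day month.
SLA = 0.9995
allowed_downtime_minutes = round((1 - SLA) * HOURS_PER_MONTH * 60, 1)

if __name__ == "__main__":
    for name, cost in line_items.items():
        print(f"{name:32s} ${cost:8.2f}")
    print(f"{'total':32s} ${total:8.2f}")
    print(f"allowed downtime at 99.95%: {allowed_downtime_minutes} minutes/month")
```

The line items sum to the table’s $360.29. A 99.95% monthly SLA leaves room for roughly 21.6 minutes of downtime in a 30-day month.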
To compare the above cost with a hosted/on-premise solution, I checked with one of our customers on how much they were paying to host their servers in a data center in Herndon, VA. Here is the information I was provided:
| Item | Details |
|---|---|
| Resource | Single Windows server machine |
| What is covered | Network, power, cabling |
| | Tape backup, configuration and handling |
| | Firewall, F5 network switch |
| | Remote hands support |
| | Physical security and intrusion detection |
| | AD servers / Exchange / SMTP access |
| What is not covered | Applying patches, failure detection |
| | Cost of the hardware and software |
Furthermore, even small retailers are subject to security and auditing standards such as the Payment Card Industry Data Security Standard (PCI-DSS) and the Statement on Auditing Standards (SAS 70). By hosting their application in Microsoft’s cloud infrastructure, even small retailers can take advantage of the security certifications Microsoft has earned, including SAS 70 and ISO/IEC 27001:2005.
Let us consider another example. I am working with a customer who has a very limited budget for hosting a small HR application in Canada. Because of local laws, this application must be hosted within the geographical boundaries of Canada; unfortunately, the customer has only a very small office in Canada with limited or no data center capabilities. At first glance, purchasing a small Windows box and placing it in their Canadian office seemed the most cost-effective option. But once they added up the cost of managing the setup, it was clear that the total cost of ownership would not be attractive. This customer is now looking at hosting the application in Azure, hoping to take advantage of the geo-location guarantees it offers. The key takeaway from this example is this: by taking advantage of global data centers, even small to medium businesses can achieve a worldwide reach.
One final point about pricing: it is ultimately the free market that will determine where PaaS pricing lands. As is evident from the major mid-course correction regarding the SQL Data Services offering, Microsoft is going to have to listen and adjust based on customer and market demand. I also expect Microsoft to offer special promotions initially (for MSDN subscribers, small businesses and startups, similar to the BizSpark program) to build momentum.
So, do the cost estimates put Azure out of reach of most small/medium businesses? I will let you be the judge.
Concern #4: “I am out if there is no Remote Desktop Access”
This is probably the most often requested feature. It is easy to see how remote desktop access would be helpful: it would allow one to access the desktop, install software components and apply custom configuration such as registry tweaks. While remote desktop may very well be supported in the future, keep in mind the trade-off: customers would be required to assume responsibility for additional administrative tasks, including patch management and access control.
Once again, the underlying fabric may already support this capability. During his PDC 08 session, Manuvir Das talked about an “escape hatch,” or raw mode, as something that is available under the covers.
[Screen clipping from http://channel9.msdn.com/pdc2008/ES16/; fast forward to the 21:44 mark in the presentation.]
Concern #5: “You cannot seamlessly move your Azure application back to the datacenter”
The vendor lock-in argument against Azure is that once you build an application for Azure, it will be hard to move it back to an on-premises or hosted data center. Frankly, vendor lock-in is a concern with any cloud provider. As I heard David Chappell say recently, “there is no lock-in like the cloud. A cloud provider can turn the spigot of the service off, rather abruptly” (I have paraphrased his quote a bit, as I cannot recall his exact words).
Clearly, trust in cloud computing can only build over time. And hopefully over time, through good experiences, concerns over vendor lock-in will diminish as well.
If we take a closer look at what it takes to build an Azure hosted application, the lock-in concerns do not seem as grave.
For instance, it is encouraging to see that Microsoft, along with IBM, Rackspace and others, has agreed to join the Simple Cloud API project by Zend Technologies, which provides common File Storage, Document Storage and Simple Queues APIs across a cross-section of vendor offerings, including Rackspace Cloud Files, Windows Azure Storage, Amazon S3 and Nirvanix.
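A common storage API is essentially the adapter pattern: application code talks to a small interface, and each vendor gets its own adapter. The Python sketch below illustrates that pattern only. It is not the Simple Cloud API, which is a PHP library, and `BlobStore`, `InMemoryBlobStore` and `save_invoice` are hypothetical names. A real Azure or S3 adapter would implement the same two methods using that vendor’s SDK or REST API.

```python
from abc import ABC, abstractmethod
from typing import Dict


class BlobStore(ABC):
    """Hypothetical vendor-neutral interface the application depends on."""

    @abstractmethod
    def put(self, name: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, name: str) -> bytes: ...


class InMemoryBlobStore(BlobStore):
    """Stand-in adapter for local testing; a cloud adapter would call a vendor API."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, name: str, data: bytes) -> None:
        self._blobs[name] = data

    def get(self, name: str) -> bytes:
        return self._blobs[name]


def save_invoice(store: BlobStore, invoice_id: str, body: str) -> None:
    """Application code: knows nothing about which vendor backs `store`."""
    store.put(f"invoices/{invoice_id}.txt", body.encode("utf-8"))


if __name__ == "__main__":
    store = InMemoryBlobStore()
    save_invoice(store, "42", "total: 10")
    print(store.get("invoices/42.txt"))
```

With this structure, moving between providers, or back on-premises, means writing one new adapter class rather than touching every call site.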
The Windows Azure team has also doubled down on the more open RESTful web API, making it easier for non-.NET applications to consume it. The Azure team has also recently unveiled a native mode execution capability that lifts the restriction to managed .NET code only. This means it is now possible to host applications built using Python/CGI on Azure.
As I stated earlier, Azure development is squarely based on .NET building blocks such as WCF, WF, ASP.NET and so on. So fundamentally you are writing .NET code. It is important to understand that the only new API introduced with Azure, the Service Runtime API, fits on one screen. So there will not be a large body of Azure-specific code to port, should you decide to move the application back to your premises.
While the code you write for an Azure application is all .NET based, there are a number of guidelines that one will need to adhere to for a successful implementation on Azure:
· Develop for the sandbox
· Prefer scale out over scale up
· Separate the state from the UI; store the state in distributed, horizontally scalable storage
· Be loosely coupled
· Be prepared to handle varying loads
· Deal with failures; build retry logic that is idempotent
· Rolling upgrades – upgrade your application without any downtime
· Rely on unified logging; build an alert mechanism
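The “build retry logic that is idempotent” guideline can be sketched as follows. The retry wrapper uses exponential backoff. The operation it retries is made idempotent by recording a caller-supplied request ID, so a retry after a lost acknowledgement does not apply the same change twice. `TransientError`, the in-memory `processed` set and `apply_payment` are hypothetical stand-ins for a real implementation backed by durable storage.

```python
import time
from typing import Callable, Dict, Set, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (timeouts, throttling)."""


def with_retries(op: Callable[[], T], attempts: int = 4, base_delay: float = 0.1) -> T:
    """Call `op`, retrying on TransientError with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts - 1):
        try:
            return op()
        except TransientError:
            time.sleep(base_delay * (2 ** attempt))  # 0.1s, 0.2s, 0.4s, ...
    return op()  # final attempt: any error propagates to the caller


# Hypothetical in-memory state; a real role would keep this in durable storage.
processed: Set[str] = set()
balances: Dict[str, int] = {"alice": 100}


def apply_payment(request_id: str, account: str, amount: int) -> int:
    """Debit `account` at most once per request_id, however often it is retried."""
    if request_id not in processed:
        balances[account] -= amount
        processed.add(request_id)
    return balances[account]


if __name__ == "__main__":
    calls = {"n": 0}

    def flaky_payment() -> int:
        # The debit succeeds, but the first two "acknowledgements" are lost.
        result = apply_payment("req-1", "alice", 30)
        calls["n"] += 1
        if calls["n"] < 3:
            raise TransientError("acknowledgement lost")
        return result

    print(with_retries(flaky_payment, base_delay=0))  # prints 70: debited once, not three times
```

Retries alone would debit the account three times in this scenario. It is the request-ID check that makes retrying safe.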
Fortunately, most of the above bullets are guidelines the industry as a whole has been chasing over the last decade. So to the extent we can inculcate them in our designs, our applications will be better off, whether we choose to host them in the cloud or not. I would even go so far as to say that cloud computing done right is a great hope for the entire industry.
Concern #6: “Azure is slow compared to EC2 or GAE”
How would the performance of a web application hosted in Azure compare with a similar application developed using EC2 or GAE? First off, this is an apples-to-oranges comparison. With EC2, you can ask for a specific hardware configuration (for example, you have the option to select an EC2 instance that has 8 virtual cores with 2.5 EC2 compute units each). Azure offers full relational capability that EC2 and GAE don’t. Azure offers a batch processing capability that GAE does not. I could go on and on, but you get the idea. These are three different approaches to cloud computing, and each offers distinct performance optimization strategies.
In closing, let me say this: it would be Pollyannaish of me to suggest that Azure is a perfect cloud offering. Only time will tell how successful Azure will be. After building a handful of Azure applications (mainly POCs), I, like many others, have run into some challenges with the current CTP. Here are a few examples of the kinds of problems I am talking about.

Provisioning is very slow, and I am not just talking about new applications; even upgrading an application whose model is unchanged takes a long time. The slowness is exacerbated by the fact that provisioning is very coarse grained: even if you change a single master page, code-behind or CSS file, you have to go through the slow process of upgrading the entire application.

Another example is the pricing of the .NET Service Bus. Based on the information available so far, it is very hard to develop a pricing model for a complex app. How do you differentiate between a TCP connection and a streaming connection? What exactly does a transaction mean?

The diagnostics support is rudimentary: all you have are the logs (no access to the event logs, server logs etc.). Furthermore, it takes several minutes to copy them to blob storage. (Note that a number of enhancements to the logging functionality have been announced very recently, including the ability to look at performance counters and automatic copying of logs on a periodic basis.)

Finally, there is support for only two types of roles: Web and Worker. What if I wanted to add specific software (for example, Excel Services)? Or what if I wanted to run applications based on .NET 2.0 or even .NET 4.0?
The Azure team has heard the above feedback many times and has its work cut out for it. The team is working hard to alleviate some of the aforementioned challenges by the time Azure launches commercially in November, and in subsequent releases. And for the rest of us, who want to build cloud applications using the .NET building blocks we know and the Visual Studio-based tools we love, our work is cut out as well: brainstorm about applications that can leverage the Azure platform, build prototypes, provide feedback on the pricing models, develop tools and utilities and, last but not least, demand the best product possible.
References
Toward a Unified Ontology of Cloud Computing – http://www.cs.ucsb.edu/~lyouseff/CCOntology/CloudOntology.pdf
Above the Clouds: A Berkeley View of Cloud Computing – http://www.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-28.pdf
Dynamic Scaling (PDC 2008 session ES19) – http://channel9.msdn.com/pdc2008/ES19/
Simple Cloud API – http://www.simplecloud.org/
Securing Microsoft’s Cloud Infrastructure – http://www.globalfoundationservices.com/security/documents/SecuringtheMSCloudMay09.pdf
Business Productivity Online Suite – http://www.microsoft.com/resources/Technet/en-us/MSOnline/bpos/html/99d9ede5-ce15-476c-9a3f-d42a481d287e.htm
Upcoming changes to Windows Azure logging (Windows Azure team blog) – http://blogs.msdn.com/windowsazure/archive/2009/10/03/upcoming-changes-to-windows-azure-logging.aspx
Amazon EC2 Instance Types – http://aws.amazon.com/ec2/instance-types/