Monte Carlo Simulation on Azure
May 1, 2010
Live Demo http://ais.cloudapp.net **
** To limit the cost of hosting this demo application, availability is limited to regular business hours (9:00 am to 5:00 pm EST). An on-premises utility, based on the Windows Azure Service Management cmdlets, is used to automate the creation and removal of this application.
[ Readers who are not already familiar with Windows Azure concepts may find it useful to review this first ]
This project was motivated by an article by Christian Stitch, in which he describes an approach to financial option valuation implemented as a Monte Carlo simulation using Excel Services. With Windows Azure, we now have easy access to highly elastic computational capability. This prompted me to take Christian’s idea and refactor the code to run on the Windows Azure Platform.
Monte Carlo Simulations
You can read more about Monte Carlo Simulation on the Wikipedia page here. But here is an explanation from Christian’s article that I found succinct and useful:
“Monte Carlo simulations are extremely useful in those cases where no closed form solutions to a problem exist. In some of those cases approximations to the solution exist; however, often the approximations do not have sufficient accuracy over the entire range. The power of Monte Carlo simulations becomes obvious when the underlying distributions do not have an analytical solution. Monte Carlo simulations are extensively used in the physical and life sciences, engineering, financial modeling and many other areas.”
It is also important to note that there is no single Monte Carlo method or algorithm. For this project, I follow the usual pattern: define a domain of possible inputs, generate inputs randomly from a probability distribution over that domain, perform a computation on each input, and aggregate the results.
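Those steps can be sketched in a few lines of C#. This is a minimal, standalone illustration, not the project's actual code (the type and method names are mine): it draws normally distributed samples via the Box-Muller transform and buckets them into a histogram between MinVal and MaxVal, which is the raw material for a density curve.

```csharp
using System;

// Minimal sketch of a Monte Carlo pass: sample a normal distribution and
// aggregate the samples into a histogram over [minVal, maxVal).
public static class MonteCarlo
{
    public static int[] SimulateHistogram(
        double mean, double stdDev, double minVal, double maxVal,
        int iterations, int buckets, int seed)
    {
        var rng = new Random(seed);
        var histogram = new int[buckets];
        double bucketWidth = (maxVal - minVal) / buckets;

        for (int i = 0; i < iterations; i++)
        {
            // Box-Muller: turn two uniform samples into one normal sample.
            double u1 = 1.0 - rng.NextDouble(); // avoid Log(0)
            double u2 = rng.NextDouble();
            double normal = Math.Sqrt(-2.0 * Math.Log(u1)) *
                            Math.Cos(2.0 * Math.PI * u2);
            double sample = mean + stdDev * normal;

            if (sample < minVal || sample >= maxVal) continue; // outside domain
            histogram[(int)((sample - minVal) / bucketWidth)]++;
        }
        return histogram;
    }
}
```

With enough iterations, the histogram converges to the shape of the underlying density, which is exactly why the iteration count drives both accuracy and compute cost.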
Why use the Windows Azure Platform?
Monte Carlo simulations require a very large number of iterations to reach the desired accuracy. As a result, access to elastic, inexpensive computational resources is a key requirement, and this is where the Windows Azure Platform comes in. It is possible to dynamically add compute instances as needed (and, as you would imagine, you only pay for what you use). Furthermore, it is possible to select from small (1 core), medium (2 cores), large (4 cores) and extra-large (8 cores) compute instances.
As part of my testing, I ran a simulation involving a billion iterations. In an unscientific test, running this simulation on two small compute instances took more than four hours. By provisioning four large compute instances, I was able to run the same simulation in under 20 minutes.
In addition to the elastic computational resources, Windows Azure also offers a variety of persistence options, including Queues, Blobs and Tables, that can be used to store any amount of data for any length of time.
Last but not least, as a .NET/C# developer, I was able to easily port the existing C# code, with its multi-threading, ASP.NET and other constructs, to Windows Azure.
Let us take a brief look at the architecture of this application. I will follow Philippe Kruchten’s “4+1” view model, which uses different viewpoints, such as the logical, development, process and physical views, to describe the system.
There is a single use case for this application. Users submit a Monte Carlo simulation request by providing domain inputs such as Mean, StdDev, Distribution, MinVal and MaxVal, along with the number of iterations. Once the request has been successfully submitted, a unique task identifier is assigned. Users can then monitor the completion status of their tasks by clicking on the Monitor tab. Upon completion of a task, users can analyze the results using two charts that depict the density curve based on the results of the calculation.
Overall, the system follows a simple, yet powerful, asynchronous pattern wherein the Web role(s) place calculation requests on an Azure Queue. A set of Worker roles then retrieves these requests, performs the necessary calculations and stores the results in Azure Table storage.
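To make the shape of this pattern concrete, here is an in-memory sketch. The real application uses an Azure Queue and Azure Table storage; the queue, message and result types below are stand-ins of my own, not the project's code.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

// One queue message per job; in the real system this is serialized
// onto an Azure Queue.
public class CalculationRequest
{
    public string TaskId;
    public int JobId;
    public int Iterations;
}

public static class Pipeline
{
    // Web role side: place one message per job on the queue.
    public static void Enqueue(ConcurrentQueue<CalculationRequest> queue,
                               string taskId, int jobs, int iterationsPerJob)
    {
        for (int j = 0; j < jobs; j++)
            queue.Enqueue(new CalculationRequest
                { TaskId = taskId, JobId = j, Iterations = iterationsPerJob });
    }

    // Worker role side: drain the queue, "calculate", and record one
    // result per job, mirroring the Azure Table write.
    public static Dictionary<int, double> Process(
        ConcurrentQueue<CalculationRequest> queue)
    {
        var results = new Dictionary<int, double>();
        CalculationRequest request;
        while (queue.TryDequeue(out request))
            results[request.JobId] = request.Iterations; // placeholder calculation
        return results;
    }
}
```

Because the producer and consumer only share the queue, neither side needs to know how many instances of the other exist, which is what makes the scaling story below possible.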
Worker and Web roles are stateless and completely decoupled. In fact, the Web and Worker roles are packaged as two distinct Azure services. As a result, it is possible to scale this application up and down as needed. For instance, if a large number of calculation requests were to come in at the same time, it is possible to add Web and Worker roles in real time. Conversely, it is also possible to completely tear down the Worker roles during periods of inactivity (thereby incurring no Windows Azure hosting charge).
As indicated earlier, the code is broken up into two Azure services.
As the name suggests, this service is responsible for the UI. It is based on an MVC Web Role. This service accepts the calculation task details from the user, chops them up into smaller subsets (referred to as jobs) and writes them to the queue, one message per subset. (Interestingly, the worker roles, in turn, further subdivide each subset based on the number of VM cores.)
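The chopping step itself is simple; here is a hedged sketch (Job and JobSplitter are illustrative names, not the project's): split a task's total iteration count into fixed-size jobs, each of which becomes one queue message.

```csharp
using System;
using System.Collections.Generic;

// One unit of work as placed on the queue.
public struct Job
{
    public int JobId;
    public long Iterations;
}

public static class JobSplitter
{
    // Split a task's total iteration count into jobs of at most
    // iterationsPerJob iterations; the last job absorbs the remainder.
    public static List<Job> Split(long totalIterations, long iterationsPerJob)
    {
        var jobs = new List<Job>();
        int jobId = 0;
        for (long done = 0; done < totalIterations; done += iterationsPerJob)
        {
            jobs.Add(new Job
            {
                JobId = jobId++,
                Iterations = Math.Min(iterationsPerJob, totalIterations - done)
            });
        }
        return jobs;
    }
}
```

The job size is a tuning knob: smaller jobs spread work more evenly across workers, at the cost of more queue messages per task.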
The Submit and Monitor tabs (described in the functional view) are built using straightforward MVC code. The “Analyze” tab has some rich charts built using Silverlight Control Toolkit. The two charts depict the density curve based on the results of calculation (cumulative normal and inverse cumulative respectively).
The Silverlight application uses a WCF service hosted within the web role to retrieve the calculation results from Azure Table storage. The WCF service acts as an intermediary because the Silverlight application cannot make a cross-domain call to Azure Table storage directly (it is possible to access Azure Blob storage directly by placing a ClientAccessPolicy.xml in the root container).
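For reference, a typical permissive ClientAccessPolicy.xml looks like the following. This is the standard Silverlight cross-domain policy format; in practice you would tighten the allowed domains and paths rather than use a wildcard.

```xml
<?xml version="1.0" encoding="utf-8"?>
<access-policy>
  <cross-domain-access>
    <policy>
      <allow-from http-request-headers="*">
        <domain uri="*"/>
      </allow-from>
      <grant-to>
        <resource path="/" include-subpaths="true"/>
      </grant-to>
    </policy>
  </cross-domain-access>
</access-policy>
```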
Again, as the name suggests, this service is responsible for performing the calculations. Each worker role periodically checks for new requests. Once it retrieves a request, it distributes the calculation across a number of threads (the number of threads equals the number of available cores within the VM). Once the calculation is complete, each worker marks the appropriate job as complete and stores the results of the calculation in an Azure table.
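The fan-out across cores can be sketched like this (the structure and names are mine, and the per-iteration work is abstracted behind a delegate): one thread per available core, each computing a partial result that is merged at the end.

```csharp
using System;
using System.Threading;

// Sketch of the worker-side fan-out: split a job's iterations across one
// thread per core, then merge the partial results.
public static class ParallelRunner
{
    // runChunk receives (threadIndex, iterationCount) and returns a
    // partial result for that slice of the job.
    public static double Run(long iterations, Func<int, long, double> runChunk)
    {
        int threads = Environment.ProcessorCount; // one thread per core
        long chunk = iterations / threads;
        var partials = new double[threads];
        var workers = new Thread[threads];

        for (int t = 0; t < threads; t++)
        {
            int index = t; // capture a per-iteration copy for the closure
            // The last thread picks up any remainder iterations.
            long count = (index == threads - 1)
                ? iterations - chunk * (threads - 1)
                : chunk;
            workers[index] = new Thread(() =>
                partials[index] = runChunk(index, count));
            workers[index].Start();
        }
        foreach (var w in workers) w.Join();

        double total = 0;
        foreach (var p in partials) total += p;
        return total;
    }
}
```

Each thread should use its own random number generator instance (seeded differently per thread index), since System.Random is not thread-safe.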
Azure Queue semantics ensure that every message in the queue has the chance to be processed to completion at least once. If a worker role instance crashes after it dequeues a message but before it completes the calculation, the request will reappear in the queue once the VisibilityTimeout expires. This allows another worker role to come along and process the request to completion.
As stated earlier, each calculation task is chopped up into a set of jobs. Details about the calculation are stored in a single Azure table, with the combination of TaskId and JobId serving as the partition key and row key. The following snapshot (created using Cerebrata’s excellent Cloud Storage Studio tool) depicts the remaining elements of the schema.
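In code, the keying scheme might look like the sketch below. The class is a plain stand-in for what would, in the real project, derive from the StorageClient library's TableServiceEntity; the zero-padded row key is my own choice, made so that rows sort numerically within a partition.

```csharp
using System;

// Sketch of a job-tracking entity: all jobs of one task share a partition
// (PartitionKey = TaskId), and JobId uniquely identifies a row within it.
public class JobEntity
{
    public string PartitionKey { get; set; } // TaskId
    public string RowKey { get; set; }       // JobId
    public bool Completed { get; set; }

    public JobEntity(Guid taskId, int jobId)
    {
        PartitionKey = taskId.ToString();
        RowKey = jobId.ToString("D6"); // zero-pad so rows sort numerically
        Completed = false;
    }
}
```

Keying by TaskId means a single partition query retrieves every job for a task, which is exactly what the Monitor tab needs to report completion status.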
The results of the calculation are stored in a separate Azure table. The following snapshot depicts the schema of that table.
The Windows Azure Platform is a good fit for applications, such as Monte Carlo simulations, that require elastic computational and storage resources. Azure Queue provides a simple scheme for implementing the asynchronous pattern. Finally, moving existing C# code (the calculation algorithm, multi-threading and other constructs) to Azure is straightforward.