This blog post is about the evolution of Microsoft’s application server, a collection of products including IIS and .NET Framework libraries such as WCF and WF. Over the last few releases, Microsoft has added 1) a runtime container and activation service, via WAS; 2) multi-protocol support that goes beyond HTTP, via IIS 7.0; and 3) manageability extensions and caching, via AppFabric. Some of these enhancements overlap with the capabilities offered by Microsoft’s flagship integration server, BizTalk, which is well known for its robust hosting and integration capabilities. This overlap has stirred conversation in blogs and news stories over the last several months.

My goal in this post is to step back and look at some of the reasons application and integration servers are growing closer. We will do a side-by-side comparison of a scenario that sits right in the middle of the overlap I alluded to earlier. Additionally, as pure conjecture on my part, we will look at an approach that would let the two live closely together.

Application and Integration Server

Application servers are well known for capabilities that enable hosting of application logic. As you would expect, they offer capabilities such as robust hosting: auto start, load balancing, fault tolerance, etc. Integration servers are well known for enabling communication with external systems and, as you would expect, offer support for various message exchange patterns, guaranteed delivery, a family of adapters to connect to disparate systems, and so on.

Note: as stated earlier, BizTalk is more than a typical integration server. The diagram below is intended to be more generic and depicts the typical workloads of application and integration servers.

[Figure: typical workloads of application and integration servers]

 

With the emphasis on composite applications, application designers are increasingly looking for ways to combine the best capabilities offered by application and integration servers. For example, they want to deploy a business service (exposed as a SOAP endpoint) and, at the same time, expose it as an endpoint that external systems can consume via MSMQ. Conversely, they want to expose integration endpoints that have low latency requirements and don’t necessarily need features such as guaranteed delivery and business activity monitoring.

Another factor driving application and integration servers closer is the move towards the cloud. Traditional on-premise application logic is being refactored to take advantage of cloud-based services such as storage, compute, and access services. This is also creating demand for integration capabilities to be added to application servers.

Not surprisingly, Microsoft’s recent announcements suggest that the line between AppFabric and BizTalk is being blurred. For instance, the BizTalk team has stated publicly that they are looking to build the next generation of BizTalk on top of AppFabric – http://www.microsoftpdc.com/2009/SVR15.

Current Dilemma

If you are architecting an enterprise application today, should you just move all your integration logic to AppFabric? There are a number of reasons why you would be tempted to do this, including 1) AppFabric is part of the OS; 2) most integration is now SOA/WS based anyway, reducing the need for custom adapters; 3) the operational setup and maintenance cost of running an AppFabric web farm is likely lower than that of a BizTalk farm; and 4) developing components is more approachable for a .NET developer with no prior knowledge of BizTalk. However, if you go in this direction, be aware that some features will not be available. To better understand what is and is not available, let us use a scenario that is well supported by both: hosting a WCF endpoint. The following table compares hosting a WCF endpoint in BizTalk and in AppFabric.

 

WCF hosted inside BizTalk vs. WCF hosted inside AppFabric

Binding flexibility
· BizTalk: Support for custom WCF binding configuration.
· AppFabric: Support for custom WCF binding configuration.

Load-balanced, fault-tolerant setup
· BizTalk: Supported. Central administration covers all aspects. For example, a change to the WCF binding needs to be applied in a single location (the port settings dialog).
· AppFabric: Supported. Central administration is for tracking purposes only. A change to a WCF binding needs to be replicated to all nodes using tools such as msdeploy (see http://download.microsoft.com/download/3/1/C/31CED722-2E5F-48D6-96B1-E73AAFD9873F/AppFabricWebFarm.docx).

Common tracking database
· BizTalk: Yes.
· AppFabric: Yes.

Tooling to review tracked instances
· BizTalk: Tooling is somewhat dated (BAM portal).
· AppFabric: More modern tooling.

Cross-cutting concerns (exception handling, BAM, pre- and post-processing such as formatting)
· BizTalk: Advanced support in the form of the suspended message queue and the BAM portal.
· AppFabric: Not as advanced as BizTalk yet.

Guaranteed delivery
· BizTalk: Guaranteed delivery.
· AppFabric: No guaranteed delivery (except when provided by the transport, such as MSMQ).

Message exchange patterns
· BizTalk: Better suited for asynchronous message exchange. Request/response for low-latency scenarios is not easily handled.
· AppFabric: Better support for different message exchange patterns (request/response and asynchronous).

Extensibility points for transforming incoming and outgoing messages
· BizTalk: Pipeline components (EDI, AS2, MIME, etc.), the Mapper, and the Flat File Parser.
· AppFabric: Extensibility points as defined by WCF, such as custom channels.

Where business logic lives
· BizTalk: Business logic is spread across a set of code and configuration components: custom pipeline components, receive and send port definitions, custom maps, and orchestrations. This setup allows flexibility in handling changes. For example, a new message decoder can be added for incoming messages simply by adding a custom decoder step to the receive pipeline.
· AppFabric: Business logic is encapsulated in the body of a method (factored appropriately into class libraries, of course). This makes it easy to single-step through the code, but it also means that every aspect of the logic manifests itself as code. In essence, you need to roll out a new version of the binary to make a change.

Large message handling
· BizTalk: By default, every message in BizTalk is routed via the MessageBox, which makes handling large messages a challenge. However, there are well-known workarounds. One of them is described in detail by Paolo: http://blogs.msdn.com/b/paolos/archive/2010/05/25/large-message-transfer-with-wcf-adapters-part-1.aspx.
· AppFabric: Supported; messages are not routed through a MessageBox.

Type safety
· BizTalk: WCF adapters use untyped contracts for receiving messages, which allows them to receive any type of message.
· AppFabric: Contracts can be typed or untyped.

Caching
· BizTalk: No built-in caching.
· AppFabric: Built-in caching (AppFabric Caching).

Secure credential storage
· BizTalk: Secure Storage Service (SSS) to store credentials securely.
· AppFabric: No equivalent capability.

What can we expect next?

BizTalk Server 2010, the latest release, is already bridging the gap between the two products. For instance, the AppFabric Connect feature of BizTalk 2010 offers ways to leverage BizTalk capabilities inside AppFabric-hosted programs. Specifically:

1) Ability to leverage the BizTalk mapping capability inside a WF program using the Mapper activity.

2) Ability to generate WF activities that invoke operations on WCF LOB adapters provided by BizTalk.

For more information see – http://blogs.msdn.com/b/biztalk_server_team_blog/archive/2010/06/09/biztalk-appfabric-an-introduction.aspx

I expect this trend to continue. In fact, I envision that a future version of AppFabric would look something like the diagram below (again, this is pure conjecture on my part). As you can see, the BizTalk-provided capabilities would be moved into AppFabric (shown in red). This would allow WCF endpoints to take advantage of guaranteed delivery as well as the configuration flexibility offered by BizTalk ports (pipeline components, mapper, etc.).

 

[Figure: a possible future version of AppFabric, with BizTalk-provided capabilities shown in red]

 

 

Call to Action

It is clear that AppFabric and BizTalk are moving closer together. In the future, we can expect to have access to the capabilities offered by both, rather than having to choose between them.

Messaging, with its defining characteristics including adapters, reliable communication, and mediation, is generally accepted as the preferred way to connect disparate systems (look no further than Gregor Hohpe’s seminal book on Enterprise Integration Patterns for more on messaging). As we have seen in this post, BizTalk has built-in support for messaging. This is why it is recommended that BizTalk be considered for most integration scenarios.

AppFabric should be considered for hosting most application logic. Microsoft is making a strategic investment in AppFabric, not only to bring advanced management and robust hosting capabilities but also to seamlessly enable application migration between on-premise and cloud-based deployments (commonly referred to as server/service symmetry).

You are about to embark on a distributed application development project and are wondering how you can leverage the Windows Azure Platform. Should you invest in a private cloud, move your application to the public cloud, choose a hybrid approach, or keep the application on-premise? In this blog post, I will briefly describe each model and comment on the impact each can have on development and operations.

Before we begin, let us sketch a fairly common multi-tier application architecture that we will use throughout this post. Next to the name of each layer, I have listed, in parentheses, a few implementation choices. For instance, the data layer may be implemented using SQL Server, Analysis Services, and Master Data Services (MDS).

 

[Figure: a multi-tier application architecture, with implementation choices listed for each layer]

Private Cloud

The term private cloud has been used to describe two very different models. The first is the ability to run a variant of Windows Azure in a customer’s datacenter. While at least one Microsoft executive has hinted at this possibility, it is highly unlikely that we will see this anytime soon.

The second model refers to applying techniques like virtualization, automated management, and utility billing within an organization’s own data centers. Microsoft has been increasingly talking about toolkits such as the Dynamic Datacenter Toolkit that will allow IT managers to implement these cloud computing concepts in their data centers. This is different from running a cloud OS on-premise.

In addition to the two private cloud models described above, there is also the notion of a virtual private cloud, wherein a cloud provider segregates a group of machines in its data center and dedicates them to a given customer. While vendors such as Amazon have talked about this option quite a bit, Microsoft has not announced anything significant in this area.

Impact on Development and Operations

1) Development and operational impact will depend on the degree to which an organization commits to building a private cloud. If the private cloud is primarily designed to offer Infrastructure as a Service (IaaS), the impact on development is minimal. In other words, the development effort is not very different from a traditional on-premise development.

2) Setting up a private cloud requires significant investment and careful planning and is therefore only recommended for very large enterprises with special security needs.

 

Public Cloud

Public cloud refers to the approach wherein the entire application is hosted on the Windows Azure platform. For our candidate architecture, it would mean that all layers of the architecture are in the cloud. While this is the most cost-effective model, there are a number of limitations associated with it. The ability to install custom software is limited: third-party applications can be deployed only if they can be installed with xcopy. All Azure compute instances run the same base image (64-bit Windows Server, .NET 3.5 SP1). Microsoft has announced that future versions of the Azure platform will allow customers to create their own base images. Another limitation is the SQL Server functionality available as part of SQL Azure. For instance, MDS is not available with SQL Azure today. Additionally, SQL Azure databases are limited to a maximum size of 50 GB.

 

[Figure: the candidate architecture with all layers hosted in the public cloud]

Impact on Development and Operations

1) The design should adhere to patterns such as multi-tenancy, statelessness, and the ability to handle configuration changes dynamically (for example, adding web front-end nodes as load increases).

2) Limits imposed by the Windows Azure Platform must be taken into account. For example, the data layer will need some sort of sharding scheme to get around the SQL Azure maximum database size.

3) The public cloud option offers other significant advantages, such as automated service management, fault detection, and notification, that can potentially reduce operational cost.

4) Upfront capital expenditure is replaced with ongoing operational expenditure based on the resources used.
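As a rough illustration of item 2, the sketch below spreads rows across several SQL Azure databases by hashing a shard key. It is a minimal example under stated assumptions: the shard key (a customer ID), the database names in `SHARDS`, and the shard count are all hypothetical. A real scheme would also need a plan for resharding and cross-shard queries.

```python
import hashlib

# Hypothetical list of SQL Azure databases, each kept under the size limit.
SHARDS = ["appdb_shard0", "appdb_shard1", "appdb_shard2", "appdb_shard3"]


def shard_for(customer_id: str, shards=SHARDS) -> str:
    """Pick a shard deterministically from a stable hash of the shard key."""
    # Use a stable hash (not Python's per-process randomized hash()) so every
    # web or worker node maps the same key to the same database.
    digest = hashlib.sha1(customer_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:4], "big") % len(shards)
    return shards[index]


if __name__ == "__main__":
    for cid in ["contoso", "fabrikam", "northwind"]:
        print(cid, "->", shard_for(cid))
```

Simple modulo hashing moves most keys when a shard is added. For that reason, a lookup table or consistent hashing is often preferred once the number of databases is expected to grow.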

Hybrid Cloud

Hybrid cloud refers to a style of cloud computing that combines functionality available in the cloud with on-premise resources. Such an arrangement could be motivated by special security requirements for data (such as the Payment Card Industry security standards). For our proposed architecture, one scenario would be to host the UI and business services layers in the cloud, while the data layer resides on-premise. Technologies such as the Azure AppFabric Service Bus and Project Sydney can be used to bridge the connection between the cloud and the on-premise data center.

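[Figure: hybrid deployment, with the UI and business services layers in the cloud and the data layer on-premise]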

 

 

Impact on Development and Operations

1) The design should adhere to patterns such as multi-tenancy, statelessness, and the ability to handle configuration changes dynamically (i.e., the ability to add web front ends to handle higher load).

2) Application design will need to compensate for the latency that results from the data layer being remote. For instance, a caching layer may need to be introduced.

3) The operations team will need to plan, review, and set up adequate trust between Windows Azure and the on-premise data center.

4) A scalable, robust connection between Windows Azure and the on-premise data center needs to be available.

5) IT will need to evaluate the impact on security, compliance, and availability.
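Item 2 mentions introducing a caching layer to offset the latency of a remote data layer. A common way to do this is the cache-aside pattern, sketched below in Python under some assumptions. An in-process dictionary stands in for a distributed cache such as AppFabric Caching, `load` is a hypothetical function that calls the on-premise data layer, and entries simply expire after a fixed time-to-live.

```python
import time
from typing import Callable, Dict, Tuple


class CacheAside:
    """Cache-aside wrapper around a slow, remote lookup."""

    def __init__(self, load: Callable[[str], object], ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self._load = load          # hypothetical call to the on-premise data layer
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, object]] = {}  # key -> (expires_at, value)

    def get(self, key: str):
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]                         # fresh entry: no remote round trip
        value = self._load(key)                   # miss or stale: go to the source
        self._entries[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key: str) -> None:
        """Drop a key, e.g. after the underlying record is updated."""
        self._entries.pop(key, None)
```

The injectable `clock` keeps the sketch testable. The time-to-live bounds how stale cached data can get, so it should be chosen per kind of data.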

Cloud Ready Design

Perhaps you are not ready for any of the aforementioned cloud options. In this instance, you can consider designing your application in a manner that will allow you to leverage the cloud in the future. The “server/services” symmetry depicted in the diagram below exemplifies this approach. It illustrates how Microsoft plans to align the building blocks across its server OS (Windows Server 2008 and beyond) with the “Azure OS”. A good example of the overlap between server and services is AppFabric Caching: according to Gopal Kakivaya[1], the underlying architecture of Velocity (the code name for AppFabric Caching) is based on the data fabric technology that powers SQL Azure.

[Figure: server/services symmetry between Windows Server and Windows Azure]

 

Impact on Development and Operations

1) Utilize existing management tools such as System Center, which are being refactored to work with on-premise as well as Azure-based applications.

2) Understand and leverage AppFabric as new services become available.

3) Be cognizant of the differences between SQL Azure and SQL Server running on-premise. Avoid large monolithic database instances, as well as features such as CLR integration and Service Broker that may not be immediately available in SQL Azure.

4) Avoid relying on special security privileges, file system or registry access etc. that will not be available when executing on Windows Azure.

[1] http://channel9.msdn.com/pdc2008/BB03/

 

Live Demo: http://ais.cloudapp.net **

** To limit the cost of hosting this demo application, its availability is limited to regular business hours (9:00 am to 5:00 pm EST). An on-premise utility, based on the Windows Azure Service Management cmdlets, is used to automate the creation and removal of this application.
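The on-premise utility itself is built on the Windows Azure Service Management cmdlets, which are not reproduced here. The Python sketch below only illustrates the scheduling decision such a utility has to make. It assumes a fixed UTC-5 offset for EST (ignoring daylight saving time) and treats Monday to Friday as business days. The `"deploy"` and `"remove"` actions are placeholders for the actual cmdlet calls.

```python
from datetime import datetime, time, timedelta, timezone

# Assumption: a fixed UTC-5 offset, matching "EST" in the text (no DST handling).
EST = timezone(timedelta(hours=-5))
OPEN, CLOSE = time(9, 0), time(17, 0)


def should_be_running(now: datetime) -> bool:
    """True between 9:00 am and 5:00 pm EST, Monday to Friday (now must be tz-aware)."""
    local = now.astimezone(EST)
    return local.weekday() < 5 and OPEN <= local.time() < CLOSE


def desired_action(now: datetime, deployed: bool) -> str:
    """Decide what the hypothetical scheduler should do at this moment."""
    running = should_be_running(now)
    if running and not deployed:
        return "deploy"   # placeholder: create the deployment via the cmdlets
    if not running and deployed:
        return "remove"   # placeholder: stop and delete the deployment
    return "none"


if __name__ == "__main__":
    print(desired_action(datetime.now(timezone.utc), deployed=False))
```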

 [ Readers who are not already familiar with Windows Azure concepts may find it useful to review this first ]

This project was motivated by an article by Christian Stitch, in which he describes an approach for financial option valuation implemented with Monte Carlo simulation using Excel Services. Of course, with Windows Azure, we now have easy access to highly elastic computational capability. This prompted me to take Christian’s idea and refactor the code to run on the Windows Azure Platform.

Monte Carlo Simulations

You can read more about Monte Carlo simulation on the Wikipedia page here. Below is an explanation from Christian’s article that I found succinct and useful:

“Monte Carlo simulations are extremely useful in those cases where no closed form solutions to a problem exist. In some of those cases approximations to the solution exist; however, often the approximations do not have sufficient accuracy over the entire range. The power of Monte Carlo simulations becomes obvious when the underlying distributions do not have an analytical solution. Monte Carlo simulations are extensively used in the physical and life sciences, engineering, financial modeling and many other areas.”

It is also important to note that there is no single Monte Carlo method or algorithm. For this project, I follow these steps:

  • Ask the user to define a domain of possible inputs (Mean, StdDev, Distribution, MinVal, and MaxVal).
  • Generate inputs using the Box-Muller transform.
  • Perform a deterministic computation using the inputs for the number of iterations requested.
  • Aggregate the results of the individual computations into the final result.
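The four steps above can be sketched in a few lines of Python. This is an illustration under assumptions, not the code from Christian's article. The `Distribution` input is assumed to be normal, and samples outside [MinVal, MaxVal) are rejected. The per-iteration "deterministic computation" is a hypothetical histogram-bucketing step whose counts give a density curve similar to the ones the application charts.

```python
import math
import random
from typing import List, Tuple


def box_muller(rng: random.Random) -> Tuple[float, float]:
    """Turn two uniform samples into two independent standard normal samples."""
    u1 = 1.0 - rng.random()          # in (0, 1], so log(u1) is always defined
    u2 = rng.random()
    radius = math.sqrt(-2.0 * math.log(u1))
    angle = 2.0 * math.pi * u2
    return radius * math.cos(angle), radius * math.sin(angle)


def simulate(mean: float, std_dev: float, min_val: float, max_val: float,
             iterations: int, bins: int = 20, seed: int = 0) -> Tuple[float, List[int]]:
    """Steps 1-4: sample the domain, bucket each value, aggregate (mean, counts)."""
    if iterations <= 0 or max_val <= min_val:
        raise ValueError("need iterations > 0 and max_val > min_val")
    rng = random.Random(seed)
    counts = [0] * bins
    total, kept = 0.0, 0
    width = (max_val - min_val) / bins
    while kept < iterations:
        for z in box_muller(rng):
            x = mean + std_dev * z               # scale to the requested domain
            if not (min_val <= x < max_val):     # stay inside [MinVal, MaxVal)
                continue
            # Hypothetical deterministic step: bucket the value into a histogram.
            counts[min(int((x - min_val) / width), bins - 1)] += 1
            total += x
            kept += 1
            if kept == iterations:
                break
    return total / iterations, counts            # aggregated result


if __name__ == "__main__":
    avg, density = simulate(100.0, 15.0, 40.0, 160.0, iterations=10_000)
    print(round(avg, 2), density)
```

Because every iteration is independent, the iteration count can be split across machines and cores, which is what makes the problem a good fit for elastic compute.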

    Why use the Windows Azure Platform?

    Monte Carlo simulations require a very large number of iterations to achieve the desired accuracy. As a result, access to elastic, cheap computational resources is a key requirement. This is where the Windows Azure Platform comes in. It is possible to dynamically add compute instances as needed (and, as you would imagine, you pay only for what you use). Furthermore, it is possible to select from small (1 core), medium (2 cores), and large (4 cores) compute instances.

    As part of my testing, I ran a simulation involving a billion iterations. In an unscientific test, running this simulation on two small compute instances took more than four hours. I was able to run the same simulation in under 20 minutes by provisioning four large compute instances.

    In addition to elastic computation resources, Windows Azure offers a variety of persistence options, including Queues, Blobs, and Tables, that can be used to store large amounts of data for as long as needed.

    Last but not least, as a .NET/C# developer, I was able to easily port the existing C# code, including multithreading, ASP.NET, and other constructs, to Windows Azure.

    Architecture

    Let us take a brief look at the architecture of this application. I will follow Philippe Kruchten’s “4+1” view model to describe it. Kruchten’s approach uses different viewpoints, such as the logical, development, process, and physical views, to describe a system.

    Functional View

    There is a single use case for this application. Users can submit a Monte Carlo simulation request by providing domain inputs such as Mean, StdDev, Distribution, MinVal, and MaxVal. Users also have to specify the number of iterations. Once the request has been successfully submitted, a unique task identifier is assigned. Users can then monitor the completion status of their tasks by clicking on the Monitor tab. Upon completion of a task, users can analyze the calculation results using two charts that depict the density curve based on the results of the calculation.


    Logical View

    Overall, the system follows a simple yet powerful asynchronous pattern, wherein the Web role(s) place calculation requests on an Azure queue. A set of Worker roles then retrieve these requests, perform the necessary calculations, and store the results in Azure Table storage.

    [Figure: logical view of the application]

    Worker and Web roles are stateless and completely decoupled. In fact, Web and Worker roles are packaged as two distinct Azure services. As a result, it is possible to scale this application up and down as needed. For instance, if a large number of calculation requests were to come in at the same time, Web and Worker roles could be added in real time. Conversely, it is also possible to completely tear down the Worker roles during periods of inactivity (thereby incurring no Windows Azure hosting charges for them).

    Development View

    As indicated earlier, the code is broken up into two Azure services.

    UI Service

    As the name suggests, this service is responsible for the UI. It is based on an MVC Web role. This service accepts the calculation task details from the user, chops the task up into smaller subsets (referred to as jobs), and writes them to the queue. There is one message for each subset (interestingly, the worker roles, in turn, further subdivide each subset based on the number of VM cores).

    The Submit and Monitor tabs (described in the functional view) are built using straightforward MVC code. The “Analyze” tab has some rich charts built using the Silverlight Control Toolkit. The two charts depict the density curve based on the results of the calculation (cumulative normal and inverse cumulative, respectively).

    The Silverlight application uses a WCF service hosted within the web role to retrieve the calculation results from Azure Table storage. The WCF service acts as an intermediary because the Silverlight application cannot make a cross-domain call to Azure Table storage directly. (It is possible to access Azure Blob storage directly by placing a ClientAccessPolicy.xml file in the root container.)
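The job-splitting step performed by the UI service can be sketched as follows. The JSON message schema (`TaskId`, `JobId`, `Iterations`) and the fixed `job_size` are assumptions for illustration. The post does not describe the actual message format, and the queue write itself is omitted.

```python
import json
import uuid
from typing import List


def split_into_jobs(task_id: str, iterations: int, job_size: int) -> List[str]:
    """Chop one calculation task into queue messages, one per job (subset)."""
    if job_size <= 0:
        raise ValueError("job_size must be positive")
    messages = []
    job_id = 0
    remaining = iterations
    while remaining > 0:
        count = min(job_size, remaining)
        messages.append(json.dumps({
            "TaskId": task_id,       # hypothetical message schema
            "JobId": job_id,
            "Iterations": count,
        }))
        remaining -= count
        job_id += 1
    return messages


if __name__ == "__main__":
    task_id = str(uuid.uuid4())      # the unique task identifier shown to the user
    for msg in split_into_jobs(task_id, iterations=2_500_000, job_size=1_000_000):
        print(msg)                   # in the real service, each would go to the queue
```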

    Calc Service

    Again, as the name suggests, this service is responsible for performing the calculations. Each worker role periodically checks for new requests. Once it retrieves a new request, it distributes the calculation across a number of threads (the number of threads equals the number of available cores within the VM). Once the calculation is complete, each worker marks the appropriate job as complete and stores the results of the calculation in an Azure table.
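The per-core split inside a worker can be sketched as below. In C#, one thread per core runs the calculation in parallel. In CPython, threads do not run CPU-bound Python code in parallel because of the global interpreter lock, so a real Python port would use `ProcessPoolExecutor` instead. The sketch only shows the partitioning and aggregation. `compute` is a hypothetical callable that receives a chunk index and an iteration count, and summing the partial results is an assumed aggregation step.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def split_evenly(iterations: int, parts: int) -> List[int]:
    """Divide a job's iterations as evenly as possible across parts."""
    base, extra = divmod(iterations, parts)
    return [base + (1 if i < extra else 0) for i in range(parts)]


def run_job(iterations: int, compute: Callable[[int, int], float],
            cores: int = 0) -> float:
    """Run one job with one thread per core and sum the partial results."""
    cores = cores or os.cpu_count() or 1
    chunks = split_evenly(iterations, cores)
    with ThreadPoolExecutor(max_workers=cores) as pool:
        # compute(chunk_index, chunk_size); the index can double as a seed.
        partials = list(pool.map(compute, range(cores), chunks))
    return sum(partials)
```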

    Azure Queue semantics ensure that every message in the queue has a chance to be processed to completion at least once. If a worker role instance crashes after it dequeues a message but before it completes the calculation, the request reappears in the queue once the VisibilityTimeout expires. This allows another worker role to come along and process the request to completion.
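The at-least-once behaviour can be modelled with a small in-memory queue. This is a simplified sketch, not the Azure Queue API. Real queues use pop receipts, do not guarantee ordering, and are accessed over REST. Here, a retrieved message is simply hidden until `now + visibility_timeout`, and it disappears only when `delete` is called after the work is done.

```python
import itertools
from typing import Dict, Optional, Tuple


class VisibilityQueue:
    """In-memory model of at-least-once delivery with a visibility timeout."""

    def __init__(self, visibility_timeout: float):
        self._timeout = visibility_timeout
        self._messages: Dict[int, Tuple[str, float]] = {}  # id -> (body, visible_at)
        self._ids = itertools.count()

    def put(self, body: str, now: float) -> None:
        self._messages[next(self._ids)] = (body, now)

    def get(self, now: float) -> Optional[Tuple[int, str]]:
        """Return a visible message and hide it until now + timeout."""
        for msg_id, (body, visible_at) in self._messages.items():
            if visible_at <= now:
                self._messages[msg_id] = (body, now + self._timeout)
                return msg_id, body
        return None

    def delete(self, msg_id: int) -> None:
        """Call only after the work is done; otherwise the message reappears."""
        self._messages.pop(msg_id, None)
```

Because a message can be processed more than once, the worker's writes should be idempotent. For example, keying results on TaskId and JobId means a retry overwrites a result rather than duplicating it.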

    Physical View

    As stated earlier, each calculation task is chopped up into a number of jobs. Details about the calculation are stored in a single Azure table. The combination of TaskId and JobId serves as the partition and row key. The following snapshot (created using Cerebrata’s excellent Cloud Storage Studio tool) depicts the remaining elements of the schema.

    [Figure: schema of the calculation details table]
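This sketch reads the key scheme as TaskId for the partition key and JobId for the row key. That mapping is an assumption, since the post does not spell out which is which. Under it, a job entity could be shaped as below. The dictionary mirrors the PartitionKey/RowKey structure of a table entity and is not a storage client call. The extra properties are hypothetical.

```python
from typing import Dict


def job_entity(task_id: str, job_id: int, **properties) -> Dict[str, object]:
    """Build a table entity keyed by TaskId (partition) and JobId (row)."""
    entity: Dict[str, object] = {
        "PartitionKey": task_id,      # all jobs of one task share a partition
        "RowKey": f"{job_id:06d}",    # zero-padded so string order matches numeric order
    }
    entity.update(properties)         # e.g. Status, Iterations (hypothetical)
    return entity
```

Table storage compares keys as strings, so zero-padding the job number keeps jobs in numeric order. Putting all jobs of a task in one partition also means a task's status can be read with a single-partition query.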

    The results of the calculation are stored in a separate Azure table. The following snapshot depicts the schema of that table:

    [Figure: schema of the results table]

     

    Summary

    The Windows Azure Platform is a good fit for applications, such as Monte Carlo simulations, that require elastic computational and storage resources. Azure Queue provides a simple scheme for implementing the asynchronous pattern. Finally, moving existing C# code (the calculation algorithm, multithreading, and other constructs) to Azure is straightforward.

    Higher Order Software Development

    I cannot claim it was an act of volition, but over the last three years I have found myself building software designed to allow business users to create the programs they need. I now have a somewhat presumptuous name for this approach: Higher Order Software Development. This approach is similar to the composite application approach, wherein applications are assembled from existing software assets. The difference, however, is that composite applications such as mashups are about assembling assets that have been built independently. Higher order software development is about software building blocks designed from the ground up to allow business users to “develop”.

    In the last few years, many approaches, including BPM (business process management), have espoused empowering end users. Even though BPM systems have been successful in improving process agility, they have fallen short of making software development end-user ready. This is mainly because BPM tools are designed to address a broad set of application scenarios and, in most instances, BPM offerings represent a collection of tools (workflow designer, rules engine, modeling, and so on). This has inadvertently raised the level of complexity, making it harder for end users to participate. Higher order software development, on the other hand, has a much narrower focus: it is about solving a specific business problem within a given business domain. This approach has been made possible by the rise in the level of abstraction available within the development platform itself. As we will discuss shortly, platforms such as SharePoint have served as a catalyst for higher order software development.

    Why is this important? First, business users know the requirements best. Unfortunately, most business users are not very good at communicating them, hence the famous phrase “I will not know what I want until I see it working”. Furthermore, some requirements are invariably lost in translation as they are conveyed to development teams. Second, rather than learn new interfaces, business users want to continue working with the tools they are already familiar with. What could be better than enabling business users to create programs using familiar tools like Microsoft Office? Third, IT departments, already stretched supporting existing operational systems, seldom have the resources to take on new application development projects. Thus, empowering business users to build their own software may be the only real way to scale and to enable the business to adapt and react quickly to external market forces. Since business users are directly involved in building, the cost and scope of development can be managed more efficiently. Business users are in a much better position to answer questions such as “Do we really need this feature? Is the customization really needed, or will the out-of-the-box functionality do?” Last, but not least, unlocking the true potential of IT and competing effectively in the next generation of business cycles will require a greater ability to adjust and change software systems cost-effectively. By evolving how programs are put together, businesses can implement a process of continuous improvement over time.

    For the remaining portion of this post, I will describe four concrete examples of applications to which we have applied this paradigm.

    Example #1: Custom Calculation Engine in the Professional Services Industry

    A common requirement in this domain is to implement calculations and reports that adhere to Financial Accounting Standards Board (FASB) standards. These reports are often large and complex. The calculations in the reports are specific to a geographical region, so a multinational company needs to implement different versions of them. Furthermore, over time these calculations have to be adjusted to comply with changing laws. Traditionally, these calculations have been implemented using custom code and, as a result, suffer from the challenges outlined above. These include the high cost of development and maintenance, requirements lost in translation, a lack of traceability, and the lack of a robust mechanism for quickly changing a calculation in response to a change in a standard. In a nutshell, the analysts needed a flexible calculation engine that would make it easy to develop and maintain the calculations while providing the necessary robustness and scalability.

    Here is a quick overview of the Excel Services based solution we developed. (For an introduction to Excel Services, please refer to the MSDN Magazine article at http://msdn.microsoft.com/en-us/magazine/cc163374.aspx.) We used XSDs to capture all the input and output data elements needed to implement a given calculation. Using a custom Excel pre-compiler, we translated each XSD into named ranges in a generated template workbook with three sheets: one each for input, output, and calculation. Analysts could then use the generated template workbooks to develop the algorithms. As long as they worked within the contract defined by the input, output, and calculation sheets, they could use any Excel function. Once a workbook was developed and tested, they placed it into an Excel Services trusted file location for execution. To support high scalability, we used a cluster of Excel Services nodes. The following figure shows the basic architecture:

    [Figures: basic architecture of the Excel Services based calculation engine]
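The post does not describe the custom pre-compiler in detail, so the following is only a sketch of the XSD-to-named-range idea: walk the XSD and derive one named range per field. The sheet names, the placement of each value in column B with one row per field, and the `Sheet_Field` naming scheme are all hypothetical. Writing the actual template workbook is out of scope.

```python
import xml.etree.ElementTree as ET
from typing import Dict

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical mapping from a top-level XSD element to a template sheet.
SHEETS = {"Input": "Input", "Output": "Output"}


def named_ranges(xsd_text: str) -> Dict[str, str]:
    """Map each field under the Input/Output elements to a named cell reference."""
    root = ET.fromstring(xsd_text)
    ranges: Dict[str, str] = {}
    for top in root.findall(f"{XS}element"):
        sheet = SHEETS.get(top.get("name"))
        if sheet is None:
            continue
        # Fields in document order: one row per field, value in column B.
        fields = top.findall(f"{XS}complexType/{XS}sequence/{XS}element")
        for row, field in enumerate(fields, start=1):
            ranges[f"{sheet}_{field.get('name')}"] = f"{sheet}!$B${row}"
    return ranges
```

The analysts' formulas on the calculation sheet would then refer to these names (for example `Input_Revenue`) rather than to raw cell addresses. That indirection is what lets the contract stay stable while the workbook layout evolves.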

    Who built what

    Development Team: built a framework based on Excel Services.

    Business Users: developed the calculation logic using Excel.

    As you can see from the above example, we built the software building blocks using Excel Services, which in turn enabled the analysts to implement the calculations as needed.

    Example #2: Incident Management System for Law Enforcement Agencies

    Law enforcement agencies are looking for ways to improve the processes surrounding incident reporting and management. Managing an incident entails gathering a variety of information to aid in the investigation. This includes documentation about the case (date, location, type of incident, summaries from eyewitnesses, photographs, and evidence), as well as information about contacts and sources who can provide more information. In addition to providing an intuitive repository for this information, the system should enable agency-wide collaboration among the law enforcement personnel assigned to the cases. This requires the ability to assign and track tasks, initiate process steps such as review and approval, provide access to a group calendar and messaging, record the status of the case (open, active, closed, or archived), and generate centralized reports (such as the status of all cases or assignments). Information about cases also needs to be searchable and shareable with outside organizations as necessary.

    In addition to the high-level functional requirements, incident management systems need to be highly secure and be able to protect confidential information. This typically implies a number of requirements to control access to information based on the role, organization, and level of data sensitivity. There are also typically requirements for these types of systems to support auditing and policy compliance monitoring, tracking of email correspondence, support of e-discovery requests, and enforcement of agency governance rules.

    The above requirements highlight the need to store large and variable amounts of information about a case in one “container.” The different document types that make up a case can have unique metadata and lifecycles associated with them. Law enforcement personnel working on the case need the flexibility to add notes to cases, include ad hoc documents, and store different versions of a document. They also need to be able to initiate a number of workflows, such as approval and disposition, for individual documents or groups of documents. These workflows can be predefined or dynamic in nature.

    We decided to build this system on the SharePoint platform. For each case, we provisioned the appropriate container (site collection, site, or document library) based on a predefined blueprint we built. The blueprint includes document types, folder structure, web parts and workflows. Users can then customize the provisioned container as needed, including changing the folder structure, adding new ad hoc workflows, and installing additional web parts for reports and data visualization. A nice side benefit of a SharePoint-based container is the built-in support for archiving and restoring cases.

    Snapshot of SharePoint based Incident Reporting System
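    The blueprint-then-customize approach can be sketched in a few lines. The following is a minimal Python illustration under stated assumptions, not the SharePoint object model: it treats a blueprint as plain data and provisioning as copying that data into a per-case container. The names `Blueprint`, `CaseContainer`, `provision_case` and all sample values are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Case statuses named in the requirements above.
CASE_STATUSES = ("open", "active", "closed", "archived")


@dataclass(frozen=True)
class Blueprint:
    """Hypothetical, immutable case template (a stand-in for a site definition)."""
    document_types: tuple[str, ...]
    folders: tuple[str, ...]
    web_parts: tuple[str, ...]
    workflows: tuple[str, ...]


@dataclass
class CaseContainer:
    """Hypothetical per-case container; its lists are mutable so users can customize them."""
    case_id: str
    document_types: list[str] = field(default_factory=list)
    folders: list[str] = field(default_factory=list)
    web_parts: list[str] = field(default_factory=list)
    workflows: list[str] = field(default_factory=list)
    status: str = "open"

    def set_status(self, status: str) -> None:
        if status not in CASE_STATUSES:
            raise ValueError(f"unknown case status: {status!r}")
        self.status = status


def provision_case(case_id: str, blueprint: Blueprint) -> CaseContainer:
    # Copy every element of the blueprint, so later customizations of one
    # case never leak back into the blueprint or into other cases.
    return CaseContainer(
        case_id=case_id,
        document_types=list(blueprint.document_types),
        folders=list(blueprint.folders),
        web_parts=list(blueprint.web_parts),
        workflows=list(blueprint.workflows),
    )


# Example: provision a case, then customize it as a business user might.
INCIDENT_BLUEPRINT = Blueprint(
    document_types=("Witness Statement", "Photograph", "Evidence Log"),
    folders=("Evidence", "Statements", "Reports"),
    web_parts=("Case Summary", "Task List", "Group Calendar"),
    workflows=("Review and Approval",),
)
case = provision_case("CASE-0001", INCIDENT_BLUEPRINT)
case.folders.append("Ad Hoc Notes")
case.workflows.append("Disposition")
case.set_status("active")
```

    Keeping the blueprint immutable and copying it per case reflects the division of labor described here: the development team owns the blueprint, while business users own each provisioned case.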

    Who built what

    Development Team

    Built SharePoint artifacts, including site definitions, web parts and VS.NET-based workflows

    Business Users

    Provisioned and customized the site, including content types, ad hoc workflows, custom lists and views

    As the above example shows, we used the SharePoint platform to build the software building blocks that, in turn, enabled law enforcement officials to manage incidents effectively.

    Example #3: Data Analysis Tool for the Telecommunication Industry

    Network optimization engineers often need to convert vast amounts of raw network performance data into useful visualizations and data products that help them identify network performance bottlenecks. The goal is to allow non-programmers (in this instance, network engineers) to visually define complex multi-step algorithms that specify the steps and control dependencies for analyzing the raw data.

    The key requirement is the flexibility and ease of use that network engineers need to define custom data processing workflow steps. A solution built entirely by IT would be too expensive and still would not offer the flexibility needed here.

    We decided to build the solution using Windows Workflow Foundation (WF). The WF Designer gives network engineers a useful and flexible way to author the data processing algorithms. To make authoring easier for them, we developed a set of custom domain-specific activities, and we devoted a lot of attention to making the workflow authoring experience as simple as possible.

    Once the workflows are defined, they are executed asynchronously and results are made available for further analysis. The following diagram depicts the user interface for authoring the workflows.

    Snapshot of Workflow Foundation based designer used by network engineers
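    Conceptually, each workflow the engineers author is a set of steps plus the control dependencies between them, executed asynchronously. The Python sketch below illustrates only that execution model; it is not WF, and the step names and the tiny analysis are hypothetical. It assumes Python 3.9+ for `graphlib`, runs each step once all of its predecessors have finished, and runs independent steps concurrently.

```python
from concurrent.futures import FIRST_COMPLETED, Future, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter
from typing import Callable, Dict, Set

# A step receives the results of the steps that have already finished.
Step = Callable[[Dict[str, object]], object]


def run_workflow(steps: Dict[str, Step], deps: Dict[str, Set[str]]) -> Dict[str, object]:
    """Run each step once its dependencies are done; independent steps run concurrently."""
    sorter: TopologicalSorter = TopologicalSorter(deps)
    for name in steps:
        sorter.add(name)              # register steps that have no dependencies
    sorter.prepare()                  # raises graphlib.CycleError on circular dependencies
    results: Dict[str, object] = {}
    with ThreadPoolExecutor() as pool:
        running: Dict[Future, str] = {}
        while sorter.is_active():
            for name in sorter.get_ready():
                running[pool.submit(steps[name], dict(results))] = name
            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for future in done:
                name = running.pop(future)
                results[name] = future.result()   # re-raises a failing step's exception
                sorter.done(name)
    return results


# Hypothetical analysis: load raw samples, compute two statistics, then report.
steps: Dict[str, Step] = {
    "load": lambda r: [3, 1, 4, 1, 5, 9, 2, 6],
    "peak": lambda r: max(r["load"]),
    "mean": lambda r: sum(r["load"]) / len(r["load"]),
    "report": lambda r: f"peak={r['peak']} mean={r['mean']}",
}
deps = {"peak": {"load"}, "mean": {"load"}, "report": {"peak", "mean"}}
print(run_workflow(steps, deps)["report"])   # peak=9 mean=3.875
```

    In this sketch, "peak" and "mean" depend only on "load", so they can run in parallel, while "report" waits for both. The domain-specific activities described above would play the role of the step functions.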

    Who built what

    Development Team

    Built a framework to host WF programs, developed domain specific WF activities, customized the WF designer to make it business-user ready

    Business Users

    Developed custom workflows

    As the above example shows, by building a set of building blocks in the form of custom workflow activities, we were able to let network engineers develop the programs they needed to analyze vast amounts of raw data.

    Example #4: Management of Policy Data in the Insurance Domain

    A common requirement in the property and casualty (P&C) insurance industry is to store and retrieve “snapshots” of a customer policy or contract, such as an automobile or homeowner’s policy, as it existed at any given point in time in the past. There are a number of ways to implement this functionality. However, a key requirement is to retrieve the policy snapshot very quickly; a sub-second response time is typically expected. Given the millions of rows of historical data common in such systems, it would be hard to reconstruct a specific version of the policy data dynamically within that time. An alternative approach is to pre-generate the snapshots and trade disk space for response time. In other words, accept the additional cost of storing the entire policy snapshot for every change to the policy.

    In addition to the response time requirement, there are two other distinct requirements related to the historical data. The first is to generate reports that verify compliance with various state laws. As you can imagine, compliance reports vary quite a bit, based on the laws of each state (within the US). The second is to allow end users to mine the historical data for interesting patterns, for instance: Why is customer attrition higher in one county than in others? What types of changes are more common in a given state?

    The key aspect of this scenario is that, alongside the need for high-performance historical queries, there is a competing need for flexibility in reporting and data mining. In short, the system needs to support self-service business intelligence (BI) for end users.

    Let us now take a look at the solution we decided to build. We used an OLAP (online analytical processing) cube based on SQL Server Analysis Services (SSAS) as an application building block. To store the policy data snapshots, we populated the OLAP cube in real time. The OLAP model we developed is depicted in the diagram below. The dimensions (reference information about the data) are the obvious ones, including customer, geography, and time. The interesting aspect of this model is the fact tables (facts are generally the numeric aspects of the transaction data). Since we are dealing with events (a change in policy address, for example) rather than classic business transactions that involve numbers, we ended up creating a “factless” fact table that captures a log of all events on a policy. Additionally, to retrieve the policy as of a certain date quickly, we maintain another “factless” table that captures the snapshot of a given policy following any change to it.

    [Diagram: OLAP model for the policy data]
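    Because a full copy of the policy is stored after every change, answering “what did the policy look like on date D?” reduces to finding the latest snapshot taken on or before D. The Python sketch below shows that lookup as a binary search over snapshots sorted by effective date. It is an in-memory illustration of the idea only, not the SSAS implementation, and the `PolicySnapshot`/`SnapshotStore` names and sample data are hypothetical.

```python
from bisect import bisect_right
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional


@dataclass(frozen=True)
class PolicySnapshot:
    effective: date        # date this version of the policy took effect
    data: Dict[str, str]   # full copy of the policy as of `effective`


class SnapshotStore:
    """Per-policy history holding one full snapshot per change, sorted by date."""

    def __init__(self) -> None:
        self._dates: Dict[str, List[date]] = {}
        self._snaps: Dict[str, List[PolicySnapshot]] = {}

    def record_change(self, policy_id: str, effective: date, data: Dict[str, str]) -> None:
        # Store the whole policy, not a delta: disk space is traded for lookup speed.
        dates = self._dates.setdefault(policy_id, [])
        snaps = self._snaps.setdefault(policy_id, [])
        i = bisect_right(dates, effective)   # keep both lists ordered by date
        dates.insert(i, effective)
        snaps.insert(i, PolicySnapshot(effective, dict(data)))

    def as_of(self, policy_id: str, when: date) -> Optional[PolicySnapshot]:
        """Latest snapshot taken on or before `when`, or None if none exists."""
        dates = self._dates.get(policy_id, [])
        i = bisect_right(dates, when)        # index of the first snapshot after `when`
        return self._snaps[policy_id][i - 1] if i else None


store = SnapshotStore()
store.record_change("P-100", date(2009, 1, 1), {"address": "12 Oak St", "coverage": "basic"})
store.record_change("P-100", date(2009, 6, 15), {"address": "9 Elm Ave", "coverage": "basic"})
print(store.as_of("P-100", date(2009, 3, 1)).data["address"])   # 12 Oak St
```

    Each as-of lookup costs O(log n) in the number of changes to that policy, and no history has to be replayed at query time, which is the point of paying for the extra storage.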

    Once the OLAP cube is in place, analysts are able to perform queries using a tool such as Excel, as shown in the diagram below.

    [Screenshot: querying the OLAP cube from Excel]

    Who built what

    Development Team

    Organized the data in an OLAP structure and developed a scheme to populate it in real time

    Business Users

    Developed the queries to respond to historization-related requests and built ad hoc reports to meet compliance and regulatory needs

    Readers who are familiar with BI (business intelligence) systems are probably wondering how the solution described above differs from a traditional BI solution, and why I see it as an example of higher-order solution development. While this is indeed a BI-based solution, it differs from a traditional BI data warehouse or data mart in several important ways. First, the dimensional model, with its factless and snapshot tables, is different from a traditional BI data warehouse: the data model has been designed to promote “assembly” by business users. Second, the OLAP cube is very much part of the application architecture, in the sense that queries from the UI (user interface) layer are serviced by the OLAP cube in near real time. This is unlike a traditional BI system, which is usually a separate system designed to serve as a decision support system. Finally, by combining a traditional relational database with the OLAP cube, we are able to offer a composable data services layer.

    As the above example shows, the OLAP-based building blocks built by the development team enabled business users to perform dynamic queries in response to historization requests from the client and to build ad hoc reports to meet compliance and regulatory needs.

    In summary, in each of the above examples, we built the software tools that allowed business users to create the programs they need. This approach has a number of benefits, including reduced development cost and increased agility in adjusting and changing software systems.

    During the last year, I have spent quite a bit of time on Cloud Computing in general and on the Windows Azure Platform in particular. So I thought I would share my humble 2010 predictions related to Cloud Computing.

    • Cloud Computing Services Revenue – Leading System Integrators (SIs) could see up to 10% of their revenue coming from cloud-based services by the end of 2010. System Integrators with an App/Dev focus should invest in building competency around cloud-based application development. For instance: develop expertise on cloud application development platforms such as Azure, evaluate how their existing products/offerings can leverage the cloud, and understand the new business models enabled by the cloud.
    • Small and Medium Businesses (SMBs) – Another interesting opportunity for SIs is to bring the benefits of the cloud to the SMB space. Contrary to the common perception that cloud computing is mainly for large, elastic workloads, I think there is a real opportunity for SMBs to leverage the benefits of the cloud. For the first time, cloud platforms enable SMBs to take advantage of enterprise functions such as workflow, BPM, CRM etc. at a price point that is approachable to them. This is an opportunity not only for SMBs to make their businesses more agile, but also for SIs to build competency in cloud computing (since work with SMBs will allow them to start small and gradually build their expertise).
    • No Letup in the Pace of Innovation by Cloud Providers – In the past year, there was a staggering number of changes to the Windows Azure Platform alone (other platforms were no different), ranging from full relational storage to support for Java/Python, a choice of compute instance types, a marketplace for data and more. We can expect this trend to continue in 2010, which is why it is key for SIs to invest in keeping up with these fast-paced innovations.

    Happy New Year!

      Here are my favorite new features and enhancements in SharePoint 2010:

       

    • Access Services & Visio Services
    • Building on the successful Excel Services pattern, SPS 2010 allows users to publish Access and Visio applications to SharePoint, where other users can access them through their browsers. As can be expected, Access and Visio Services will be somewhat limited in functionality (compared to their respective desktop versions), but on the flip side, customers will benefit from server-side capabilities such as scalability, a single centralized version of the document (rendered in a browser-friendly format), access by multiple users, and security control.

      Access Services will allow customers to publish their Access 2010 applications to SharePoint. For example, consider a departmental application such as a travel planner built using Access. Using Access Services, it will be possible to bring such an application to SharePoint, thereby reducing the total cost of ownership through one common user experience, development and deployment strategy.

      Similarly, Visio Services will allow customers to render Visio diagrams within the browser. One application of Visio Services is visualizing a SharePoint workflow as a Visio flowchart rendered inside the browser. Note that no client-side Visio software is needed in this scenario.

     

    • Business Connectivity Services (BCS)
    • In MOSS 2007, the Business Data Catalog (BDC) was the tool of choice for integrating external data sources into SharePoint. BDC was available only to customers who purchased the Enterprise edition of MOSS 2007. Furthermore, BDC was limited to reading data from external data sources. Business Connectivity Services (BCS), which replaces the BDC functionality in SPS 2010, goes much further. Not only is BCS going to be available in the free base version, SharePoint Foundation 2010 (formerly Windows SharePoint Services), it also supports inserting, updating and deleting data in an external data source. Another key advantage of BCS is that it will enable SharePoint lists to be created directly based on a database table.

     

    • Developer Productivity Enhancements
    • First and foremost, unlike in previous versions, it will now be possible to conduct SharePoint development on a client OS such as Windows 7. In earlier versions, SharePoint development had to be undertaken on a server OS. This inevitably created overhead, as development had to be undertaken in a virtualized environment. Developer enhancements go beyond support for client operating systems, however. Developers can now use familiar .NET patterns and practices such as LINQ (Language Integrated Query) to query data from a SharePoint list, XSLT for customizing a SharePoint list, and Silverlight for creating richer visualizations. There are also a number of enhancements to the SharePoint programming model that offer greater flexibility in extending out-of-the-box SharePoint behavior. For example, SharePoint developers commonly extend SharePoint behavior using small segments of code referred to as event handlers. A common issue in MOSS 2007 is that an “after” event handler may not have finished executing when control is returned to the user, which can lead to confusing behavior. SPS 2010 seeks to alleviate this by introducing “After-Synchronous” events, which are guaranteed to complete execution before control is returned to the user.

      Finally, Visual Studio 2010 contains extended capabilities for developers to create rich applications on the SPS 2010 platform. Many improvements have also been made in Visual Studio 2010 to increase developer productivity as well as take advantage of SPS 2010 functionality. These improvements help make it easier for .NET developers to create and deploy SharePoint solutions.

     

    • SharePoint Designer Enhancements
    • SharePoint Designer is a tool targeted at business users and designers, who have used it to quickly and easily perform actions such as customizing lists, changing layouts and creating simple workflows. One improvement often requested is the ability to package and reuse the changes made with SharePoint Designer. Fortunately, SharePoint Designer 2010 not only addresses this request, it also comes with major improvements in designing workflows, editing SharePoint pages and setting up BCS (discussed above). SharePoint administrators also have greater control over how SharePoint Designer 2010 is used within their environment. For instance, administrators can block certain actions, such as modifying certain SharePoint pages, and restrict access to certain areas within the SharePoint setup.

      One other key feature of SharePoint Designer 2010 is improved interaction between business users and the IT department. Since SharePoint Designer 2010 uses the same packaging and deployment format (commonly referred to as a WSP solution package) as the rest of the SharePoint platform, it is now possible to take the work done by a business user in SharePoint Designer 2010 and import it into Visual Studio 2010. This allows the IT department to build upon and extend the work done by business users.

     

    • Business Intelligence Enhancements
    • Business Intelligence is another area with significant improvements. To begin, there are a number of scalability improvements to the desktop version of Excel 2010, including support for numbers of rows that go far beyond the row limits of earlier versions. The new version of Excel takes advantage of OS enhancements, such as 64-bit support for access to large memory pools, and of hardware enhancements such as multi-core processors.

      Microsoft has made a strategic decision to make Excel the primary BI analyst tool. As part of the self-service BI initiative, Excel users can import large amounts of data from various data sources and quickly generate pivot tables. Once the data is imported into Excel 2010, users can easily filter it using a feature called “Slicers”. In other words, Excel users can extract, transform and load (ETL) data from multiple sources directly into Excel without requiring IT to build a formal data import process. A pivot table generated from imported data can later be shared with other users by publishing it to SharePoint 2010. Under the covers, the publishing process creates a SQL Server Analysis Services instance that IT can monitor using tools such as a usage dashboard and resource quotas.

      Excel Services, the server-side Excel functionality provided by SharePoint Server, has seen a number of improvements as well. First, it directly benefits from the scalability improvements to the underlying calculation engine mentioned earlier. Second, it enables additional programmability enhancements. For example, using a JavaScript-based program, it will be possible

     

    • In-Place Records Management
    • Records management relates to the process of marking a document as a record, or “laminating” it, for legal and compliance reasons. A records repository is a library of such records. SPS 2010 dramatically improves upon the capabilities for managing records by extending where and how records are managed. Unlike MOSS 2007, SPS 2010 will support multiple records repositories within a single SharePoint installation. This means that users can route records to multiple records repositories, and records managers can define routing rules to aid in classifying records. Records repositories now also support hierarchical structures (commonly referred to as a File Plan) for storing records in a manner that matches the customer’s organizational hierarchy.

      SharePoint 2010 also provides flexibility in defining and applying retention rules. For instance, it is now possible to apply a multi-stage retention rule that specifies several stages of record retention. This will be very helpful when a records manager wants to specify a rule such as “Maintain this document for three years after project completion, then transfer it to the archive for the next five years”.
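    A rule like the one quoted above can be modeled as a list of (action, duration) stages chained one after another from a trigger date. The sketch below is a hypothetical Python model of that idea, not SharePoint’s retention API; it assumes whole-year durations and uses project completion as the trigger date.

```python
from datetime import date
from typing import List, Tuple


def add_years(d: date, years: int) -> date:
    """Add whole years, mapping Feb 29 to Feb 28 in non-leap target years."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def retention_schedule(trigger: date, stages: List[Tuple[str, int]]) -> List[Tuple[str, date, date]]:
    """Return (action, start, end) for each stage, chained from the trigger date."""
    schedule = []
    start = trigger
    for action, years in stages:
        end = add_years(start, years)
        schedule.append((action, start, end))
        start = end   # the next stage begins when this one ends
    return schedule


# "Maintain this document for three years after project completion,
#  then transfer it to the archive for the next five years."
rule = [("keep in place", 3), ("keep in archive", 5)]
for action, start, end in retention_schedule(date(2010, 3, 31), rule):
    print(f"{action}: {start} -> {end}")
# keep in place: 2010-03-31 -> 2013-03-31
# keep in archive: 2013-03-31 -> 2018-03-31
```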

      In addition to using a records repository, SPS 2010 can declare records in place; that is, any document inside a document library can be marked as a record without being explicitly moved to a records repository. The new in-place records management capability exemplifies Microsoft’s mantra of “compliance everywhere.”

     

    • Enhancements related to Large List Handling
    • Microsoft has placed a lot of emphasis on the performance of lists with large numbers of items. This has resulted in new SPS 2010 features known as Query Throttling, Batch Query Support, Remote Blob Storage, and External Lists.

      Query Throttling allows IT administrators to limit, or “throttle,” queries executed against large lists so that they do not consume too many resources. For example, SharePoint will now block queries that return more than 5000 rows by default, and this is, of course, a configurable setting. Rather than iterating over each item, SharePoint also includes the capability to process multiple items in a list simultaneously, as a single batch operation. If there is content that is more suitable for storage outside SharePoint (for instance, large media files), the Remote Blob Storage feature provides a mechanism to store it on external storage (SANs, NAS, RAID arrays, file servers).
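    The throttling and batching ideas can be illustrated independently of SharePoint. The following Python sketch is a hypothetical model, not the SharePoint object model, and it does not reproduce how SharePoint enforces the limit internally: a query is rejected as soon as it would return more rows than a configurable threshold (5000 by default, matching the default mentioned above), and bulk work is split into fixed-size batches instead of being done item by item.

```python
from typing import Callable, Iterable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")

# Default row limit quoted above; like the real setting, it is configurable.
DEFAULT_THRESHOLD = 5000


class QueryThrottledError(Exception):
    """Raised when a query would return more rows than the threshold allows."""


def run_query(rows: Iterable[T], predicate: Callable[[T], bool],
              threshold: int = DEFAULT_THRESHOLD) -> List[T]:
    """Return matching rows, refusing queries whose result exceeds `threshold`."""
    matches: List[T] = []
    for row in rows:
        if predicate(row):
            matches.append(row)
            if len(matches) > threshold:
                # Stop as soon as the limit is crossed rather than finishing the scan.
                raise QueryThrottledError(f"query returns more than {threshold} rows")
    return matches


def batches(items: Sequence[T], size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most `size` items for batch processing."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


items = list(range(12000))
print(len(run_query(items, lambda n: n % 3 == 0)))   # 4000
try:
    run_query(items, lambda n: n % 2 == 0)           # 6000 matches: blocked
except QueryThrottledError as err:
    print("throttled:", err)                         # throttled: query returns more than 5000 rows
print([len(b) for b in batches(items, 5000)])        # [5000, 5000, 2000]
```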

      Finally, External Lists are another mechanism by which large lists can be incorporated into SharePoint. As stated earlier, database-backed lists that are part of BCS allow content to be stored in separate data stores, such as SQL Server, outside of the content database. The benefit, of course, is that large lists can be supported without burdening the SharePoint 2010 content database.

     

    • Workflow Improvements
    • Workflow is a very useful feature of SharePoint 2007. In addition to the out-of-the-box workflows included with SharePoint 2007, it is possible to use SharePoint Designer to build declarative (no-code) workflows. For more advanced scenarios, developers create custom workflows in Visual Studio. Two of the main challenges in building workflows with SharePoint 2007 relate to the limitations of SharePoint Designer 2007 and of the workflow host that is part of SharePoint 2007. While SharePoint Designer allows business users to develop simple workflows, it requires a direct connection to the site for which the workflow is being developed, limiting the ability to reuse the workflow in different locations. Furthermore, there is no easy way to leverage code components developed by IT. The workflow host within SharePoint 2007 comes preconfigured with a set of services such as persistence, eventing and tracing, and there is no way to add custom services to the runtime host. For instance, there is no direct way to allow SharePoint-based workflows to interact with events other than list item events.

      SharePoint 2010 alleviates these challenges by allowing the workflows created using SharePoint Designer to be saved as templates for reuse. SharePoint Designer 2010 has also been enhanced significantly to allow business users to leverage tools such as Visio for workflow modeling and visualization. Another major improvement is the ability to modify out-of-the-box SharePoint workflows (e.g. approval workflow, three-state workflow).

      The workflow host within SharePoint 2010 now provides the extensibility to inject custom services. It is also now possible to kick off workflows without explicitly associating them with a list item.

      Finally, from an overall scalability perspective, it is also going to be possible to designate which application server nodes participate in workflow execution. In other words, unlike SharePoint 2007, where every application server node participated in executing workflows, there will be a way to restrict workflow execution to a limited set of machines. This allows services to be isolated within the farm, so system administrators can better manage resources and troubleshoot issues.

     

    • Social Networking Capabilities
    • One can debate the impact and applicability of social networking tools such as Facebook in the workplace, but there is no doubt that information workers today are demanding similar capabilities from the tools they use at work. This is why the SPS 2010 enhancements in this arena are so noteworthy. In SPS 2010, just about every element, such as sites, documents, videos and blog posts, is taggable. There is out-of-the-box support for navigating tag clouds and lists. Users will be able to rate artifacts and recommend them to others. Co-workers can keep up with the latest by tracking activity data available as a news feed. Individual users will be able to set up profiles, write to a common or personal board, indicate their presence/location and take advantage of micro-blogging capabilities.

     

    • Deployment Advancements
    • Unlike previous versions, SPS 2010 uses a single, consistent, solution package-based scheme for packaging and deploying any customization. The solution package is a compressed file containing all of the necessary components, such as features, web parts, list definitions, assemblies, customized ASPX pages and workflows. Additionally, previous versions required an extensive amount of manual modification of XML files. In SharePoint 2010, these modifications can be done through point-and-click configuration with SharePoint Designer 2010.

      SharePoint 2010 also introduces the notion of sandboxed solution packages, which are designed to improve security and manageability. Sandboxed solution packages restrict the scope to which a customization can be applied (for example, restricting a list customization to a single site collection). Sandboxed solutions are also restricted to a subset of SharePoint programmability options. Finally, administrators can enforce quota limits on the resources consumed by sandboxed solutions to prevent abuse.

      For a more detailed review of SPS 2010, I encourage you to download the white paper we released recently at http://tinyurl.com/sps2010 (note – this link takes you directly to the PDF, ~1 MB).

    That is my forecast, anyway. Over the last year, I have had the opportunity to talk to a number of customers, user group members and conference attendees about Cloud Computing in general and the Windows Azure Platform in particular. During these conversations, I have come across a number of concerns and questions about Azure pricing, performance, security and so on. Clearly Windows Azure has a long way to go before it becomes a mature cloud platform. However, some broad-brush statements questioning Azure’s pricing, performance and applicability that I have come across lately deserve some clarification. In this post, I have attempted to capture some of these concerns and provide my humble thoughts on why we need to take a holistic approach when evaluating the Windows Azure Platform.

    Azure – A Platform as a Service offering

    Let me begin by stating that not all cloud offerings are alike. In their paper “Toward a Unified Ontology of Cloud Computing” [0], Lamia Youseff et al. provide a detailed model for understanding the different classes of cloud providers. The following diagram depicts their proposed classification of cloud service providers, including Platform as a Service (PaaS), Infrastructure as a Service (IaaS) and Software as a Service (SaaS).

    Screen clipping from:  http://www.cs.ucsb.edu/~lyouseff/CCOntology/CloudOntologyPres.pdf

    Microsoft Azure is a “Platform as a Service” offering, which is very different from what Amazon offers via EC2 (commonly classified as “Infrastructure as a Service”) or even Google’s App Engine (GAE), commonly classified as a specialized “Platform as a Service” offering [1]. As a result of this difference in approach, the lifecycle of a Windows Azure application is different as well. As depicted in the figure below, a developer develops the code in the development fabric running locally and then simply publishes it to the cloud, along with a description of the desired infrastructure model. Based on the uploaded code and the infrastructure model, Azure orchestrates the provisioning of compute, storage and networking resources in a manner that is fault tolerant and allows for rolling upgrades.

    It is clear that Microsoft has had to innovate at 100 mph to make this concept a reality. I encourage you to read a recently released Microsoft Research paper on Helios, which describes an OS abstraction across a number of heterogeneous CPUs. Make no mistake: Azure is the only cloud offering that is trying to create a “Cloud OS” abstraction around a fabric of Windows machines.

    This is not to suggest that the EC2 or GAE offerings are not useful. Far from it: if one is looking for an on-demand virtual machine provider, EC2 is a great choice. After all, Amazon is the king of retail, and as its executives like to say with a grin, “we will gladly match the lowest price offered by our competitors”. On the other hand, if you have a Java or Python based web application that can fit GAE’s predefined application structure and framework, GAE would be a great choice. GAE is already successful, with over 45,000 applications by last count. In the end, different requirements will call for different types of cloud offerings.

     

        1) Windows Azure manages applications, not just servers

        2) Tell it what you want, and it will automate the details

        3) Model-driven automation

        4) Platform ensures service isolation

    Now that we have an understanding of the different flavors of cloud computing, let us circle back to where I started and address some of the broad-brush statements I alluded to earlier.

    Concern #1: “The car analogy – Azure’s pricing model is fundamentally broken”

    “Azure is like a car that has its engine running constantly, even when it is not being driven.”

    The author of the above analogy is trying to contrast Azure pricing with GAE where you only pay for the time you use the service.

    “Azure is like a car that has its hood sealed shut. One cannot look under the hood, let alone reconfigure it.”

    The author of the above analogy is trying to highlight the fact that one cannot directly access the virtualized instance that is running the Azure-based application.

    I think each of the above analogies conveys only a half-truth about Azure. A better analogy for Azure is this:

    Azure is like a car you rent from ZipCar (an hourly car rental company).

    The price of a ZipCar rental includes gas and insurance. Of course, you pay for the hour whether you use the car or not (ZipCar has to spend money to honor the reservation you made). But at any time, you can simply return the car and incur no additional charges. Would anyone care to look under the hood of a ZipCar? I certainly would not. That would defeat the very reason I rent a car from ZipCar in the first place: once I pay for the hourly rental, I don’t have to worry about insurance or fuel. If something goes wrong, I can simply wave my ZipCar card and get into the next available car. And if, for any reason, my co-passengers want to go on a little excursion, they can get additional cars on short notice at the same fixed price and give them up when they are done with them.

    So how can GAE offer a model in which customers pay only for the time their application is used, when EC2 and Azure cannot? The answer lies in the figure below: as you gain more control, the economy of scale goes down. Azure and EC2 give you more control (in that order) and, as a result, cannot take advantage of economy of scale at the same level as GAE. GAE, Azure and EC2 each offer a multitenant platform, but the degree to which each tenant can configure their individual setup differs. I would not be surprised if an ISV were to come out with an Azure-based cloud offering that is constrained in some way (for example, one that only allows customers to submit their HttpHandlers) but, in turn, offers a usage-based pricing model.

    [Figure: degree of control versus economy of scale for GAE, Azure and EC2]

    There are two other pricing related concerns that I have come across.

    “I can host two blogs and a wiki for the price of a single Azure Web role.”

    Hosting a blog or wiki is not likely to be the target application for an Azure Web role. If you want to host a blog or wiki, the Business Productivity Online Suite (BPOS) [5] offering will be more cost-effective.

    “My hoster allows me to create multiple virtual directories (vdirs) inside a single VM. With Azure, I need multiple Web roles to achieve the same setup.”

    First off, it is not appropriate to compare a vdir offered by a VM where you have complete access to the machine (and, in turn, assume all the responsibility for administering it) with the vdir offered by a web role; a web role offers fault tolerance, monitoring, load balancing and other SLA guarantees. While these differences are important, in the end it is really about raising the level of abstraction. Readers will recall that when IIS 6 came out with the default out-of-process model, there were some concerns that provisioning a distinct process for each web application would be expensive (as opposed to the in-process or pooled process model supported by IIS 5.0). But the isolation and fault tolerance offered by the out-of-process model quickly sidelined any concerns about the cost of provisioning additional processes.

     

    Concern #2: “Azure does not scale dynamically”

    This concern is based on the fact that one has to specify the exact number of Azure role instances that one wishes to provision. In the current CTP, there is no way to specify a range for the number of role instances and let Azure determine, automatically and based on the load on the application, when it is appropriate to ramp the number of running instances up or down.

    Frankly, this is not as big a deal as it sounds. One can always write a small piece of code that spawns new instances (or, conversely, shuts down unused ones) based on health alerts. Also note that the underlying fabric does indeed support specifying a range (refer to the PDC 08 session [2] “Under the Hood: Inside the Windows Azure Hosting” and fast forward to the 67:14 mark). I think it is only a matter of time before Microsoft exposes this capability to end users.
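The “small piece of code” mentioned above could be as simple as a threshold rule that turns a load signal into a target instance count, clamped to a minimum and a maximum. The Python sketch below is a hypothetical illustration only: the CPU-utilization signal, the thresholds and the `desired_instance_count` function are assumptions rather than part of any Azure API, and actually applying the new count to a deployment is left out.

```python
def desired_instance_count(current: int, avg_cpu_percent: float,
                           min_instances: int = 2, max_instances: int = 10,
                           scale_out_above: float = 75.0,
                           scale_in_below: float = 25.0) -> int:
    """Return the instance count a hypothetical scaling agent should request.

    Adds one instance when average CPU is above the upper threshold, removes one
    when it is below the lower threshold, and always stays within the range.
    """
    if min_instances > max_instances:
        raise ValueError("min_instances must not exceed max_instances")
    target = current
    if avg_cpu_percent > scale_out_above:
        target = current + 1
    elif avg_cpu_percent < scale_in_below:
        target = current - 1
    # Clamp to the configured range (the "range of instances" idea from the text).
    return max(min_instances, min(max_instances, target))


if __name__ == "__main__":
    print(desired_instance_count(2, 90.0))   # scale out under heavy load
    print(desired_instance_count(2, 10.0))   # never drop below the minimum
    print(desired_instance_count(10, 95.0))  # never exceed the maximum
```

Changing the count by one step per evaluation, with a gap between the two thresholds, keeps such an agent from oscillating when the load hovers around a single value. The default minimum of two instances mirrors the two web roles needed for the 99.95% SLA discussed under Concern #3.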

    Concern #3: “Azure makes sense only for the large enterprises”

    “The massive scalability, fault tolerance and high availability are well suited for large enterprises. Most of the applications we build as a small/medium business don’t need these capabilities”.

    Let us consider a small/medium-sized web retailer. Based on the available Azure pricing, it would typically cost about $360/month to run a small/medium web site (refer to the cost breakdown provided below), all without any upfront investment in staging, production or disaster recovery hardware or software.

     

    Resource | Description | Cost
    Compute | 2 web roles (to get the 99.95% uptime SLA) | $172.80 (30 days × 24 hours × $0.12/hour × 2)
    Storage (SQL) | 10 GB | $99.99
    Storage transactions | 1 million transactions | $1.00
    Messages | 1 million transactions | $1.50
    Bandwidth in | 50 GB | $10.00
    Bandwidth out | 100 GB | $75.00
    Total | | $360.29/month
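The monthly total can be re-derived from the line items. The short Python sketch below simply restates the table's figures (the $0.12 per instance-hour rate and the 30-day month come from the compute row) and uses `Decimal` so that the currency arithmetic is exact.

```python
from decimal import Decimal

HOURS_PER_MONTH = 30 * 24            # the table assumes a 30-day month
WEB_ROLE_RATE = Decimal("0.12")      # $ per instance-hour, from the compute row

line_items = {
    "Compute (2 web roles)": HOURS_PER_MONTH * WEB_ROLE_RATE * 2,
    "Storage (SQL), 10 GB": Decimal("99.99"),
    "Storage transactions, 1 million": Decimal("1.00"),
    "Messages, 1 million": Decimal("1.50"),
    "Bandwidth in, 50 GB": Decimal("10.00"),
    "Bandwidth out, 100 GB": Decimal("75.00"),
}

# Start the sum from Decimal("0") so the result stays a Decimal.
total = sum(line_items.values(), Decimal("0"))

for name, cost in line_items.items():
    print(f"{name:<35} ${cost:.2f}")
print(f"{'Total per month':<35} ${total:.2f}")
```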

     

    To compare the above cost with a hosted/on-premises solution, I asked one of our customers how much they were paying to host their servers in a data center in Herndon, VA. Here is the information I was provided:

    Resource | Single Windows server machine
    What is covered | Network, power, cabling; tape backup, configuration and handling; firewall, F-5 network switch; remote hands support; physical security and intrusion detection; AD servers / Exchange / SMTP access
    What is not covered | Applying patches, failure detection; cost of the hardware and software
    Cost | ~$150.00/month

    Furthermore, even small retailers are subject to security and auditing standards such as the Payment Card Industry Data Security Standard (PCI-DSS) and Statement on Auditing Standards (SAS 70). By hosting their applications in Microsoft’s cloud infrastructure, even small retailers can take advantage of the security certifications that Microsoft has earned, including SAS 70 and ISO/IEC 27001:2005 [4].

    Let us consider another example. I am working with a customer who has a very limited budget for hosting a small HR application in Canada. Because of local laws, this application must be hosted within the geographical boundaries of Canada; unfortunately, the customer has only a very small office in Canada, with limited or no data center capabilities. At first glance, purchasing a small Windows box and placing it in their Canadian office seemed most cost-effective. But once they added up the cost of managing the setup, it was clear that the total cost of ownership would not be attractive. This customer is now looking at hosting the application in Azure, hoping to take advantage of the geo-location guarantees it offers. The key point to take away from this example is this: by taking advantage of global data centers, even small and medium businesses can achieve worldwide reach.

    One final point about pricing: it is ultimately the free market that will determine where PaaS pricing lands. As is evident from the major mid-course correction regarding the SQL Data Services offering, Microsoft is going to have to listen to its customers and adjust based on customer and market demand. I also expect that Microsoft will initially offer special promotions (for MSDN users, small businesses and startups, similar to the BizSpark program) to build momentum.

    So, do the cost estimates for Azure put it out of reach of most small/medium businesses? I will let you be the judge.

    Concern #4: “I am out if there is no Remote Desktop Access”

    This is probably the most often requested feature. It is easy to see how remote desktop access would be helpful: it allows access to the desktop, installation of software components, custom configuration (registry tweaks etc.) and so on. While remote desktop may very well be supported in the future, keep in mind that the tradeoff will be that customers would have to assume responsibility for additional administrative tasks, including patch management and access control.

    Once again, the underlying fabric may already support this capability. During his PDC 08 session, Manuvir Das talked about an “escape hatch”, or raw mode, as something that is available under the covers.

    [Screen clipping taken from http://channel9.msdn.com/pdc2008/ES16/; fast forward to the 21:44 mark in the presentation.]

    Concern #5: “You cannot seamlessly move your Azure application back to the datacenter”

    The vendor lock-in argument against Azure is that once you build an application for Azure, it will be hard to move it back to an on-premises or hosted data center. Frankly, vendor lock-in is a concern with any cloud provider. As I heard David Chappell say recently, “there is no lock-in like the cloud. A cloud provider can turn off the spigot of the service rather abruptly” (I have paraphrased his quote a bit, as I cannot recall his exact words).

    Clearly, trust in cloud computing can only build over time. And hopefully over time, through good experiences, concerns over vendor lock-in will diminish as well. 

    If we take a closer look at what it takes to build an Azure hosted application, the lock-in concerns do not seem as grave.

    For instance, it is encouraging to see that Microsoft, along with IBM, Rackspace and others, has agreed to join the Simple Cloud API [3] project by Zend Technologies, which provides a common File Storage, Document Storage and Simple Queues API against a cross-section of vendor offerings, including Rackspace Cloud Files, Windows Azure Storage, Amazon S3 and Nirvanix.
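The idea behind a common storage API is the classic adapter pattern: application code depends on a small, provider-neutral interface, and each vendor gets its own adapter. The Python sketch below only illustrates that pattern; it is not the Simple Cloud API itself (which targets PHP). `FileStorage`, the two adapters and `archive_invoice` are hypothetical names, and an in-memory dictionary and a local directory stand in for a real cloud store and an on-premises share.

```python
import os
import tempfile
from abc import ABC, abstractmethod


class FileStorage(ABC):
    """Provider-neutral storage interface that application code depends on."""

    @abstractmethod
    def put(self, name: str, data: bytes) -> None:
        """Store data under the given name."""

    @abstractmethod
    def get(self, name: str) -> bytes:
        """Return the data stored under the given name."""


class InMemoryStorage(FileStorage):
    """Stand-in for a cloud blob store (useful in tests)."""

    def __init__(self):
        self._blobs = {}

    def put(self, name, data):
        self._blobs[name] = bytes(data)

    def get(self, name):
        return self._blobs[name]


class LocalDiskStorage(FileStorage):
    """Adapter for an on-premises location (here, a local directory)."""

    def __init__(self, root):
        self._root = root

    def put(self, name, data):
        with open(os.path.join(self._root, name), "wb") as f:
            f.write(data)

    def get(self, name):
        with open(os.path.join(self._root, name), "rb") as f:
            return f.read()


def archive_invoice(storage: FileStorage, invoice_id: str, pdf: bytes) -> None:
    """Application code sees only the interface, so the backend can be swapped."""
    storage.put(f"invoice-{invoice_id}.pdf", pdf)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for backend in (InMemoryStorage(), LocalDiskStorage(tmp)):
            archive_invoice(backend, "42", b"%PDF-1.4")
            print(type(backend).__name__, backend.get("invoice-42.pdf"))
```

Moving between providers then means writing one more adapter rather than touching every call site.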

    The Windows Azure team has also doubled down on the more open RESTful web API, making it easier for non-.NET applications to consume it. The Azure team has also recently unveiled the native-mode execution capability, which lifts the restriction that applications must be based on managed .NET code only. This means that it is now possible to host applications built using Python/CGI on Azure.

    As I have stated earlier, Azure development is squarely based on .NET building blocks such as WCF, WF, ASP.NET and so on. So fundamentally you are writing .NET code. It is important to understand that the only new API introduced with Azure, called the Service Runtime API, fits on one screen (shown below). So there is not going to be a large body of Azure-specific code that needs to be ported, should you decide to move the application back to your premises.

     

    [Figure: the Service Runtime API]

     

    While the code you write for an Azure application is all .NET based, there are a number of guidelines [8] that one needs to follow for a successful implementation on Azure.

    · Develop for the sandbox

    · Prefer scale out over scale up

    · Separate state from the UI; store the state in distributed, horizontally scalable storage

    · Be loosely coupled

    · Be prepared to handle varying loads

    · Deal with failures; build retry logic, and make the operations it retries idempotent

    · Rolling upgrades: upgrade your application without any downtime

    · Rely on unified logging; build an alert mechanism


      Fortunately, most of the above bullets are guidelines that the industry as a whole has been chasing over the last decade. So to the extent we can inculcate these in our design, our applications will be better off, whether or not we choose to host them in the cloud. I would even go so far as to say that cloud computing done right is a great hope for the entire industry.
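The “deal with failures” guideline pairs retries with idempotence: if a call fails after the work was actually done (for example, the reply is lost), a retry must not do the work twice. A minimal Python sketch under stated assumptions: `PaymentLedger` is a hypothetical store that deduplicates on a client-supplied request ID, `charge_order_17` is a hypothetical operation whose first reply is lost, and the backoff values are arbitrary.

```python
import time


class TransientError(Exception):
    """Raised for failures that are worth retrying (timeouts, throttling, ...)."""


class PaymentLedger:
    """Hypothetical store that makes charges idempotent by remembering request IDs."""

    def __init__(self):
        self.entries = []
        self._applied = set()

    def charge(self, request_id, order_id, amount):
        if request_id in self._applied:
            return  # duplicate of a request that was already applied
        self._applied.add(request_id)
        self.entries.append((order_id, amount))


def with_retries(operation, attempts=4, base_delay=0.1, sleep=time.sleep):
    """Call operation(), retrying TransientError with exponential backoff."""
    for attempt in range(attempts):
        try:
            return operation()
        except TransientError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the failure
            sleep(base_delay * (2 ** attempt))


ledger = PaymentLedger()
calls = {"n": 0}


def charge_order_17():
    calls["n"] += 1
    ledger.charge("req-17", "order-17", 25)
    if calls["n"] == 1:
        raise TransientError("reply lost after the charge was applied")


with_retries(charge_order_17, sleep=lambda _: None)  # skip real sleeping in the demo
print(calls["n"], ledger.entries)
```

The operation runs twice, but because the request ID is the same, the ledger records the charge only once.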

      Concern #6: “Azure is slow compared to EC2 or GAE”

      How would the performance of a web application hosted in Azure compare with that of a similar application built on EC2 or GAE? First off, this is an apples-to-oranges comparison. With EC2 you can ask for a specific hardware configuration (for example, you have the option to select an EC2 instance that has 8 virtual cores with 2.5 EC2 Compute Units each [7]). Azure offers full relational capability that EC2 and GAE don’t. Azure offers a batch processing capability that GAE does not. I could go on and on, but you get the idea. These are three different approaches to cloud computing, and they offer distinct performance optimization strategies.

      Closing

      In closing, let me say this: it would be Pollyannaish of me to suggest that Azure is a perfect cloud offering. Only time will tell how successful Azure will be. After building a handful of Azure applications (mainly POCs), I, like many others, have run into some challenges with the current CTP. Here are a few examples of the kinds of problems I am talking about. Provisioning is very slow, and not just for new applications; even upgrading an application whose model is unchanged is slow. The slowness is made worse by the fact that provisioning is very coarse-grained: even if you only need to change a single master page, code-behind or CSS file, you have to go through the slow process of upgrading the whole application. Another example is the pricing of the .NET Service Bus: based on the information available so far, it is very hard to develop a pricing model for a complex app. How do you differentiate between a TCP connection and a streaming connection? What does a transaction mean, exactly? The diagnostics support is rudimentary: all you have are the logs (no access to the event logs, server logs etc.), and it takes several minutes to copy them to blob storage. (Note that a number of enhancements to the logging functionality have been announced very recently [6], including the ability to look at performance counters and automatic copying of logs on a periodic basis.) Finally, there is only support for two types of roles, Web and Worker. What if I wanted to add specific software (for example, Excel Services)? Or, what if I wanted to run applications based on .NET 2.0 or even .NET 4.0?

      The Azure team has heard the above feedback many times and has its work cut out for it. The team is working hard to alleviate some of the aforementioned challenges by the time Azure launches commercially in November, and in subsequent releases. And for the rest of us, who want to build cloud applications using the .NET building blocks we know and the VS.NET based tools we love, our work is cut out for us as well: brainstorm about the applications that can leverage the Azure platform, build prototypes, provide feedback on the pricing models, develop tools and utilities and, last but not least, demand the best product possible.

      References

      [0] Towards a Unified Ontology of Cloud Computing – http://www.cs.ucsb.edu/~lyouseff/CCOntology/CloudOntology.pdf

      [1] Above the Clouds: A Berkeley View of Cloud Computing – http://www.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-28.pdf

      [2] Dynamic Scaling – http://channel9.msdn.com/pdc2008/ES19/

      [3] Simple Cloud API – http://www.simplecloud.org/

      [4] Securing Microsoft’s Cloud Infrastructure – http://www.globalfoundationservices.com/security/documents/SecuringtheMSCloudMay09.pdf

      [5] Business Productivity Online Suite – http://www.microsoft.com/resources/Technet/en-us/MSOnline/bpos/html/99d9ede5-ce15-476c-9a3f-d42a481d287e.htm

      [6] Azure team announcement of enhancements to Windows Azure Logging – http://blogs.msdn.com/windowsazure/archive/2009/10/03/upcoming-changes-to-windows-azure-logging.aspx

      [7] Amazon EC2 Instance Types – http://aws.amazon.com/ec2/instance-types/

      [8] Windows Azure: Cloud Service Development Best Practices – http://channel9.msdn.com/pdc2008/ES03/

      SharePoint Database Tips

      August 6, 2009

      Recently, I have been working on a number of SharePoint database issues related to a large number of workflow instances. Here are a few things I encountered:

      • System.Runtime.InteropServices.COMException (0x80004005) is a catch-all for database and other interop errors. You can get this error for any number of reasons, for example running out of transaction log space, deadlocks etc. I have seen a number of posts on the web attributing this error to one specific condition or another.

      • Turn on WF tracing (see my previous post for more information). Trace logs are a great source of information on how a given workflow instance has progressed, and they let you find the workflow instance that encountered an error. Once you have the workflow instance, you can query the workflow and workflow association tables to determine the related SPWeb, SPTask etc.

      • Because SharePoint uses GUIDs as primary keys, its tables can become fragmented (Kimberly Tripp has a nice post about this); this is something one needs to be aware of.

      • We found that despite the weekly timer job (note that there is a difference between SP1 and SP2 here), statistics on certain tables can be out of date, resulting in suboptimal query plans, which in turn can cause deadlocks in some instances.

      • Check whether the max degree of parallelism (MAXDOP) setting is having an impact: try re-running your tests after setting MAXDOP to 1.

      • Look for lock escalation warnings, and try altering the lock escalation behavior using trace flags 1211 (disables lock escalation) and 1224 (disables lock escalation based on the number of locks).

      • Review your code for potentially expensive WSS OM calls such as BreakRoleInheritance. Perform static analysis using a tool such as SPDisposeCheck to make sure resources are being disposed correctly.

      • Because of their episodic nature, workflows are difficult to test in an automated way. There is a nice MSDN article on unit testing workflows.

      Load testing workflows is even harder. We developed a tool that allowed us to load test our workflows. This tool is itself a workflow program that drives a configured number of workflows: you basically supply the workflow types, the users, how the tasks need to be updated (accepted, rejected etc.) and in what order. Having a load test tool allowed us to consistently recreate some of the issues we were encountering in production.

      • Active management of lists is very important. Archive old items from lists regularly as appropriate.
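To see why GUID primary keys fragment a clustered index, it helps to compare random keys with ever-increasing ones. The Python sketch below is a deliberately simplified model, not SQL Server’s storage engine: leaf pages hold a fixed number of keys (`page_capacity` is an arbitrary assumption), an insert that lands inside a full page splits it in half, and an insert past the end of the index simply starts a new page.

```python
import bisect
import random
import uuid


def simulate_inserts(keys, page_capacity=8):
    """Insert keys into a toy clustered index; return (page_splits, average_page_fill)."""
    pages = [[]]  # leaf pages in key order; each page is a sorted list of keys
    splits = 0
    for key in keys:
        # Find the first page whose highest key is >= the new key; default to the last page.
        idx = len(pages) - 1
        for i, page in enumerate(pages):
            if page and page[-1] >= key:
                idx = i
                break
        page = pages[idx]
        bisect.insort(page, key)
        if len(page) > page_capacity:
            if idx == len(pages) - 1 and page[-1] == key:
                # Appending past the end of the index: start a new page, no split.
                pages.append([page.pop()])
            else:
                # The key landed inside a full page: split it into two half-full pages.
                mid = len(page) // 2
                pages[idx:idx + 1] = [page[:mid], page[mid:]]
                splits += 1
    fill = sum(len(p) for p in pages) / (len(pages) * page_capacity)
    return splits, fill


rng = random.Random(42)
guid_keys = [uuid.UUID(int=rng.getrandbits(128)) for _ in range(1000)]

print("random GUIDs:    splits=%d fill=%.2f" % simulate_inserts(guid_keys))
print("sequential keys: splits=%d fill=%.2f" % simulate_inserts(sorted(guid_keys)))
```

In this model the sequential load produces no splits and completely full pages, while the same keys inserted in random order cause repeated splits and leave pages only partly full, which is the fragmentation the GUID bullet above warns about.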

      URL: http://ais.cloudapp.net

      Source code: http://cid-818700175481d002.skydrive.live.com/browse.aspx/Blog

      I (along with Harin Sandhoo from AIS) recently worked on converting a subset of the DinnerNow.net application to Azure. This post captures some of the highlights of the porting effort.

      DinnerNow.net is a reference application developed by Microsoft to showcase .NET 3.5 functionality. This application can be broken up into three subsystems:

      1. Web site where customers can order food from a variety of restaurants in their local delivery area.

      2. Smart client application that allows a restaurant manager to view the incoming orders and update their status.

      3. Mobile application that allows a delivery person to be notified when orders are ready for delivery.

      It made sense to start by porting the web site to Azure. In the future, it may be interesting to build a Silverlight equivalent of the restaurant manager piece and, perhaps, a Live Mesh based mobile delivery application.

      Current Architecture 

      The DinnerNow.net web site is implemented as an ASP.NET application hosted inside IIS 7. Data is currently stored in SQL Server 2005, and a LINQ to SQL based data access layer is used to persist the data. Business service functionality is implemented using WCF and workflow services (also hosted inside IIS 7).

       

      [Figure: DinnerNow.net current architecture]

      [Readers who are not already familiar with Azure concepts such as roles may find it useful to review this first.]

      Proposed Architecture

      [Figure: DinnerNow.net proposed Azure architecture]

      The Azure version of DinnerNow.net uses a web role to host the ASP.NET code as well as the WCF services. For now (because of the limit on the number of projects under the CTP), the UI code and WCF endpoints are hosted within the same web role. In the future, it will make sense to move the WCF endpoints to a separate web role. An Azure Table based membership provider (part of the Azure SDK samples) is used for authentication.

      A worker role is used for some background processing tasks, such as storing the order in the database. The web and worker roles communicate via an Azure Queue. The worker role was also responsible for communicating with the workflow program (ProcessOrder.xoml) that places the submitted order on a queue for further processing by the restaurants. However, as of July 1st, the Azure Workflow Service has been taken down, so we have removed the workflow-service-related code. The key motivations for including the Azure Workflow Service in the first place were 1) a robust host for the workflow program and 2) connectivity to applications inside the enterprise (such as the restaurant manager application).
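The web-to-worker hand-off through a queue relies on two queue behaviors: a dequeued message becomes invisible for a visibility timeout, and it is only removed when the consumer explicitly deletes it, so a worker that dies mid-processing does not lose the order. The Python sketch below mimics just those two behaviors; `ToyQueue`, `worker_step`, the `order-id:details` message format and the dictionary standing in for the database are hypothetical simplifications, not the Azure storage client API.

```python
import itertools
from dataclasses import dataclass


@dataclass
class _Message:
    msg_id: int
    body: str
    visible_at: float = 0.0


class ToyQueue:
    """In-memory stand-in for an Azure Queue: get() hides a message, delete() removes it."""

    def __init__(self):
        self._messages = []
        self._ids = itertools.count(1)

    def put(self, body):
        self._messages.append(_Message(next(self._ids), body))

    def get(self, now, visibility_timeout=30.0):
        for msg in self._messages:
            if msg.visible_at <= now:
                msg.visible_at = now + visibility_timeout  # hidden while being processed
                return msg.msg_id, msg.body
        return None

    def delete(self, msg_id):
        self._messages = [m for m in self._messages if m.msg_id != msg_id]


def worker_step(queue, orders_db, now, crash=False):
    """One pass of a hypothetical worker-role loop: dequeue, persist, then delete."""
    item = queue.get(now)
    if item is None:
        return False
    msg_id, body = item
    if crash:
        return False  # simulated crash: the message is never deleted
    order_id, _, details = body.partition(":")
    orders_db[order_id] = details  # keyed by order ID, so reprocessing is harmless
    queue.delete(msg_id)
    return True


queue, orders_db = ToyQueue(), {}
queue.put("order-1:2 x pad thai")                   # web role submits an order
worker_step(queue, orders_db, now=0, crash=True)    # worker dies mid-processing
print(worker_step(queue, orders_db, now=10))        # message still invisible
print(worker_step(queue, orders_db, now=31))        # message reappeared and was stored
print(orders_db)
```

Deleting only after the order is stored means a message can be processed more than once, which is why the write is keyed by order ID.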

      The biggest challenge in porting the application was related to the database. Even though relational features for SDS have been announced, they are not yet available. This forced us to rely on Windows Azure Table as the persistence store. The key consideration in moving the data from a relational DB to Azure Table is the partitioning strategy. Consider the following dbml diagram depicting the Order, OrderDetail and OrderPayment LINQ to SQL classes. Azure Table supports a flexible schema that allows entities of different types to be stored within a single table. Since there was a need to retrieve Order and OrderDetails together, we clustered the two inside a single table for efficient retrieval. We partitioned the data by order ID, and the RowKey was used to differentiate between the entities (e.g. “OrderItem_” + DateTime, “Order_” + DateTime). Since RowKey sorting is lexicographic, we use a fixed-length format based on Ticks.

       

      [Figure: dbml diagram of the Order, OrderDetail and OrderPayment LINQ to SQL classes]
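Because RowKeys compare as strings, "10" sorts before "9", so a variable-length number would break chronological order. Zero-padding the tick count to a fixed width keeps string order and time order the same. The Python sketch below reproduces .NET `DateTime.Ticks` (100-nanosecond intervals since 0001-01-01) from a Python `datetime` and pads it to 19 digits, enough for the largest `DateTime` tick value. The `row_key` helper and the sample timestamps are illustrative assumptions; note that Python datetimes only have microsecond resolution.

```python
from datetime import datetime, timedelta

_DOTNET_EPOCH = datetime(1, 1, 1)  # DateTime.Ticks counts from 0001-01-01 00:00:00


def dotnet_ticks(dt: datetime) -> int:
    """Number of 100-nanosecond intervals since 0001-01-01, like .NET DateTime.Ticks."""
    return (dt - _DOTNET_EPOCH) // timedelta(microseconds=1) * 10


def row_key(prefix: str, dt: datetime) -> str:
    """Build a RowKey such as 'Order_0633...' with a zero-padded, 19-digit tick count."""
    return f"{prefix}_{dotnet_ticks(dt):019d}"


placed = datetime(2009, 7, 1, 12, 0, 0)
items = [row_key("OrderItem", placed + timedelta(seconds=s)) for s in (5, 1, 3)]
print(row_key("Order", placed))
print(sorted(items))  # lexicographic order matches chronological order
```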

       

      Azure Table supports ADO.NET Data Services and REST. Fortunately, the data access code is not significantly different. For example, the existing LINQ to SQL query inside the GetOrdersForRestaurant method, which looks like this:

      var ordersByRestaurant = (from o in db.Orders.Distinct()
                                 where (from od in db.OrderDetails where od.RestaurantId == restaurantId select od.OrderId).Contains(o.OrderId)
                                 select new DinnerNow.Business.Data.Order()

      is changed to the following query when working with Azure Table:

      var qResult = (from oItems in _context.CreateQuery<OrderItemEntity>(OrderTableName)
                         where oItems.RestaurantId == restaurantId
                         select oItems);

      For additional details please refer to the file OrderProcessing.cs in the sample code that accompanies this blog post.

      Code View

      The following diagram depicts the code view for the project. DinnerNow.CloudService is the Azure Service project that comprises the web and worker roles. The web role is mapped to the DinnerNow.WebUX project. Similarly, the worker role is mapped to the DinnerNow.WorkerRole project.

      [Figure: code view of the DinnerNow.net solution]

       

      Miscellanea

      • It is recommended that configuration data be stored inside the CSCFG file (as opposed to web.config). This is because the CSCFG file is stored outside the application package you upload to the Azure Portal; the application package is really a diff disk that gets applied to the base Hyper-V image. Storing the configuration data in the CSCFG file means that you can make changes without the need to upload a new application package. To read a configuration setting from the CSCFG file, use the GetConfigurationSetting method of the RoleManager class, like so:

      var sslPort = Microsoft.ServiceHosting.ServiceRuntime.RoleManager.GetConfigurationSetting("sslport");

      • Our code is based on the March CTP of the SDK.
      • ASP.NET code is running in full trust (now available with the March CTP).
      • Testing: we mostly relied on mixed-mode testing, wherein the code was executed in the development fabric but the data was in Azure Table storage.
      • We used Cerebrata’s nifty cloud storage tool for all our testing – https://onlinedemo.cerebrata.com/Cerebrata.CloudStorage/default.aspx
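The values that `GetConfigurationSetting` returns live under a role's `ConfigurationSettings` element in the .cscfg file. The Python sketch below parses such a file simply to show that layout. The sample document, the `WebRole` role name and the namespace URI are illustrative assumptions; the lookup ignores namespaces, so it does not depend on the exact URI. Inside a role you would still call `RoleManager.GetConfigurationSetting` as shown in the bullet above.

```python
import xml.etree.ElementTree as ET

SAMPLE_CSCFG = """<ServiceConfiguration serviceName="DinnerNow.CloudService"
    xmlns="http://schemas.microsoft.com/ServiceHosting/2008/10/ServiceConfiguration">
  <Role name="WebRole">
    <Instances count="2" />
    <ConfigurationSettings>
      <Setting name="sslport" value="443" />
    </ConfigurationSettings>
  </Role>
</ServiceConfiguration>"""


def _local_name(tag: str) -> str:
    """Strip a '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def get_configuration_setting(cscfg_xml: str, role: str, name: str) -> str:
    """Return the value of <Setting name=...> for the given role in a .cscfg document."""
    root = ET.fromstring(cscfg_xml)
    for role_el in root.iter():
        if _local_name(role_el.tag) == "Role" and role_el.get("name") == role:
            for setting in role_el.iter():
                if _local_name(setting.tag) == "Setting" and setting.get("name") == name:
                    return setting.get("value")
    raise KeyError(f"setting {name!r} not found for role {role!r}")


print(get_configuration_setting(SAMPLE_CSCFG, "WebRole", "sslport"))
```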

      This blog post is about tips and tricks for monitoring the health of SharePoint workflows. I would like to recommend this excellent MSDN article for additional information.

      Logging information about the progress of workflows

      The WF tracking service logs events as the workflow progresses. For example, consider a simple workflow (see below) that creates a task and then loops until the task is 100% complete.

       

       

      [Figure 1: a simple workflow that creates a task and then loops until the task is 100% complete]

      By turning the tracking service on (see [1]), we can capture entries like the following; the trace output below corresponds to the workflow in Figure 1 above:

       

      System.Workflow.Runtime.Hosting Information: 0 : Creating instance 1888f8e6-145c-4220-be52-99cfd09098a7

       

      System.Workflow.Runtime Information: 1 : Workflow Runtime: Scheduler: InstanceId: 1888f8e6-145c-4220-be52-99cfd09098a7 : Running scheduled entry: SubscriptionEvent((1)Workflow1, ActivityStatusChange('(1)createTask1', Closed, Succeeded))

       

      System.Workflow.Runtime Information: 0 : Activity Status Change – Activity: whileActivity1 Old:Initialized; New:Executing

       

      // At this point the workflow is waiting for the user to update the task, so the workflow runtime can dehydrate (persist) the running instance to the database

       

      System.Workflow.Runtime Information: 0 : Workflow Runtime: WorkflowExecutor: Got an unload request for instance 1888f8e6-145c-4220-be52-99cfd09098a7

      System.Workflow.Runtime Information: 0 : 1888f8e6-145c-4220-be52-99cfd09098a7: Calling PerformUnloading(false) on instance 1888f8e6-145c-4220-be52-99cfd09098a7 hc 13970169

      System.Workflow.Runtime Information: 0 : Workflow Runtime: WorkflowExecutor: Unloading instance 1888f8e6-145c-4220-be52-99cfd09098a7

      System.Workflow.Runtime.Hosting Information: 0 : TimerEventSubscriptionQueue: 1888f8e6-145c-4220-be52-99cfd09098a7 Suspend

      System.Workflow.Runtime Information: 0 : 1888f8e6-145c-4220-be52-99cfd09098a7: Calling Persist

       

      // At this point the user marks the task as complete, so the workflow runtime can deserialize the workflow and pass it the TaskChanged event. This results in re-evaluation of the while loop

       

      System.Workflow.Runtime Stop: 0 : Workflow Trace

      System.Workflow.Runtime.Hosting Information: 0 : Deserialized a Workflow1 [SampleWorkflow.Workflow1] to length 8660. Took 00:00:00.0400576.

      System.Workflow.Runtime Information: 0 : Workflow Runtime: WorkflowExecutor: Loading instance 1888f8e6-145c-4220-be52-99cfd09098a7

       

      System.Workflow.Runtime Information: 0 : Activity Status Change – Activity: onTaskChanged1 Old:Executing; New:Closed

      System.Workflow.Runtime Information: 1 : Workflow Runtime: Scheduler: InstanceId: 1888f8e6-145c-4220-be52-99cfd09098a7 : Scheduling entry: SubscriptionEvent((1)whileActivity1, ActivityStatusChange('(2)onTaskChanged1', Closed, Succeeded))

       

       

      As you can see from the snippets above, we have information about the step-by-step execution of the workflow.
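With tracing enabled, a log like the one above can be scanned mechanically, for example to see where each activity ended up and which instances were terminated. The Python sketch below is a hypothetical helper keyed to the line shapes shown in this post; the pattern accepts either a hyphen or an en dash before “Activity:”. Because the status-change lines shown here do not carry an instance ID, the summary is per activity name rather than per instance.

```python
import re

STATUS_RE = re.compile(
    r"Activity Status Change\s*[-\u2013]\s*Activity:\s*(\S+)\s+Old:(\w+);\s*New:(\w+)")
TERMINATED_RE = re.compile(r"Terminating instance ([0-9a-fA-F-]{36})")


def summarize_trace(lines):
    """Return (latest status per activity, IDs of terminated instances) from WF trace lines."""
    latest_status = {}
    terminated = []
    for line in lines:
        match = STATUS_RE.search(line)
        if match:
            activity, _old, new = match.groups()
            latest_status[activity] = new
            continue
        match = TERMINATED_RE.search(line)
        if match:
            terminated.append(match.group(1))
    return latest_status, terminated


sample = [
    "System.Workflow.Runtime Information: 0 : Activity Status Change – Activity: "
    "whileActivity1 Old:Initialized; New:Executing",
    "System.Workflow.Runtime Information: 0 : Activity Status Change – Activity: "
    "onTaskChanged1 Old:Executing; New:Closed",
    "System.Workflow.Runtime Information: 0 : Workflow Runtime: WorkflowExecutor: "
    "Terminating instance 9e644d58-9990-443a-a595-1685fec2c311",
]
print(summarize_trace(sample))
```

Against a real log, the open file can be passed directly, e.g. `summarize_trace(open("WFTrace.log"))`, using whatever file name the trace listener is configured with.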

       

      Workflow Failure Conditions

      Some of the common reasons why a workflow can fail include:

       

      a. The WF program instance errors out because of an exception. For example, a null object reference may be encountered inside the CreateTask handler of the workflow in Figure 1 above. This exception will cause the WF instance to move to the error state (reflected in the status column of the document library).

       

      [Figure: the error state shown in the workflow status column of the document library]

      The tracking service will record this exception and typically log detailed information as shown below:

       

      System.Workflow.Runtime Critical: 0 : Uncaught exception escaped to the root of the workflow.

          In instance 9e644d58-9990-443a-a595-1685fec2c311 in activity

      Inner exception: System.NullReferenceException: Object reference not set to an instance of an object.

         at SampleWorkflow.Workflow1.TaskCreation(Object sender, EventArgs e)

         at System.Workflow.ComponentModel.Activity.RaiseEvent(DependencyProperty dependencyEvent, Object sender, EventArgs e)

         at System.Workflow.Activities.CallExternalMethodActivity.Execute(ActivityExecutionContext executionContext)

         at System.Workflow.ComponentModel.ActivityExecutor`1.Execute(T activity, ActivityExecutionContext executionContext)

         at System.Workflow.ComponentModel.ActivityExecutor`1.Execute(Activity activity, ActivityExecutionContext executionContext)

         at System.Workflow.ComponentModel.ActivityExecutorOperation.Run(IWorkflowCoreRuntime workflowCoreRuntime)

         at System.Workflow.Runtime.Scheduler.Run()

       

      System.Workflow.Runtime Information: 0 : Workflow Runtime: WorkflowExecutor: Terminating instance 9e644d58-9990-443a-a595-1685fec2c311

       

      One more point about exception conditions: there are times when we need to throw an exception ourselves, for instance if the user does not have the appropriate permissions or if data is missing. In those cases, we can set a custom error message (see [2]).

       

      [Figure: a custom error status message in the workflow status column]

      b. The application pool is recycled. The state from the previous persist point (if there was one, typically a Delay or OnTaskChanged activity) is persisted in the database, but since there is no retry mechanism, there is no way to restart the persisted workflow instance. For example, if the sample WF program instance was executing inside the While activity (whileActivity1 in Figure 1) when the app pool went down, there is no automatic way to have the workflow restarted.

      One potential solution would be to model the workflow as a state machine and include retry logic, but this would add complexity to the workflow.

      c. If the correlation token is being set dynamically, there is a chance that the value gets set incorrectly in some cases. As a result, the waiting WF program instance will never receive the event. A correlation token is an identifier WF uses to tie activities to a common task; for example, if CreateTask, OnTaskChanged and CompleteTask relate to a single task, they should have the same correlation token.

      d. SharePoint workflow activities (like CreateTask) delay database commits until a persist point is reached. This means that a CreateTask activity will not result in a “real” task being added to the list until a persist point is reached, so any direct SharePoint OM calls that attempt to reference the created task will fail until then. Please refer to [3] to dump out the
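The failure mode in item c comes down to keyed delivery: an event reaches only the activities waiting on the same correlation token. The Python sketch below is a toy model of that idea, not the WF or SharePoint API; `ToyCorrelationRouter`, the token strings and the inbox lists are hypothetical.

```python
from collections import defaultdict


class ToyCorrelationRouter:
    """Toy model of token-based event delivery (not the WF runtime API)."""

    def __init__(self):
        self._waiting = defaultdict(list)  # correlation token -> inboxes of waiting activities

    def wait(self, token, inbox):
        """Register an activity (represented by its inbox list) as waiting on a token."""
        self._waiting[token].append(inbox)

    def deliver(self, token, event):
        """Hand an event to every activity waiting on the token; return how many got it."""
        receivers = self._waiting.pop(token, [])
        for inbox in receivers:
            inbox.append(event)
        return len(receivers)


router = ToyCorrelationRouter()
on_task_changed_inbox = []
router.wait("taskToken", on_task_changed_inbox)  # onTaskChanged1 waits on createTask1's token

missed = router.deliver("taskToken_typo", "TaskChanged")  # token set incorrectly: nobody hears it
delivered = router.deliver("taskToken", "TaskChanged")    # matching token: the event arrives
print(missed, delivered, on_task_changed_inbox)
```

When the token is computed at run time, a single wrong value produces the first case: the event is raised, but the waiting activity never sees it.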

       [1] Workflow Diagnostics

       

      Add the following section to the web.config:

       

      <system.diagnostics>
        <switches>
          <add name="System.Workflow LogToTraceListeners" value="1" />
          <add name="System.Workflow.Runtime.Hosting" value="All" />
          <add name="System.Workflow.Runtime" value="All" />
          <add name="System.Workflow.Runtime.Tracking" value="All" />
          <add name="System.Workflow.Activities" value="All" />
        </switches>
        <trace autoflush="true" indentsize="4">
          <listeners>
            <add name="customListener"
                 type="System.Diagnostics.TextWriterTraceListener"
                 initializeData="WFTrace.log" />
          </listeners>
        </trace>
      </system.diagnostics>

       

       

      Additionally, we can use stsadm to capture the trace messages from Workflow Infrastructure, as shown below:

       

      @echo off
      set SPAdminTool=%CommonProgramFiles%\Microsoft Shared\web server extensions\12\BIN\stsadm.exe

      rem echo Logging levels before...
      rem "%SPAdminTool%" -o listlogginglevels

      echo Setting levels...
      "%SPAdminTool%" -o setlogginglevel -category "Workflow Features;Workflow Infrastructure" -tracelevel Verbose -windowslogginglevel Error

      echo Restarting SPTrace service...
      net stop sptrace
      net start sptrace

      rem echo Logging levels after...
      rem "%SPAdminTool%" -o listlogginglevels

      pause

       

       

       

      [2] Adding custom status message

       

      1. Add a custom status in workflow.xml:

      <ExtendedStatusColumnValues>

             <StatusColumnValue>

                    Failed to start due to insufficient permissions

             </StatusColumnValue>

      </ExtendedStatusColumnValues> 

       

      2. Add the following code to the SetState activity’s MethodInvoking handler:

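      private void setState1_MethodInvoking(object sender, EventArgs e)
      {
          // Custom status values follow the built-in ones: the first ExtendedStatusColumnValues entry maps to SPWorkflowStatus.Max
          ((Microsoft.SharePoint.WorkflowActions.SetState)sender).State = (Int32)SPWorkflowStatus.Max;
      }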

       

       [3] Code to extract the persisted workflow state

       

      Since the workflow runtime is finicky about changes such as adding private variables, it is useful to dump the persisted state.

       

      // Dump the first non-empty persisted workflow instance from the content database
      using (System.Data.SqlClient.SqlConnection conn = new System.Data.SqlClient.SqlConnection(
             "Integrated Security=SSPI;Persist Security Info=False;Initial Catalog=WSS_Content_Portal;Data Source=CTSDEV1"))
      using (System.Data.SqlClient.SqlCommand cmd = new System.Data.SqlClient.SqlCommand(
             "select InstanceData from dbo.workflow where InstanceDataSize > 0", conn))
      {
          conn.Open();
          byte[] image = cmd.ExecuteScalar() as byte[];
          if (image != null)
          {
              // Write the raw persisted instance data to disk for inspection
              using (System.IO.FileStream fs = new System.IO.FileStream(@"c:\data.gz", System.IO.FileMode.CreateNew))
              {
                  fs.Write(image, 0, image.Length);
              }
          }
      }

       

       

       

       

       
