URL http://ais.cloudapp.net                                                        

Source Code  http://cid-818700175481d002.skydrive.live.com/browse.aspx/Blog

I (along with Harin Sandhoo from AIS) recently worked on converting a subset of the DinnerNow.net application to Azure. This post captures some of the highlights of the porting effort.

DinnerNow.net is a reference application developed by Microsoft to showcase .NET 3.5 functionality. The application can be broken up into three subsystems:

1. Web site where customers can order food from a variety of restaurants in their local delivery area.

2. Smart client application that allows a restaurant manager to view the incoming orders and update their status.

3. Mobile application that allows a delivery person to be notified when orders are ready for delivery.

It made sense to start by porting the web site to Azure. In the future, it may be interesting to look into building a Silverlight equivalent for the restaurant manager piece and, perhaps, a Live Mesh based mobile delivery application.

Current Architecture 

The DinnerNow.net web site is implemented as an ASP.NET application hosted inside IIS 7. Data is currently stored in SQL Server 2005, and a LINQ to SQL based data access layer is used to persist the data. Business service functionality is implemented using WCF and workflow services (also hosted inside IIS 7).

 

DinnerNow

[ Readers who are not already familiar with Azure concepts such as roles may find it useful to review this first ]

Proposed Architecture

The Azure version of DinnerNow.net uses the web role to host the ASP.NET code as well as the WCF services. For now (because of the limit on the number of projects under the CTP), the UI code and WCF endpoints are hosted within the same web role. In the future, it will make sense to move the WCF endpoints to a separate web role. An Azure Table based membership provider (part of the Azure SDK samples) is used for authentication.

A worker role is used for background processing tasks such as storing the order in the database. Communication between the web and worker roles takes place via an Azure Queue. The worker role was also responsible for communicating with the workflow program (ProcessOrder.xoml) that places the submitted order on a queue for further processing by the restaurants. As of July 1st, however, the Azure Workflow Service has been taken down, so we have removed the workflow service related code. The key motivations for including the Azure Workflow Service in the first place were 1) a robust host for the workflow program and 2) connectivity to applications inside the enterprise (such as the restaurant manager application).

The biggest challenge in porting the application was related to the database. Even though the relational features for SDS have been announced, they are not yet available. This forced us to rely on Windows Azure Table as the persistence store. The key consideration in moving the data from a relational DB to Azure Table is the partitioning strategy. Consider the following dbml diagram depicting the Order, OrderDetail and OrderPayment LINQ to SQL classes. Azure Table supports a flexible schema that allows entities of different types to be stored within a single table. Since there was a need to retrieve Order and OrderDetails together, we clustered the two inside a single table for efficient retrieval. We partitioned the data by order id, and the RowKey was used to differentiate between the entities (i.e. “OrderItem_” + DateTime, “Order_” + DateTime). Since RowKeys sort lexicographically, we use a fixed-length format based on Ticks so that string order matches chronological order.

 

[Figure: dbml diagram of the Order, OrderDetail and OrderPayment LINQ to SQL classes]
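To illustrate why the fixed-length format matters, here is a minimal sketch in Python (chosen only for brevity; the sample code itself is C#). It assumes the RowKey is an entity-type prefix followed by the .NET Ticks value, zero-padded to 19 digits, which is the width of DateTime.MaxValue.Ticks. The helper names to_ticks and row_key are hypothetical and are not taken from the DinnerNow sample.

```python
from datetime import datetime

# .NET DateTime.Ticks counts 100-nanosecond intervals since 0001-01-01.
_EPOCH = datetime(1, 1, 1)
_TICK_WIDTH = 19  # assumption: pad to the digit count of DateTime.MaxValue.Ticks


def to_ticks(moment: datetime) -> int:
    """Convert a naive datetime to the equivalent .NET Ticks value."""
    delta = moment - _EPOCH
    whole_seconds = delta.days * 86_400 + delta.seconds
    return whole_seconds * 10_000_000 + delta.microseconds * 10


def row_key(prefix: str, moment: datetime) -> str:
    """Build a RowKey such as 'Order_0621355968000000000' (hypothetical layout)."""
    return f"{prefix}_{to_ticks(moment):0{_TICK_WIDTH}d}"


if __name__ == "__main__":
    earlier = row_key("Order", datetime(2009, 6, 30, 23, 59))
    later = row_key("Order", datetime(2009, 7, 1, 0, 0))
    # With zero padding, plain string comparison agrees with time order.
    print(earlier < later)
    # Without padding, string comparison can disagree with numeric order.
    print(str(9) < str(10))
```

Because every key has the same length, a range query on RowKey within one partition returns entities in chronological order. Unpadded numbers would not, since "9" sorts after "10" as a string.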

 

Azure Table supports ADO.NET Data Services and REST. Fortunately, the data access code is not significantly different.  So for example, the existing LINQ to SQL query inside the GetOrdersForRestaurant method that looks like this:

var ordersByRestaurant = (from o in db.Orders.Distinct()
                           where (from od in db.OrderDetails where od.RestaurantId == restaurantId select od.OrderId).Contains(o.OrderId)
                           select new DinnerNow.Business.Data.Order()

is changed to the following query when working with Azure Table:

var qResult = (from oItems in _context.CreateQuery<OrderItemEntity>(OrderTableName)
                   where oItems.RestaurantId == restaurantId
                   select oItems);

For additional details please refer to the file OrderProcessing.cs in the sample code that accompanies this blog post.

Code View

The following diagram depicts the code view for the project. DinnerNow.CloudService is the Azure Service project that comprises the web and worker roles. The web role is mapped to the DinnerNow.WebUX project. Similarly, the worker role is mapped to the DinnerNow.WorkerRole project.

image

 

Miscellanea

  • It is recommended that configuration data be stored inside the CSCFG file (as opposed to web.config). This is because the CSCFG file is stored outside the application package you upload to the Azure Portal. The application package is really a diff disk that gets applied to the base Hyper-V image. Storing the configuration data in the CSCFG file means that you can make changes without uploading a new application package. To read a configuration setting from the CSCFG file, use the GetConfigurationSetting method of the RoleManager class, like so:

var sslPort = Microsoft.ServiceHosting.ServiceRuntime.RoleManager.GetConfigurationSetting("sslport");

  • Our code is based on the March CTP of the SDK.
  • ASP.NET code is running in full trust (now available with the March CTP).
  • Testing – We mostly relied on mixed mode testing, wherein the code was executed in the developer fabric but the data was in Azure Table.
  • We used Cerebrata’s nifty cloud storage tool for all our testing – https://onlinedemo.cerebrata.com/Cerebrata.CloudStorage/default.aspx

This blog post is about tips and tricks for monitoring the health of SharePoint workflows. I would like to suggest this excellent MSDN article for additional information.

Logging information about the progress of workflows

The WF tracking service logs the events as the workflow progresses along.  For example, consider a simple workflow (see below) that creates a task and then loops until the task is 100% complete.

 

 

[Figure 1: sample workflow that creates a task and then loops until the task is 100% complete]

By turning the tracking service on (see [1]), we can capture entries like the following. The trace output below corresponds to the workflow in Figure 1 above:

 

System.Workflow.Runtime.Hosting Information: 0 : Creating instance 1888f8e6-145c-4220-be52-99cfd09098a7

 

System.Workflow.Runtime Information: 1 : Workflow Runtime: Scheduler: InstanceId: 1888f8e6-145c-4220-be52-99cfd09098a7 : Running scheduled entry: SubscriptionEvent((1)Workflow1, ActivityStatusChange('(1)createTask1', Closed, Succeeded))

 

System.Workflow.Runtime Information: 0 : Activity Status Change – Activity: whileActivity1 Old:Initialized; New:Executing

 

// At this point the workflow is waiting for the user to update the task, so the workflow runtime can dehydrate the running instance (persist it to the database and unload it from memory)

 

System.Workflow.Runtime Information: 0 : Workflow Runtime: WorkflowExecutor: Got an unload request for instance 1888f8e6-145c-4220-be52-99cfd09098a7

System.Workflow.Runtime Information: 0 : 1888f8e6-145c-4220-be52-99cfd09098a7: Calling PerformUnloading(false) on instance 1888f8e6-145c-4220-be52-99cfd09098a7 hc 13970169

System.Workflow.Runtime Information: 0 : Workflow Runtime: WorkflowExecutor: Unloading instance 1888f8e6-145c-4220-be52-99cfd09098a7

System.Workflow.Runtime.Hosting Information: 0 : TimerEventSubscriptionQueue: 1888f8e6-145c-4220-be52-99cfd09098a7 Suspend

System.Workflow.Runtime Information: 0 : 1888f8e6-145c-4220-be52-99cfd09098a7: Calling Persist

 

// At this point the user marks the task as complete, so the workflow runtime can deserialize the workflow and pass it the TaskChanged event. This results in re-evaluation of the while loop

 

System.Workflow.Runtime Stop: 0 : Workflow Trace

System.Workflow.Runtime.Hosting Information: 0 : Deserialized a Workflow1 [SampleWorkflow.Workflow1] to length 8660. Took 00:00:00.0400576.

System.Workflow.Runtime Information: 0 : Workflow Runtime: WorkflowExecutor: Loading instance 1888f8e6-145c-4220-be52-99cfd09098a7

 

System.Workflow.Runtime Information: 0 : Activity Status Change – Activity: onTaskChanged1 Old:Executing; New:Closed

System.Workflow.Runtime Information: 1 : Workflow Runtime: Scheduler: InstanceId: 1888f8e6-145c-4220-be52-99cfd09098a7 : Scheduling entry: SubscriptionEvent((1)whileActivity1, ActivityStatusChange('(2)onTaskChanged1', Closed, Succeeded))

 

 

As you can see from the snippets above, we have information about step by step execution of the workflow.

 

Workflow Failure Conditions

Some of the common reasons why a workflow can fail include:

 

a.      The WF program instance errors out because of an exception, for example if a null object reference is encountered inside the CreateTask handler of the workflow in Figure 1 above. This exception will cause the WF instance to move to the error state (reflected in the status column of the document library).

 

clip_image004

 

 

The tracking service will record this exception and typically log detailed information as shown below:

 

System.Workflow.Runtime Critical: 0 : Uncaught exception escaped to the root of the workflow.

    In instance 9e644d58-9990-443a-a595-1685fec2c311 in activity

Inner exception: System.NullReferenceException: Object reference not set to an instance of an object.

   at SampleWorkflow.Workflow1.TaskCreation(Object sender, EventArgs e)

   at System.Workflow.ComponentModel.Activity.RaiseEvent(DependencyProperty dependencyEvent, Object sender, EventArgs e)

   at System.Workflow.Activities.CallExternalMethodActivity.Execute(ActivityExecutionContext executionContext)

   at System.Workflow.ComponentModel.ActivityExecutor`1.Execute(T activity, ActivityExecutionContext executionContext)

   at System.Workflow.ComponentModel.ActivityExecutor`1.Execute(Activity activity, ActivityExecutionContext executionContext)

   at System.Workflow.ComponentModel.ActivityExecutorOperation.Run(IWorkflowCoreRuntime workflowCoreRuntime)

   at System.Workflow.Runtime.Scheduler.Run()

 

System.Workflow.Runtime Information: 0 : Workflow Runtime: WorkflowExecutor: Terminating instance 9e644d58-9990-443a-a595-1685fec2c311

 

One more point about exception conditions – there are times when we need to throw an exception ourselves, for instance if the user does not have the appropriate permission or there is missing data. In those cases, we could set a custom error message (see [2]).

 

clip_image005

 

 

 

 

b.      The Application pool is recycled.  The state from a previous persist point (if there was one – typically a delay activity or OnTaskChanged activity) is persisted in the database, but since there is no retry mechanism, there is no way to re-start the persisted workflow instance. For example, if the sample WF program instance was executing inside the While activity (whileActivity1 in Figure 1) when the app pool crashed, there is no automatic way to have the workflow restarted.

One potential solution would be to model the workflow as a state machine and include retry logic. But this would add complexity to the workflow.

c.       If the correlation token is being set dynamically, there is a chance that the value gets incorrectly set in some cases. As a result, the waiting WF program instance will never receive the event. A correlation token is an identifier WF uses to tie activities to a common task – for example if CreateTask, OnTaskChanged and CompleteTask relate to a single task; they should have the same correlation token.

d.      SharePoint workflow activities (like CreateTask) delay database commits until a persist point is reached. This means that a CreateTask activity will not result in a “real” task being added to the list until a persist point is reached, so any direct SharePoint OM calls that attempt to reference the created task will fail until then. Please refer to [3] for code to dump out the persisted workflow state.

 [1] Workflow Diagnostics

 

Add the following section to the web.config:

 

<system.diagnostics>
       <switches>
              <add name="System.Workflow LogToTraceListeners" value="1" />
              <add name="System.Workflow.Runtime.Hosting" value="All" />
              <add name="System.Workflow.Runtime" value="All" />
              <add name="System.Workflow.Runtime.Tracking" value="All" />
              <add name="System.Workflow.Activities" value="All" />
       </switches>
       <trace autoflush="true" indentsize="4">
              <listeners>
                     <add name="customListener"
                          type="System.Diagnostics.TextWriterTraceListener"
                          initializeData="WFTrace.log" />
              </listeners>
       </trace>
</system.diagnostics>

 

 

Additionally, we can use stsadm to capture the trace messages from the Workflow Infrastructure category as shown below:

 

@echo off
set SPAdminTool=%CommonProgramFiles%\Microsoft Shared\web server extensions\12\BIN\stsadm.exe

rem echo Logging levels before...
rem "%SPAdminTool%" -o listlogginglevels

echo Setting levels...
rem Call stsadm through its full path so the script works even when it is not on the PATH
"%SPAdminTool%" -o setlogginglevel -category "Workflow Features;Workflow Infrastructure" -tracelevel Verbose -windowslogginglevel Error

echo Restarting SPTrace service...
net stop sptrace
net start sptrace

rem echo Logging levels after...
rem "%SPAdminTool%" -o listlogginglevels

pause

 

 

 

[2] Adding custom status message

 

1.       Add a custom status in workflow.xml

<ExtendedStatusColumnValues>

       <StatusColumnValue>

              Failed to start due to insufficient permissions

       </StatusColumnValue>

</ExtendedStatusColumnValues> 

 

2.       Add the following code for the invoking method

 private void setState1_MethodInvoking(object sender, EventArgs e)

{

      ((Microsoft.SharePoint.WorkflowActions.SetState)sender).State = ((Int32)SPWorkflowStatus.Max);

}

 

 [3] Code to extract the persisted workflow state

 

Since the workflow runtime is finicky about changes such as adding private variables, it is useful to dump the persisted state.

 

// Dump the persisted state of the first workflow instance that has data.
using (var conn = new System.Data.SqlClient.SqlConnection(
       "Integrated Security=SSPI;Persist Security Info=False;Initial Catalog=WSS_Content_Portal;Data Source=CTSDEV1"))
using (var cmd = new System.Data.SqlClient.SqlCommand(
       "select InstanceData from dbo.workflow where InstanceDataSize > 0", conn))
{
    conn.Open();
    byte[] image = (byte[])cmd.ExecuteScalar();

    // FileMode.CreateNew throws if c:\data.gz already exists.
    using (var fs = new System.IO.FileStream(@"c:\data.gz", System.IO.FileMode.CreateNew))
    {
        fs.Write(image, 0, image.Length);
    }
}

 

 

 

 

 

I read Andrew’s column on support for the relational capability within SDS. While it is great to have the “change the conn string to move to the cloud” capability, I think it is useful to understand how the relational model is enabled by SDS. Additionally, in my opinion, it will also be important to understand how the entity based storage (offered via Azure Storage) can be leveraged when designing applications for the cloud.

 

The architecture of the data fabric that powers SDS is a node based, scale-out architecture. Let us go over some of the core concepts of the data fabric:

    • Storage Unit – Basic unit of storage that supports the CRUD operations. Example – database row.
    • Consistency Unit – Set of storage units that can be queried and updated in a consistent manner. Example – A set of rows with the same partition keys or even an entire database instance.
    • Failover Unit – Group of consistency units that are guaranteed to be available. SDS replicates the failover units to a replica set to ensure that data is always available.

 


 

Here are some things to consider when using the SDS relational model (this is based on my limited understanding, of course. I urge you to view Gopal’s excellent Under the Hood talk for additional details)

    • Consistency is guaranteed inside a single node or consistency unit. This means that if your entire on-premises database instance can fit into one consistency unit, you are OK. If you are going to exceed it, on the other hand, you will need to think about partitioning your relational model, at least until Microsoft adds auto-partitioning functionality on top of the relational model. Note that there is no support for transactions that span consistency units.
    • The data fabric uses dynamic partitioning to improve performance, i.e. it can move a consistency unit around to spread the load evenly across the cluster of nodes. A single fail-over unit is designed to host one or more storage units, as opposed to dedicating the entire fail-over unit to a single consistency unit. This is because a) it is easy to re-balance the load by moving the “smaller” consistency units around and b) it is easier and faster to recreate a failed node. When you move your entire relational DB instance to the cloud as a single consistency unit, chances are that the data fabric will need to dedicate the entire fail-over unit to it (to improve performance). This can limit some of the benefits of dynamic data partitioning.
    • Bear in mind that the data fabric is based on scale-out using commodity hardware (typically 1.5 to 1.7 GHz X64 and 1.7 GB of memory – this is based on Chuck Lenzmeier’s talk at the PDC and obviously subject to change).
    • Consider the additional latency cost due to the fact that writes are propagated to a quorum of replicas.

 

As stated earlier, it might be worthwhile to consider the entity based storage offered by the Azure Storage, for the following reasons:

  • If you need a flexible schema or are building analysis focused applications, entity based storage may be more suitable.
  • Entity based storage forces you to think about partitioning from the ground up. This allows for almost linear scaling as additional nodes are added to the mix.
  • A single index (Partition Key + Row Key) may seem limiting, but there are a number of ways to get around this, including dynamically adjusting the partition key and storing multiple types of entities inside a single table so we can partition the data using partition key + entityType.
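As a rough illustration of the last point, the sketch below uses a toy in-memory table in Python. It is not the Azure Storage API: the EntityTable class, its methods and the "EntityType_id" row key layout are all assumptions made for this example. It shows how different entity types can share one partition and still be retrieved separately through the single Partition Key + Row Key index.

```python
from collections import defaultdict


class EntityTable:
    """Toy in-memory stand-in for an entity table (hypothetical, not a real API)."""

    def __init__(self):
        # partition key -> {row key: property bag}
        self._partitions = defaultdict(dict)

    def insert(self, partition_key, entity_type, entity_id, properties):
        # Assumed convention: the row key starts with the entity type.
        row_key = f"{entity_type}_{entity_id}"
        self._partitions[partition_key][row_key] = dict(properties)
        return row_key

    def query(self, partition_key, entity_type=None):
        """Return (row key, entity) pairs from one partition, sorted by row key.

        When entity_type is given, only rows whose key starts with that
        type prefix are returned.
        """
        rows = self._partitions.get(partition_key, {})
        prefix = "" if entity_type is None else f"{entity_type}_"
        return [(key, rows[key]) for key in sorted(rows) if key.startswith(prefix)]


if __name__ == "__main__":
    table = EntityTable()
    table.insert("order-42", "Order", "1", {"Total": 25.0})
    table.insert("order-42", "OrderItem", "1", {"Dish": "Pad Thai"})
    table.insert("order-42", "OrderItem", "2", {"Dish": "Spring rolls"})
    # Everything for the order comes from a single partition...
    print(len(table.query("order-42")))
    # ...and one entity type can still be selected by its key prefix.
    print([key for key, _ in table.query("order-42", "OrderItem")])
```

The flexible schema is what makes this workable: each entity carries its own property bag, so an order and its line items can sit side by side in the same partition.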

WCF WSS4J Interop

April 5, 2009

Recently I had to work on an interop scenario where a WCF client needed to call a WSS4J service. After a bit of experimentation, I came up with the following configuration.

Here are some of the key settings to note:

Authentication Mode: MutualCertificate

Message Protection Order: SignBeforeEncrypt

SOAP Version: 1.1

Algorithm: Basic128Rsa15

Message Security Version: WSSecurity10WSTrustFebruary2005WSSecureConversationFebruary2005WSSecurityPolicy11BasicSecurityProfile10

 

<?xml version="1.0" encoding="utf-8" ?>
<configuration>
    <system.serviceModel>
        <client>
            <endpoint address="http://gp64156.exampleCorp.net:8080/eservices/aService"
                behaviorConfiguration="ClientCertBehavior" binding="customBinding"
                bindingConfiguration="JavaInterop" contract="exampleCorp.Proxy.aProfileManager"
                name="UserProfileManager">
                <identity>
                    <dns value="userprofilews" />
                </identity>
            </endpoint>
        </client>

        <bindings>
            <customBinding>
                <binding name="JavaInterop">
                    <security defaultAlgorithmSuite="Basic128Rsa15" allowSerializedSigningTokenOnReply="true"
                        authenticationMode="MutualCertificate" requireDerivedKeys="false"
                        securityHeaderLayout="Lax" includeTimestamp="false" messageProtectionOrder="SignBeforeEncrypt"
                        messageSecurityVersion="WSSecurity10WSTrustFebruary2005WSSecureConversationFebruary2005WSSecurityPolicy11BasicSecurityProfile10">
                        <localClientSettings detectReplays="false" />
                    </security>
                    <textMessageEncoding messageVersion="Soap11" />
                    <httpTransport />
                </binding>
            </customBinding>
        </bindings>

        <behaviors>
            <endpointBehaviors>
                <behavior name="ClientCertBehavior">
                    <clientCredentials>
                        <clientCertificate findValue="CN=client.com" />
                        <serviceCertificate>
                            <defaultCertificate findValue="userprofilews" storeLocation="LocalMachine"
                                storeName="TrustedPeople" x509FindType="FindByIssuerName" />
                        </serviceCertificate>
                    </clientCredentials>
                </behavior>
            </endpointBehaviors>
        </behaviors>
    </system.serviceModel>
</configuration>

Reporting on MOSS Data

November 25, 2008

 

Generating reports based on the data contained within SharePoint lists is a common requirement for many projects, and yet, there is no easy or direct way to realize this functionality. Fortunately, extensibility options provided by MOSS and SQL Server Reporting Services (SSRS) can help. This post briefly discusses the various approaches for generating reports on MOSS based data.

SharePoint Storage Basics

SharePoint provides an abstraction around the underlying data storage (aka the content DB) wherein users don’t interface directly with the content database. Instead, users work with lists, and SharePoint in turn stores the information appropriately inside the content DB.

To be able to support various lists (lists with varying content types), SharePoint utilizes the universal pattern – where a single database table is used to capture information stored in different lists. A universal table (the userdata view inside the content DB) contains generic columns such as int1, int2, varchar1, varchar2, etc. that are mapped to the fields in a list. The mapping between fields and universal table columns is maintained inside the lists view as shown below.

Here is an example of a field (Title) that is based on an existing site column

<FieldRef Name="Title" ColName="nvarchar1"/>

Here is an example of a field (Address) that is defined as part of a list

<Field Type="Text" DisplayName="Address" Required="FALSE" MaxLength="255" ID="{76958f23-d514-4d2a-bb16-373429eb6569}" SourceID="{45581111-aa7e-4567-9a03-684a7ef051e8}" StaticName="Address" Name="Address" ColName="nvarchar3" RowOrdinal="0"/>

The universal table implementation in SharePoint goes a step further: it allows the fields of a list item to span multiple rows if needed, e.g. if a custom list has more integer columns than the 16 that are available in a single row.
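The mapping idea can be sketched in a few lines of Python. This only illustrates the pattern and is not SharePoint's actual allocation logic: the function map_fields, the nvarchar limit of 64 and the sample field names are assumptions made for the example. Only the limit of 16 integer columns per row and the ColName/RowOrdinal attribute names come from the text above.

```python
from collections import defaultdict


def map_fields(fields, columns_per_row):
    """Assign each list field to a generic column of the universal table.

    fields: list of (field name, column type) pairs, in definition order.
    columns_per_row: {column type: number of generic columns in one row}.
    Returns {field name: {"ColName": ..., "RowOrdinal": ...}}.
    """
    used = defaultdict(int)  # column type -> generic columns handed out so far
    mapping = {}
    for name, column_type in fields:
        index = used[column_type]
        used[column_type] += 1
        # Once a row's columns of this type are exhausted, spill to the next row.
        row_ordinal, slot = divmod(index, columns_per_row[column_type])
        mapping[name] = {
            "ColName": f"{column_type}{slot + 1}",
            "RowOrdinal": row_ordinal,
        }
    return mapping


if __name__ == "__main__":
    limits = {"int": 16, "nvarchar": 64}  # the nvarchar limit is an assumed value
    fields = [("Title", "nvarchar")] + [(f"Count{i}", "int") for i in range(1, 18)]
    mapping = map_fields(fields, limits)
    print(mapping["Count16"])  # last integer column of the first row
    print(mapping["Count17"])  # seventeenth integer field spills to a second row
```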

As you can imagine, the universal pattern is really convenient from a manageability perspective. An alternative design would be to provision a new table for each list or content type. Unfortunately the downside of the universal pattern is pretty obvious as well. For example, SQL Server will lock multiple rows of a list (or folders within a list) even when a single list item is being updated. In an extreme case, the entire table may be locked leading to single-threaded access.

The other obvious disadvantage is the limited ability to tune performance, for example by optimizing the indexes. With WSS 3.0, we now have the ability to mark a single field within a list as an “indexed” field. Indexed fields can lead to significant gains in performance as long as they are used in the where clause of the query.

While it is useful to look under the covers to understand the structure of the content DB, Microsoft does not want users to take any dependencies on the underlying tables (for good reason – the underlying storage format is subject to change without notice!). Instead, they provide a number of different object model (OM) based access schemes to safely access this data. These include:

1) Search OM – Allows executing queries against the search engine

2) Query OM – Allows executing CAML queries using classes such as SPQuery and SPSiteDataQuery. This comes in two flavors – class library and web service based OM.

The White Paper in reference [1] below discusses all the OM options and how they perform under load. I recommend reading this paper in its entirety. In a nutshell, SPQuery using indexed column is the best option in terms of performance. This paper also talks about using PortalSiteMapProvider based scheme that uses caching to improve performance. I am ignoring this option for the purposes of this discussion as it does not fit the reporting needs very well.

When to use Content DB as custom application data storage

It should be clear from the previous section that the universal table is not designed to be:

High performing - Because of the limited indexing and locking options it is not designed for a high throughput OLTP style access.

Tunable – Schemes such as partitioning to split very large lists are not available. There is also no ability to tune the default SQL (use custom SPs etc) used by the system.

Aggregation - No ability to aggregate, group information from multiple lists close to the source of data (CAML based aggregation across lists takes place at the application tier which is much more expensive)

So if your application truly requires the above traits, the best option is to store data outside the content DB and use one of the several schemes (webparts, _layout, ASPX based content pages) to surface your application within SharePoint.

On the other hand, if you are building an application with modest sizing requirements you can take advantage of the abstraction provided by SharePoint and thus avoid the need for any custom data access code. As an example, we built an invoicing system entirely based on the content DB. Invoices were “split” across site collections to allow the solution to scale-out (as opposed to a scale-up approach)

Generating Reports based on SharePoint Data

Whether you store application data inside the content DB or not, there is almost always a requirement to generate reports based on the metadata associated with the documents stored in SharePoint. Before we look into various reporting options, let us briefly look at some of the key characteristics of a reporting technology such as SQL Server Reporting Services (SSRS). SSRS can be broken up into three parts (as shown in the figure below):

1) Designer – used to create a report layout

2) Processor – binds report data to the layout

3) Connectivity – to a data source

It is easy to see that the ability to scale report generation depends on how fast the report data source can return the requested data. Relational databases (RDBs), with their innate ability to process set-based operations, are obviously very good at this. RDBs are also very good at grouping, sorting and filtering, which are other common reporting requirements.

SQL Server Reporting Services (SSRS), like many other reporting tools, supports connectivity to various data sources out of the box, including SQL Server, DB2, and Oracle. Again, similar to other reporting tools, SSRS allows custom data sources e.g. Active Directory, Disconnected DataSets etc., to act as the source for report data. The only requirement is that the custom data source support a forward-only stream based access to data.

The other key aspect of reporting is the ease of use in creating a report layout. Designers such as the one included with SSRS make it easy to support report layouts such as cross-tab reports, nested reports etc.

Finally, reporting tools exhibit many advanced capabilities such as caching, slicing (the ability to support multiple reports via a single “super” dataset), scheduled report generation, delivery (PDF, Excel, Word etc.) and dynamic reports.

Reporting Options

Let us now look at some different options for generating reports on the SharePoint content metadata

Built-in and custom WebParts

If you are trying to generate a simple report (such as an aggregated task list) based on a single list, it might be sufficient to use one of the built-in webparts (for example the Content Query Web Part – see Reference [3] for detailed instructions).

The key advantage of this approach is that security trimming takes place automatically. Another advantage is that the report is always generated on data that is current. The downside is that manual work is required to create the layout (no report designer type tool is available). Any advanced functionality such as master-detail or nesting requires additional custom coding. Finally, unless a caching scheme is used, each view of the report causes the query to be computed again imposing additional load on the system.

Entity Framework

The ADO.NET Entity Framework is part of VS .NET 2008 SP1. It provides a level of abstraction over a data source and is based on the Entity Data Model (EDM). Basically, the EDM allows users to model the database the way they designed it in the first place, probably with an Entity-Relational (ER) diagram. This level of abstraction is important because, as more and more application information is maintained in the database, the relational schema gets modified more frequently for performance and normalization. Those changes can affect the data access code in the application. For example, a product table may become a join across a few tables, but as a programmer I really only care about what the application defines as a Product entity.

So why is EDM important? It is expected that a future version of SSRS will work natively with the EDM. Using the EDM, we could model a SharePoint list and then use SSRS to generate reports.

The key advantage of this scheme is that it allows us native, yet safe, access to the SharePoint tables. This is because the EDM acts as a firewall between the underlying SharePoint content DB structure and Entity-SQL based queries. If the structure of the content DB changes at some point, all we need to do is change the EDM. The Entity-SQL queries are not impacted.

Native access to data stored in content DB means zero latency as well as full fidelity access to item level security.

The downside of this approach is that EDM model needs to be manually defined. The other down side is that tuning options are still not available.

Data Extraction Service and Operational Data Store

Another reporting option is to develop a Data Extraction Service (DES) shared service similar to BDC, except that this DES would export the information stored inside SharePoint out to an operational data store (ODS). DES is a generic scheme that works across various list types without the need to develop list specific custom code. It is envisioned that users will upload a BDC style XML configuration. DES will then use the configuration file to export data to an external data source on a scheduled basis. To make the transfer as efficient as possible, only differential updates will be carried over.
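The differential update step could look something like the following Python sketch. DES is only a proposal in this post, so everything here is hypothetical: the export_changes function, the dictionary standing in for the ODS, and the assumption that each list item carries an ID and a Modified timestamp that can serve as a watermark.

```python
from datetime import datetime


def export_changes(list_items, ods, last_run):
    """Copy items modified after last_run into the ODS and return the new watermark.

    list_items: iterable of dicts with at least "ID" and "Modified" keys.
    ods: dict keyed by item ID, standing in for the operational data store.
    last_run: watermark (latest Modified value seen) from the previous export.
    """
    watermark = last_run
    for item in list_items:
        if item["Modified"] > last_run:
            ods[item["ID"]] = dict(item)  # upsert: insert new or overwrite existing
            if item["Modified"] > watermark:
                watermark = item["Modified"]
    return watermark


if __name__ == "__main__":
    items = [
        {"ID": 1, "Title": "Invoice A", "Modified": datetime(2008, 11, 1)},
        {"ID": 2, "Title": "Invoice B", "Modified": datetime(2008, 11, 20)},
    ]
    ods = {}
    # Only the item changed since the previous run is carried over.
    watermark = export_changes(items, ods, datetime(2008, 11, 10))
    print(sorted(ods), watermark)
```

A sketch like this does not handle deleted items, which a real extraction service would need to track separately.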

Security (like the BDC) will be applied at the entity level and not at the SharePoint item.

This scheme offers the most scalability and flexibility. Once the data is available inside ODS, any SSRS technique can be applied. All the tuning and optimization options provided by SQL Server are available as well. Further, the SharePoint performance is not impacted by reporting (although there will be cost associated with running the DES service for extracting the data from SharePoint into the ODS)

The downside of this approach is the latency of data and effort required to generate the DES metadata.

If this scheme seems useful, we could invest in a tool that generates the DES metadata (similar to the BDC metadata generation tools available in the market today).

Reporting Services Data Extension

SSRS supports a modular architecture designed for extensibility. One such extensibility option is the data processing extension which allows SSRS to connect to data sources and retrieve data.

We have two options here:

1) Develop a custom data extension ourselves, by wrapping the SharePoint Web Service based API. Alternatively, we could have a custom web service return a dataset. SSRS 2005 has the ability to consume a dataset directly. The latter is a bit easier because we don’t have to implement the whole data extension ourselves, although the report design experience is not as rich as that provided by a data processing extension.

2) Use a commercial data extension such as the one provided by Enesys. Enesys data extension makes it possible to use SharePoint (MOSS 2007 & WSS V3) lists data for building reports with Microsoft SQL Server 2005 Reporting Services – a custom XML query (with portions of CAML) can be directly embedded into the RDL.

The biggest upside of this approach is that it allows the richness of the SSRS designer to be leveraged.

On the downside, Enesys RS uses the SharePoint Web Service OM to extract information from the content DB and is thus subject to the inherent performance limitations of CAML.

References

[1] White Paper: Working with large lists in Office SharePoint® Server 2007, Steve Peschka, Microsoft: http://technet.microsoft.com/en-us/library/cc262813.aspx

[2] Data Processing Extensions Overview, SQL Server 2008 Books Online, Microsoft,
http://msdn2.microsoft.com/en-us/library/ms152816.aspx?s=11

[3] How to: Customize the Content Query Web Part by using Custom Properties, Microsoft,
http://msdn2.microsoft.com/en-us/library/aa981241.aspx

[4] Enesys RS Data Extension
http://www.enesyssoftware.com/Products/EnesysRSDataExtension/Features/tabid/73/language/en-US/Default.aspx

 

I recently recorded a DNR show on MOSS. During the session, a question came up about the benefit of using MOSS platform elements such as "List": how can developers leverage these to build applications more productively? Let me try to answer this question with an example. Imagine that we are required to build a web page that displays a list of webcast recordings. Users with administration permissions are allowed to upload new webcast recordings, and to update (and delete) existing items in the list. Other users can only view existing items in the list. Further, we also have a UI requirement to customize the list rendering such that, in addition to the name of the webcast and an icon (that allows users to initiate streaming), the list also allows users to download the associated presentation slide deck as well as view the description of the webcast.

One approach for implementing the above requirement would be to define a webcast content type that has fields that correspond to the columns described above. We can then associate the webcast content type with a SharePoint list. Using CAML (Collaborative Application Markup Language) we can customize the UI of the list to meet our requirements.

Note that we could have just created a list using site columns instead of defining a content type. The benefit of using a content type is that it is a reusable type that can be associated with other list instances. We can also create new content types that derive from the webcast content type. SharePoint uses ASP.NET forms to allow users to insert and update list items. All data associated with the list is automatically stored in the content database. The figure below depicts a custom webcast list.

 

[Figure: a custom webcast list]

While it may not seem difficult to add a list control to an ASP.NET page and hook up some ADO.NET code to persist the data in the database, it should be noted that we have not written a single line of code thus far. We have relied on SharePoint list handling and the content database to implement the list.

But imagine that we are required to extend the above functionality. For example, each item of the list must be secured individually. Users also want to subscribe, via RSS or email, to any changes made to the list (such as new recordings). From a QA standpoint, a content management process needs to be enforced when a new webcast recording is uploaded, requiring versioning, check-in/check-out, and an approval workflow. Content management requirements invariably necessitate the ability to maintain an audit trail of changes as well as the ability to undelete an item that was inadvertently deleted. Last but not least, the search function on the site should include the information about the webcasts.

With these additional features, the custom ASP.NET solution is no longer easy to build. Fortunately, all of the above functions are provided by the SharePoint List by default. We can even extend the SharePoint List behavior using event handlers.

All of the information stored inside a list is accessible not only via the SharePoint UI, but also via the Object Model (the Class Library as well as the Web Service based OM). This means that processes outside the host process can access the list information, which is key to building transparent applications that are reusable.

As is well known by now, WSS 3.0 (and consequently MOSS) supports the ability to plug in a custom role provider. So instead of being limited to the default AD based role provider, a custom store, such as a SQL database, can be used as the authorization store. As you can imagine, this gives you a lot of flexibility in terms of storing and managing users.

Let us examine the custom role provider in detail. But before we proceed, let us get a few core definitions out of the way. WSS defines the following core set of security-related objects:

o SPUser and SPGroup - SPUser represents a user, while SPGroup, as the name suggests, represents a collection of users. WSS allows you to create custom groups.

o SPBasePermissions, SPRoleDefinition and SPRoleAssignment - SPBasePermissions represents the right to perform a certain action within WSS, for example the right to insert list items. SPRoleDefinition is a collection of rights; Contributor and Designer, for example, are built-in roles. Finally, SPRoleAssignment defines the assignment of roles to an SPUser or SPGroup.

Irrespective of whether you use a custom authorization provider or not, WSS creates instances of the above-mentioned objects to manage security. For example, the SPRoleAssignment object discussed above relies on SPUser and SPGroup to map users and groups to roles.

So how does a custom provider help? Using a custom provider, you can create roles and include them as part of an SPGroup (much like you would include instances of SPUser). This is an important benefit, as you can grant access based on custom authorization rules stored outside of WSS.

There is another important benefit enabled by custom role providers: the ability to manage authorization rules across site collections (SPSiteCollection). An SPSiteCollection is a security boundary within WSS. This means that SPUsers and SPGroups defined within one SPSiteCollection are not available to other SPSiteCollection instances.

A custom role, however, can span multiple site collections, since the authorization provider is defined at the Web Application level (SPWebApplication). This way you can manage authorization across sites from one central administrative interface. By contrast, if you had to rely only on SPGroups for authorization, you would have to create a separate SPGroup for each SPSiteCollection and develop a scheme to keep them synchronized.

I should add that AD groups can be used to achieve the same behavior as described above (i.e. the ability to span SPSiteCollections). However, given the way AD is administered in most organizations, WSS site owners do not have the permissions to create and modify AD groups.

Constructing WF

July 9, 2007

 

This post is motivated by “Deconstructing WF” chapter from the book Essential Windows Workflow Foundation. Essential WF  is one of my favorite technical books to come out in the last few years. I have also borrowed Don Box’s famous “you have just engineered the Component Object Model” approach to bring out the concepts behind the Windows Workflow Foundation.  

Get the complete sample code for this post here

Why Workflow

Managed code has helped make executables more transparent. Unlike unmanaged code, managed code carries rich metadata that is accessible at runtime via reflection.

However, it would be beneficial to move beyond the ability to access metadata at runtime. For instance, it would be useful to know the state of an executable at a point in time and, if needed, to dynamically modify the structure of the program. It would also be very helpful if the runtime provided support for programs that run for long periods of time. For example, the runtime should be able to move a long-running program (one that is paused waiting for an external stimulus) from memory to a storage medium and then restart it at a later time.

One approach to achieving the aforementioned level of transparency is to add a workflow abstraction layer over the .NET framework (alternatively, think of it as a meta-runtime over the .NET runtime). Developers would use the abstraction layer to build workflow-style programs, instead of building programs directly on top of the .NET framework. Building programs using the abstraction layer has the following benefits: (i) Higher-level constructs, in the form of pre-fabricated chunks of code referred to as workflow steps, can be stitched together to model a workflow. This is in contrast to the low-level IL instructions that make up a program written directly against the .NET runtime. Since the meta-runtime is now responsible for executing the program, it has deep insight into the state of the program execution as well as the ability to dynamically modify the structure of the program. (ii) The meta-runtime can provide plumbing services such as scheduling, persistence, etc.

Let us take a look at how we can implement such a generic workflow framework.

Generic Workflow Framework

Serialization Format

Consider the logic depicted in the adjacent diagram. This rather trivial program logic consists of two steps:  DoWork1 and DoWork2, executed in a sequence.   

The first thing we need to do is to represent this logic in a format that can be read by our generic workflow framework. An XML based format would make sense. However, it turns out that in XAML we already have an XML based object serialization format. So rather than inventing a new format, let us just use XAML as the serialization format. This way we can reuse the XAML parsing capabilities built into the framework.

For now our framework is quite rudimentary. There is no tooling available that would allow users to graphically define the workflow and automatically generate the XAML code. Users will need to write the XAML by hand. By the way, let us use the .XOML file extension for the XAML files we create for storing our workflow programs. This will make it easy for us to differentiate workflow XAML files from other XAML files.

Here is what an XOML representation of the above program would look like:

    <y:Workflow
        xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
        xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
        xmlns:y="clr-namespace:ConstructingWF;assembly=ConstructingWF">
        <y:Workflow.Steps>
            <y:WorkflowStep x:Name="DoWork1" SimulatedTimeOut="11"></y:WorkflowStep>
            <y:WorkflowStep x:Name="DoWork2" SimulatedTimeOut="01"></y:WorkflowStep>
        </y:Workflow.Steps>
    </y:Workflow>

The root node represents the workflow program and maps to an instance of the Workflow class. It contains zero or more objects that implement the IWorkflowStepBase interface. Each such object represents a prefabricated block of logic, the workflow step mentioned earlier. It is expected that developers will build custom classes that implement IWorkflowStepBase to help build workflow programs for their specific domains. For instance, workflow steps can be created to model banking industry operations such as check credit, update account, etc.

The only requirement for developing a workflow step is to implement the IWorkflowStepBase interface. This interface includes one method called Execute. The following class diagram depicts the relationship between the core classes.

 

In our XOML example above, the workflow consists of two steps executing in sequence (DoWork1 and DoWork2). Each step is of type WorkflowStep. To keep things simple, we will simulate the work done inside each step with a sleep operation. The first step is simulated with a sleep of 11 seconds, while the second step is simulated with a sleep of 1 second.
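
The types described above might be sketched as follows. This is a minimal sketch inferred from the XOML and the snippets in this post, not the code from the downloadable sample: the empty WorkflowContext class, the private field names, the order of the enum values, and the use of List<T> for Steps are all assumptions. Execute is purely synchronous here; the version shown in the Scheduler section adds the idle path. The sketch has not been verified against the XAML parser.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Status a step reports back to the scheduler (value names as used in this post).
public enum STEP_EXECUTION_STATUS
{
    STEP_EXECUTING,
    STEP_COMPLETE
}

// Assumption: placeholder for the context the runtime hands to each step.
public class WorkflowContext
{
}

// The single contract every workflow step must implement.
public interface IWorkflowStepBase
{
    STEP_EXECUTION_STATUS Execute(WorkflowContext context);
}

// A step that simulates work by sleeping for SimulatedTimeOut seconds.
public class WorkflowStep : IWorkflowStepBase
{
    private string m_SimulatedTimeOut = "0";

    // Kept as a string because it is set from a XOML attribute.
    public string SimulatedTimeOut
    {
        get { return m_SimulatedTimeOut; }
        set { m_SimulatedTimeOut = value; }
    }

    STEP_EXECUTION_STATUS IWorkflowStepBase.Execute(WorkflowContext context)
    {
        Thread.Sleep(TimeSpan.FromSeconds(Int32.Parse(m_SimulatedTimeOut)));
        return STEP_EXECUTION_STATUS.STEP_COMPLETE;
    }
}

// Root object of a XOML file: an ordered collection of steps.
public class Workflow
{
    private readonly List<IWorkflowStepBase> m_Steps = new List<IWorkflowStepBase>();

    public List<IWorkflowStepBase> Steps
    {
        get { return m_Steps; }
    }
}
```

A domain-specific step (a hypothetical CheckCreditStep, say) would implement the same interface and could then be listed inside Workflow.Steps alongside WorkflowStep.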

 

Runtime Infrastructure

Next we need to implement a runtime infrastructure that is responsible for executing the workflow program. As described in the sections below, the runtime infrastructure needs to be able to provide services such as parsing the XOML file, scheduling the workflow steps and the ability to hydrate and dehydrate workflow instances.  

Parser

We can parse the XOML file (passed in as an argument) using the XAML parser class that is part of .NET 3.0. We can then take the parser output, an instance of the Workflow class, and add the contained WorkflowStep instances to a queue. As we will see shortly, the scheduler will use this queue to execute the steps inside the workflow.

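// Parse the XOML file (its path is taken from the command-line arguments) into a Workflow instance
XmlTextReader reader = new XmlTextReader(args[1]);
Workflow wf = (Workflow)XamlReader.Load(reader);

// Queue the steps in document order for the scheduler
foreach (IWorkflowStepBase step in wf.Steps)
{
     m_PendingSteps.Enqueue(step);
}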

Note – We could replace the XAML based serialization format with another Domain Specific Language (DSL) format as long as the respective parser output can ultimately be  mapped to an instance of the Workflow class.

Scheduler

The scheduler dequeues the elements of the queue populated in the previous step and executes them as shown below. As noted earlier, to keep things simple, our workflow framework only supports sequential workflows with a single branch. This means that all the scheduler needs to do is execute each IWorkflowStepBase instance in sequence. Since no parallel branches are allowed, only one workflow step can be active at any given time. To execute each workflow step, the scheduler invokes the IWorkflowStepBase.Execute method of the respective WorkflowStep instance.

Even though each workflow instance can have only a single workflow step executing at any given time, there can be more than one workflow instance executing concurrently. For scalability reasons, it is important that the scheduler thread not be blocked waiting for IWorkflowStepBase.Execute to complete. The obvious solution is to enable asynchronous behavior on the Execute method. This way a long-running WorkflowStep (waiting for days for an external stimulus to arrive, for instance) can return control back to the scheduler.

The Execute method of our WorkflowStep class is a contrived example of how to take advantage of the asynchronous behavior. If the simulated timeout is greater than 10 seconds, it notifies the scheduler that it is in an idle state. The WorkflowContext.WorkflowStepIdled method is responsible for sending the idled notification to the scheduler.

 

STEP_EXECUTION_STATUS IWorkflowStepBase.Execute(WorkflowContext context)
{
    // SimulatedTimeOut is expressed in seconds in the XOML file
    TimeSpan timeoutDuration = TimeSpan.FromSeconds(Int32.Parse(m_SimulatedTimeOut));

    if (timeoutDuration.TotalMilliseconds > 10000)
    {
        // Long-running step: tell the scheduler we are idle, then signal
        // completion asynchronously when the one-shot timer fires
        WorkflowContext.WorkflowStepIdled();
        System.Timers.Timer timer = new System.Timers.Timer(timeoutDuration.TotalMilliseconds);
        timer.AutoReset = false;
        timer.Elapsed += delegate(object sender, System.Timers.ElapsedEventArgs e) { WorkflowContext.WorkflowStepComplete(); };
        timer.Start();
        return STEP_EXECUTION_STATUS.STEP_EXECUTING;
    }
    else
    {
        // Short step: do the simulated work synchronously
        System.Threading.Thread.Sleep(timeoutDuration);
        WorkflowContext.WorkflowStepComplete();
        return STEP_EXECUTION_STATUS.STEP_COMPLETE;
    }
}

The scheduler in turn uses the WaitHandle.WaitAny construct to handle multiple asynchronous notifications received from the executing workflow steps.

while (m_PendingSteps.Count > 0 )
{
    // Schedule the workflow step for execution
    ((IWorkflowStepBase)(m_PendingSteps.Dequeue())).Execute(new WorkflowContext());

    // Wait for notifications from workflow step(s)
    state = WaitHandle.WaitAny(WorkflowContext.g_waitHandles);

}
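
To make the notification mechanism concrete, here is one way the WorkflowContext used above could be put together. This is a sketch under stated assumptions rather than the code from the sample: the post only shows the names g_waitHandles, WorkflowStepIdled and WorkflowStepComplete, so the choice of two AutoResetEvent objects and their order in the array are assumptions.

```csharp
using System.Threading;

public class WorkflowContext
{
    // Assumption: index 0 is signaled when a step goes idle,
    // index 1 when a step completes.
    public static readonly WaitHandle[] g_waitHandles =
    {
        new AutoResetEvent(false),
        new AutoResetEvent(false)
    };

    // Called by a step that is about to wait for an external stimulus.
    public static void WorkflowStepIdled()
    {
        ((AutoResetEvent)g_waitHandles[0]).Set();
    }

    // Called by a step (or its timer callback) once its work is done.
    public static void WorkflowStepComplete()
    {
        ((AutoResetEvent)g_waitHandles[1]).Set();
    }
}
```

With this layout, WaitHandle.WaitAny returns the array index of the handle that was signaled, so the scheduler could branch on the returned value, for example persisting the workflow when it sees the idle index and moving on to the next step when it sees the complete index.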

Persistence

Once a workflow step has been idled, the parent workflow can be deemed idle as well (remember that there can only be one executing workflow step inside a workflow). To conserve resources, it would be nice if we could remove the idled workflow instance from memory and dehydrate it to a storage medium. Dehydrated workflow instances can be brought back to life (hydrated) at an appropriate time in the future (e.g. on arrival of an external stimulus).

We need to add the hydrate/dehydrate capability to our workflow runtime. One approach to implementing this capability is to add a PersistenceService class (shown below). The PersistenceService class has two primary methods, SaveWorkflowInstanceState and LoadWorkflowInstanceState. SaveWorkflowInstanceState, as the name suggests, persists the state of an in-memory Workflow instance to a file. The LoadWorkflowInstanceState method resurrects a Workflow instance from a previously persisted state.

Persisting a Workflow instance requires that the workflow steps that constitute the workflow be persisted as well. We can use the notion of serialization surrogates to persist workflow steps. We create a custom serialization surrogate for IWorkflowStepBase, called CustomStepSurrogate (see the code below). Next, we create an instance of the SurrogateSelector class, add the surrogate to it, and attach the selector to the formatter used by PersistenceService. Once the surrogate is registered, all workflow steps that implement IWorkflowStepBase will inherit the persistence behavior. Refer to the GetObjectData and SetObjectData methods of CustomStepSurrogate for additional details.

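    using System.Collections.Generic;
    using System.IO;
    using System.Runtime.Serialization;
    using System.Runtime.Serialization.Formatters.Binary;

    public class PersistenceService
    {
        // File that holds the dehydrated workflow state
        private const string FILE_NAME = @".\workflowstate.bin";

        private readonly BinaryFormatter m_BinaryFormatter;

        public PersistenceService()
        {
            // Route (de)serialization of the pending-step queue through the custom surrogate
            SurrogateSelector surrogateSelector = new SurrogateSelector();
            surrogateSelector.AddSurrogate(typeof(Queue<IWorkflowStepBase>), new StreamingContext(StreamingContextStates.All), new CustomStepSurrogate());
            m_BinaryFormatter = new BinaryFormatter();
            m_BinaryFormatter.SurrogateSelector = surrogateSelector;
        }

        // Dehydrate: write the pending steps of a workflow instance to the file
        public void SaveWorkflowInstanceState(Queue<IWorkflowStepBase> pendingSteps)
        {
            using (FileStream stream = new FileStream(FILE_NAME, FileMode.Create, FileAccess.Write))
            {
                m_BinaryFormatter.Serialize(stream, pendingSteps);
            }
        }

        // Hydrate: rebuild the pending steps from the previously saved state
        public Queue<IWorkflowStepBase> LoadWorkflowInstanceState()
        {
            using (FileStream stream = new FileStream(FILE_NAME, FileMode.Open, FileAccess.Read))
            {
                return (Queue<IWorkflowStepBase>)m_BinaryFormatter.Deserialize(stream);
            }
        }
    }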
internal sealed class CustomStepSurrogate : ISerializationSurrogate
{
    // Writes one entry per pending step. The entry names are indexed because
    // SerializationInfo does not accept the same member name twice.
    public void GetObjectData(object obj, SerializationInfo info, StreamingContext context)
    {
        int index = 0;
        foreach (WorkflowStep step in (Queue<IWorkflowStepBase>)obj)
        {
            info.AddValue("SimulatedTimeOut" + index, step.SimulatedTimeOut);
            index++;
        }
        info.SetType(typeof(Queue<IWorkflowStepBase>));
    }

    // Rebuilds the queue by creating one WorkflowStep per stored entry
    public object SetObjectData(object obj, SerializationInfo info, StreamingContext context, ISurrogateSelector selector)
    {
        Queue<IWorkflowStepBase> queue = new Queue<IWorkflowStepBase>();

        for (int index = 0; index < info.MemberCount; index++)
        {
            WorkflowStep step = new WorkflowStep();
            step.SimulatedTimeOut = info.GetString("SimulatedTimeOut" + index);
            queue.Enqueue(step);
        }

        return queue;
    }
}

 

Note – Rather than bolting the persistence-related code directly into our workflow runtime, we chose to abstract it inside a separate service (PersistenceService). This has two benefits: (i) the workflow runtime remains lightweight; (ii) it is possible to replace the PersistenceService that is provided with our runtime with a custom service implementation.

 

In summary, we first defined a XAML serialization format that can be used to model a workflow program. We established that a workflow program is composed of a sequence of pre-fabricated chunks of logic, referred to as workflow steps. We then defined two custom types, the Workflow class and the IWorkflowStepBase interface, that map to the notions of a workflow program and a workflow step respectively. Next we defined a workflow runtime responsible for executing the workflow program. It provides services such as scheduling workflow steps in the appropriate order. Finally, we added the ability to dehydrate and hydrate workflow instances using the PersistenceService.

In short, we have just engineered the Windows Workflow Foundation!

 

 Download the complete sample code for this post.