Reporting on MOSS Data

November 25, 2008

 

Generating reports based on the data contained within SharePoint lists is a common requirement for many projects, and yet there is no easy or direct way to realize this functionality. Fortunately, the extensibility options provided by MOSS and SQL Server Reporting Services (SSRS) can help. This post briefly discusses the various approaches for generating reports on MOSS-based data.

SharePoint Storage Basics

SharePoint provides an abstraction around the underlying data storage (a.k.a. the content DB) wherein users don't interface directly with the content database. Instead, users work with lists, and SharePoint, in turn, stores the information appropriately inside the content DB.

To support various lists (lists with varying content types), SharePoint utilizes the universal table pattern, where a single database table is used to capture information stored in different lists. The universal table (the userdata view inside the content DB) contains generic columns such as int1, int2, nvarchar1, nvarchar2, etc. that are mapped to the fields in a list. The mapping between fields and universal table columns is maintained in the list's field schema (visible in the Lists view inside the content DB), as shown below.

Here is an example of a field (Title) that is based on an existing site column:

<FieldRef Name="Title" ColName="nvarchar1"/>

Here is an example of a field (Address) that is defined as part of a list:

<Field Type="Text" DisplayName="Address" Required="FALSE" MaxLength="255" ID="{76958f23-d514-4d2a-bb16-373429eb6569}" SourceID="{45581111-aa7e-4567-9a03-684a7ef051e8}" StaticName="Address" Name="Address" ColName="nvarchar3" RowOrdinal="0"/>

The universal table implementation in SharePoint goes a step further: it allows a list item's fields to span multiple rows if needed, e.g. if a custom list has more integer columns than the 16 that are available in a single row.
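For illustration, here is a minimal sketch of inspecting a list's field schema through the SharePoint object model rather than the content DB. The site URL and list name are placeholders; ColName is part of the persisted field definition (as in the snippets above), though how much of it surfaces through SchemaXml can vary by field.

using System;
using Microsoft.SharePoint;

class FieldSchemaDump
{
    static void Main()
    {
        // Placeholders: adjust the site URL and list name for your environment.
        using (SPSite site = new SPSite("http://server/sites/finance"))
        using (SPWeb web = site.OpenWeb())
        {
            SPList list = web.Lists["Customers"];
            foreach (SPField field in list.Fields)
            {
                // SchemaXml returns the field definition XML; for list-scoped
                // fields this is where mappings such as ColName="nvarchar3" live.
                Console.WriteLine(field.SchemaXml);
            }
        }
    }
}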

As you can imagine, the universal pattern is really convenient from a manageability perspective. An alternative design would be to provision a new table for each list or content type. Unfortunately, the downside of the universal pattern is pretty obvious as well. For example, SQL Server will lock multiple rows of a list (or folders within a list) even when a single list item is being updated. In an extreme case, the entire table may be locked, leading to single-threaded access.

The other obvious disadvantage is the limited ability to tune performance, for example by optimizing indexes. With WSS 3.0, we now have the ability to mark a field within a list as an "indexed" field. Indexed fields can lead to significant performance gains as long as they are used in the where clause of the query.
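To make the indexed-field point concrete, here is a minimal sketch using the Query OM discussed in the next section: an SPQuery whose Where clause filters on a column assumed to be marked as indexed. The site URL, list name and field name are placeholders.

using System;
using Microsoft.SharePoint;

class IndexedQuerySample
{
    static void Main()
    {
        using (SPSite site = new SPSite("http://server/sites/finance"))
        using (SPWeb web = site.OpenWeb())
        {
            SPList list = web.Lists["Invoices"];

            SPQuery query = new SPQuery();
            // "Customer" is assumed to be an indexed column; RowLimit keeps
            // the result set (and the load on the content DB) small.
            query.Query =
                "<Where><Eq>" +
                "<FieldRef Name='Customer'/>" +
                "<Value Type='Text'>Contoso</Value>" +
                "</Eq></Where>";
            query.RowLimit = 100;

            foreach (SPListItem item in list.GetItems(query))
            {
                Console.WriteLine(item.Title);
            }
        }
    }
}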

While it is useful to look under the covers to understand the structure of the content DB, Microsoft does not want users to take any dependencies on the underlying tables (for good reason: the underlying storage format is subject to change without notice!). Instead, they provide a number of object model (OM) based access schemes to safely access this data. These include:

1) Search OM – Allows executing queries against the search engine

2) Query OM – Allows executing CAML queries using classes such as SPQuery and SPSiteDataQuery. This comes in two flavors: a class library and a web service based OM.

The white paper in reference [1] below discusses all the OM options and how they perform under load. I recommend reading this paper in its entirety. In a nutshell, SPQuery with an indexed column is the best option in terms of performance. The paper also talks about a PortalSiteMapProvider-based scheme that uses caching to improve performance. I am ignoring this option for the purposes of this discussion as it does not fit the reporting needs very well.
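For the cross-list case, here is a similar hedged sketch of the class library flavor using SPSiteDataQuery to roll up task items across a site collection; the site URL is a placeholder, and note that the merging of per-list results happens in the application tier rather than in SQL Server.

using System;
using System.Data;
using Microsoft.SharePoint;

class CrossListQuerySample
{
    static void Main()
    {
        using (SPSite site = new SPSite("http://server/sites/finance"))
        using (SPWeb web = site.OpenWeb())
        {
            SPSiteDataQuery query = new SPSiteDataQuery();
            // 107 is the Tasks list template; query every task list in the site collection.
            query.Lists = "<Lists ServerTemplate='107' />";
            query.Webs = "<Webs Scope='SiteCollection' />";
            query.ViewFields = "<FieldRef Name='Title' /><FieldRef Name='Status' />";
            query.Query =
                "<Where><Neq><FieldRef Name='Status'/>" +
                "<Value Type='Text'>Completed</Value></Neq></Where>";

            DataTable results = web.GetSiteData(query);
            Console.WriteLine("{0} open tasks across the site collection.", results.Rows.Count);
        }
    }
}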

When to use Content DB as custom application data storage

It should be clear from the previous section that the universal table is not designed to be:

High performing – Because of the limited indexing and locking options, it is not designed for high-throughput, OLTP-style access.

Tunable – Schemes such as partitioning to split very large lists are not available. There is also no ability to tune the default SQL used by the system (e.g. by substituting custom stored procedures).

Aggregation-friendly – There is no ability to aggregate or group information from multiple lists close to the source of the data (CAML-based aggregation across lists takes place at the application tier, which is much more expensive).

So if your application truly requires the above traits, the best option is to store data outside the content DB and use one of the several available schemes (web parts, _layouts pages, ASPX-based content pages) to surface your application within SharePoint.

On the other hand, if you are building an application with modest sizing requirements, you can take advantage of the abstraction provided by SharePoint and thus avoid the need for any custom data access code. As an example, we built an invoicing system entirely based on the content DB. Invoices were "split" across site collections to allow the solution to scale out (as opposed to a scale-up approach).

Generating Reports based on SharePoint Data

Whether you store application data inside the content DB or not, there is almost always a requirement to generate reports based on the metadata associated with the documents stored in SharePoint. Before we look into the various reporting options, let us briefly look at some of the key characteristics of a reporting technology such as SQL Server Reporting Services (SSRS). SSRS can be broken into three parts:

1) Designer – used to create a report layout

2) Processor – binds report data to the layout

3) Connectivity – to a data source

It is easy to see that the ability to scale report generation depends on how fast the report data source can return the requested data. Relational databases (RDBs), with their innate ability to process set-based operations, are obviously very good at this. RDBs are also very good at grouping, sorting and filtering, which are other common reporting requirements.

SQL Server Reporting Services (SSRS), like many other reporting tools, supports connectivity to various data sources out of the box, including SQL Server, DB2, and Oracle. Again, similar to other reporting tools, SSRS allows custom data sources (e.g. Active Directory, disconnected DataSets) to act as the source for report data. The only requirement is that the custom data source support forward-only, stream-based access to data.

The other key aspect of reporting is the ease of creating a report layout. Designers such as the one included with SSRS make it easy to build layouts such as cross-tab and nested reports.

Finally, reporting tools offer many advanced capabilities such as caching, slicing (the ability to support multiple reports via a single "super" dataset), scheduled report generation, delivery formats (PDF, Excel, Word, etc.) and dynamic reports.

Reporting Options

Let us now look at some of the different options for generating reports on SharePoint content metadata.

Built-in and custom WebParts

If you are trying to generate a simple report (e.g. an aggregated task list) based on a single list, it might be sufficient to use one of the built-in web parts (for example the Content Query Web Part – see Reference [3] for detailed instructions).

The key advantage of this approach is that security trimming takes place automatically. Another advantage is that the report is always generated on data that is current. The downside is that manual work is required to create the layout (no report designer style tool is available). Any advanced functionality such as master-detail or nesting requires additional custom coding. Finally, unless a caching scheme is used, each view of the report causes the query to be computed again, imposing additional load on the system.
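When the out-of-the-box settings are not enough, one common approach is a thin web part that derives from the publishing ContentByQueryWebPart and pre-configures the query. The following is only a sketch, assuming the publishing infrastructure is available; the list template, field names and class name are placeholders (Reference [3] describes the supported override properties).

using System;
using Microsoft.SharePoint.Publishing.WebControls;

public class OpenTasksReportWebPart : ContentByQueryWebPart
{
    protected override void OnInit(EventArgs e)
    {
        base.OnInit(e);

        // Roll up items from all task lists (server template 107) in the site collection.
        this.WebUrl = "~sitecollection";
        this.ListsOverride = "<Lists ServerTemplate=\"107\" />";
        this.CommonViewFields = "Status,Text;DueDate,DateTime";
        this.QueryOverride =
            "<Where><Neq><FieldRef Name=\"Status\"/>" +
            "<Value Type=\"Text\">Completed</Value></Neq></Where>";
    }
}

The layout itself still has to be shaped by hand (typically with custom item style XSL), which is the manual work mentioned above.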

Entity Framework

The ADO.NET Entity Framework is part of Visual Studio 2008 SP1 (.NET Framework 3.5 SP1) and provides a level of abstraction over a data source. It is based on the Entity Data Model (EDM). Basically, the EDM allows users to model the database the way they designed it in the first place, probably with an entity-relationship (ER) diagram. This level of abstraction is important because, as more and more application information is maintained in the database, the relational schema gets modified over time for performance and normalization reasons. Those changes can affect the data access code in the application. For example, a product table may become a join across a few tables, but as a programmer I really only care about what the application defines as a Product entity.

So why is the EDM important? It is expected that a future version of SSRS will work natively with the EDM. Using the EDM, we could model a SharePoint list and then use SSRS to generate reports.

The key advantage of this scheme is that it allows us native, yet safe, access to the SharePoint tables. This is because the EDM acts as a firewall between the underlying SharePoint content DB structure and Entity SQL based queries. If the structure of the content DB changes at some point, all we need to do is change the EDM; the Entity SQL queries are not impacted.
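As a purely hypothetical sketch, assume an EDM container named SharePointEntities has been hand-mapped over the list storage and exposes an Invoices entity set with Customer and Amount properties. An Entity SQL query against that conceptual model might look like the following; only the EDM mapping would need to change if the underlying content DB structure changed.

using System;
using System.Data;
using System.Data.EntityClient;

class EntitySqlSample
{
    static void Main()
    {
        // "name=SharePointEntities" refers to a connection string for the
        // hypothetical hand-built EDM described above.
        using (EntityConnection connection = new EntityConnection("name=SharePointEntities"))
        {
            connection.Open();

            using (EntityCommand command = connection.CreateCommand())
            {
                // Entity SQL targets the conceptual model, not the physical tables.
                command.CommandText =
                    "SELECT VALUE i FROM SharePointEntities.Invoices AS i " +
                    "WHERE i.Customer = 'Contoso'";

                using (EntityDataReader reader =
                    command.ExecuteReader(CommandBehavior.SequentialAccess))
                {
                    while (reader.Read())
                    {
                        Console.WriteLine(reader["Amount"]);
                    }
                }
            }
        }
    }
}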

Native access to the data stored in the content DB means zero latency as well as full-fidelity access to item-level security.

The downside of this approach is that the EDM model needs to be manually defined. The other downside is that tuning options are still not available.

Data Extraction Service and Operational Data Store

Another reporting option is to develop a Data Extraction Service (DES) shared service similar to the BDC, except this DES would export the information stored inside SharePoint out to an operational data store (ODS). DES is a generic scheme that works across various list types without the need to develop list-specific custom code. It is envisioned that users will upload a BDC-style XML configuration. DES will then use the configuration file to export data to an external data source on a scheduled basis. To make the transfer as efficient as possible, only differential updates will be carried over.
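To make the differential idea concrete, here is a rough, hypothetical sketch of a single extraction step: pull the items modified since the last run from one list and bulk copy them into an ODS staging table. Every name here (site URL, list, connection string, staging table, watermark handling) is a placeholder; a real DES would drive this from the uploaded configuration and iterate over the configured lists.

using System;
using System.Data;
using System.Data.SqlClient;
using Microsoft.SharePoint;
using Microsoft.SharePoint.Utilities;

class DifferentialExtractSample
{
    static void ExtractChanges(DateTime lastRunUtc)
    {
        using (SPSite site = new SPSite("http://server/sites/finance"))
        using (SPWeb web = site.OpenWeb())
        {
            SPList list = web.Lists["Invoices"];

            // Only items modified since the last run (the differential update).
            SPQuery query = new SPQuery();
            query.Query =
                "<Where><Geq><FieldRef Name='Modified'/>" +
                "<Value Type='DateTime' IncludeTimeValue='TRUE'>" +
                SPUtility.CreateISO8601DateTimeFromSystemDateTime(lastRunUtc) +
                "</Value></Geq></Where>";

            DataTable changed = list.GetItems(query).GetDataTable();
            if (changed == null)
            {
                return; // nothing has changed since the last run
            }

            using (SqlConnection connection = new SqlConnection(
                "Data Source=odsserver;Initial Catalog=ODS;Integrated Security=SSPI"))
            {
                connection.Open();
                using (SqlBulkCopy bulkCopy = new SqlBulkCopy(connection))
                {
                    bulkCopy.DestinationTableName = "staging.Invoices";
                    bulkCopy.WriteToServer(changed);
                }
            }
        }
    }
}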

Security (like the BDC) will be applied at the entity level and not at the SharePoint item level.

This scheme offers the most scalability and flexibility. Once the data is available inside the ODS, any SSRS technique can be applied. All the tuning and optimization options provided by SQL Server are available as well. Further, SharePoint performance is not impacted by reporting (although there will be a cost associated with running the DES service to extract the data from SharePoint into the ODS).

The downside of this approach is the latency of the data and the effort required to generate the DES metadata.

If this scheme seems useful, we could invest in a tool that generates the DES metadata (similar to the BDC metadata generation tools available in the market today).

Reporting Services Data Extension

SSRS supports a modular architecture designed for extensibility. One such extensibility option is the data processing extension, which allows SSRS to connect to data sources and retrieve data (see Reference [2]).

We have two options here:

1) Develop a custom data extension ourselves by wrapping the SharePoint web service based API. Alternatively, we could have a custom web service return a dataset; SSRS 2005 has the ability to consume a dataset directly. The latter is a bit easier because we don't have to implement all of the data extension interfaces ourselves, although the report design experience is not as rich as that provided by a data processing extension (a sketch of this variant appears at the end of this section).

2) Use a commercial data extension such as the one provided by Enesys (see Reference [4]). The Enesys data extension makes it possible to use SharePoint (MOSS 2007 and WSS v3) list data for building reports with Microsoft SQL Server 2005 Reporting Services; a custom XML query (with portions of CAML) can be embedded directly into the RDL.

The biggest upside of this approach is that it allows the richness of the SSRS designer to be leveraged.

On the downside, Enesys RS uses the SharePoint web service OM to extract information from the content DB and is thus subject to the inherent performance limitations of CAML.
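Finally, for the web service variant of option 1 above, a minimal sketch could look like the following ASMX service deployed alongside SharePoint. The namespace, site URL, list and field names are placeholders, and input handling is deliberately simplified.

using System;
using System.Data;
using System.Web.Services;
using Microsoft.SharePoint;

[WebService(Namespace = "http://example.com/reporting/")]
public class ListReportService : WebService
{
    [WebMethod]
    public DataSet GetInvoiceData(string customer)
    {
        DataSet result = new DataSet("InvoiceReport");

        using (SPSite site = new SPSite("http://server/sites/finance"))
        using (SPWeb web = site.OpenWeb())
        {
            SPList list = web.Lists["Invoices"];

            // Note: a production version should validate/escape the input
            // instead of concatenating it into the CAML query.
            SPQuery query = new SPQuery();
            query.Query =
                "<Where><Eq><FieldRef Name='Customer'/>" +
                "<Value Type='Text'>" + customer + "</Value></Eq></Where>";

            // GetDataTable() flattens the items into a tabular shape that
            // SSRS can consume; it returns null when no items match.
            DataTable table = list.GetItems(query).GetDataTable();
            if (table != null)
            {
                table.TableName = "Invoices";
                result.Tables.Add(table);
            }
        }

        return result;
    }
}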

References

[1] White Paper: Working with large lists in Office SharePoint® Server 2007, Steve Peschka, Microsoft: http://technet.microsoft.com/en-us/library/cc262813.aspx

[2] Data Processing Extensions Overview, SQL Server 2008 Books Online, Microsoft,
http://msdn2.microsoft.com/en-us/library/ms152816.aspx?s=11

[3] How to: Customize the Content Query Web Part by using Custom Properties, Microsoft,
http://msdn2.microsoft.com/en-us/library/aa981241.aspx

[4] Enesys RS Data Extension
http://www.enesyssoftware.com/Products/EnesysRSDataExtension/Features/tabid/73/language/en-US/Default.aspx

