SDS and Relational Model
April 6, 2009
I read Andrew’s column on support for the relational capability within SDS. While it is great to have “change the conn string to move to the cloud” capability, I think it is useful to understand how the relational model is enabled by SDS. Additionally, in my opinion, it will also be important to understand how the entity-based storage (offered via Azure Storage) can be leveraged when designing applications for the cloud.
The architecture of the data fabric that powers SDS is a node-based, scale-out architecture. Let us go over some of the core concepts of the data fabric:
- Storage Unit – Basic unit of storage that supports the CRUD operations. Example – database row.
- Consistency Unit – Set of storage units that can be queried and updated in a consistent manner. Example – A set of rows with the same partition key, or even an entire database instance.
- Failover Unit – Group of consistency units that are guaranteed to be available. SDS replicates the failover units to a replica set to ensure that data is always available.
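To make the hierarchy concrete, here is a toy Python model of the three concepts above. The class names and methods are my own invention for illustration; they are not an SDS API.

```python
# Toy model of the data fabric concepts: storage units group into
# consistency units, which group into failover units. Names are
# hypothetical, invented for this sketch.

class StorageUnit:
    """Basic unit of storage supporting CRUD, e.g. a database row."""
    def __init__(self, key, value):
        self.key = key
        self.value = value


class ConsistencyUnit:
    """Storage units sharing a partition key; can be queried and
    updated consistently because they live together on one node."""
    def __init__(self, partition_key):
        self.partition_key = partition_key
        self.rows = {}

    def upsert(self, unit):
        # All rows in this unit are co-located, so the update can be
        # made consistent with reads of the same partition.
        self.rows[unit.key] = unit


class FailoverUnit:
    """A group of consistency units that is replicated as a whole
    to a replica set for availability."""
    def __init__(self, consistency_units):
        self.consistency_units = list(consistency_units)
```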
Here are some things to consider when using the SDS relational model (this is based on my limited understanding, of course; I urge you to view Gopal’s excellent Under the Hood talk for additional details):
- Consistency is guaranteed inside a single node or consistency unit. This means that if your entire on-premises database instance can fit into one consistency unit, you are OK. If, on the other hand, you are going to exceed it, you will need to think about partitioning your relational model, at least until Microsoft adds auto-partitioning functionality on top of the relational model. Note that there is no support for transactions that span consistency units.
- The data fabric uses dynamic partitioning to improve performance, i.e., it can move a consistency unit around to spread the load evenly across the cluster of nodes. A single failover unit is designed to host one or more consistency units, rather than dedicating the entire failover unit to a single consistency unit. This is because a) it is easy to re-balance the load by moving the “smaller” consistency units around, and b) it is easier/faster to recreate a failed node. When you move your entire relational DB instance to the cloud as a single consistency unit, chances are that the data fabric will need to dedicate the entire failover unit to it (to improve performance). This can limit some of the benefits of dynamic data partitioning.
- Bear in mind that the data fabric is based on scaling out using commodity hardware (typically 1.5 to 1.7 GHz x64 processors with 1.7 GB of memory; this is based on Chuck Lenzmeier’s talk at the PDC and is obviously subject to change).
- Consider the additional latency cost incurred because writes are propagated to a quorum of replicas.
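The quorum-write cost in the last point can be sketched with a small calculation. This is a generic illustration of quorum acknowledgment, not a description of SDS internals: a write completes only once a majority of replicas have acknowledged it, so effective write latency is set by the median-ish replica, not the fastest one.

```python
# Illustrative quorum-write latency: a write is acknowledged once a
# majority (quorum) of replicas have accepted it. This models quorum
# replication generically; it is not an SDS implementation detail.

def quorum_write_latency(replica_ack_times_ms):
    """Given each replica's ack time in ms, return the time at which a
    majority quorum has acknowledged the write."""
    n = len(replica_ack_times_ms)
    quorum = n // 2 + 1  # simple majority
    # The write completes when the quorum-th fastest replica responds.
    return sorted(replica_ack_times_ms)[quorum - 1]


# With three replicas acking at 5, 9, and 40 ms, the write completes at
# 9 ms: the slow 40 ms replica is outside the quorum, but the write
# still cannot finish at 5 ms, which is the extra latency cost.
latency = quorum_write_latency([5, 9, 40])  # → 9
```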
As stated earlier, it might be worthwhile to consider the entity-based storage offered by Azure Storage, for the following reasons:
- If you need a flexible schema or are building analysis-focused applications, entity-based storage may be more suitable.
- Entity-based storage forces you to think about partitioning from the ground up. This allows for almost linear scaling as additional nodes are added to the mix.
- The single index (PartitionKey + RowKey) may seem limiting, but there are a number of ways to get around this, including dynamically adjusting the partition key, or storing multiple types of entities inside a single table so that the data can be partitioned using partition key + entityType.
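The last point can be sketched in a few lines of Python. This is a minimal in-memory stand-in for an entity table, not the Azure Storage API; the table and method names are hypothetical. It shows two entity types sharing one table, with the entity type folded into the partition key so each type can be partitioned and queried independently.

```python
# Minimal sketch of entity-based storage keyed on (PartitionKey, RowKey),
# with multiple entity types in one table. Hypothetical names; this is
# an in-memory illustration, not the Azure Storage API.

class EntityTable:
    def __init__(self):
        self._entities = {}  # (partition_key, row_key) -> entity dict

    def insert(self, partition_key, row_key, entity):
        self._entities[(partition_key, row_key)] = entity

    def query_partition(self, partition_key):
        """Efficient lookup: scans a single partition only."""
        return [e for (pk, _), e in sorted(self._entities.items())
                if pk == partition_key]


table = EntityTable()
# Fold the entity type into the partition key, so Customers and Orders
# live in one table yet partition (and scale) independently.
table.insert("Customer_US", "cust-1", {"entityType": "Customer", "name": "Ann"})
table.insert("Order_US", "ord-1", {"entityType": "Order", "total": 42})
```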