Welcome to

Magenic Technologies Community Blog

Sign in | Join | Help

The W Blog

Whitney Weaver, Magenic Technologies
Sparse Columns in SQL Server 2008

SQL Server 2008 has introduced sparse columns to DDL activities.  So when exactly would you use this new construct?

Say you are modeling a product line for a company.  The company in question runs a sporting good store stocking shoes, hats, and equipment.  As you might imagine a product table for these types of products would normally leave you with many NULL columns as certain properties of one product would not be applicable to another.  Sparse columns look to save the day here as they are optimized for the storage of null values.

Like most things in the SQL Server world, sparse columns have their trade offs.  While storage of null values is optimized you do pay a cost for non null values.  The amount varies by data type, but it is typically in the 50% range.  BOL recommends you look to sparse columns when you would receive a 20 to 40 percent space savings from traditional storage practices.

There are a few data types that aren't invited to the sparse column party.  These are:

  • geography
  • geometry
  • image
  • ntext
  • text
  • timestamp
  • user-defined data types

No real surprises here as several are set to be deprecated in future releases.  The others mentioned might make a good follow on blog post.

So now to the interesting part, at least for me...what does the code look like?  Here is an example product table with sparse columns for size, color, and shape:

IF OBJECT_ID('dbo.ProductSparse, 'U') IS NOT NULL
DROP TABLE dbo.ProductSparse;

CREATE TABLE dbo.ProductSparse
(
ProductID        INT             IDENTITY(1,1)	CONSTRAINT PK_ProductSparse PRIMARY KEY CLUSTERED
,[Description]   VARCHAR(100)    NOT NULL
,Size            SMALLINT        SPARSE NULL
,Color           VARCHAR(50)     SPARSE NULL
,Shape           VARCHAR(50)     SPARSE NULL
);

Not much to it, right?  Sparse columns look and behave like any other column in your table.  The catalog view sys.columns does have an added is_sparse column if you need to validate whether a column is sparse.  Object explorer also lists this information for each column in a table, before the data type.

To really contrast this lets add another table matching our ProductSparse table but without the SPARSE decoration on the columns:

CREATE TABLE dbo.ProductNotSparse
(
ProductID        INT             IDENTITY(1,1)
,[Description]   VARCHAR(100)    NOT NULL
,Size            SMALLINT        NULL
,Color           VARCHAR(50)     NULL
,Shape           VARCHAR(50)     NULL
);

Now let's add some data to the tables to see the storage implications.  For this test I'm only inserting the [Description] column and letting the others remain NULL.

INSERT dbo.ProductSparse ([Description])
SELECT 'My Product';
GO 10000

INSERT dbo.ProductNotSparse ([Description])
SELECT 'My Other Product';
GO 10000

We can check the storage results with a query against a DMV (you should expect one on this blog, right?)...

SELECT OBJECT_NAME(object_id) AS objectName
    ,in_row_data_page_count
    ,in_row_reserved_page_count
    ,in_row_used_page_count
    ,used_page_count
    ,reserved_page_count
    ,row_count
    ,totalPages
    ,dataPages
    ,usedPages
FROM sys.dm_db_partition_stats AS s
    INNER JOIN (
                SELECT container_id, SUM(total_pages) AS totalPages, SUM(data_pages) AS dataPages, SUM(used_pages) AS usedPages
                FROM sys.allocation_units
                GROUP BY container_id    
                ) AS au ON s.partition_id = au.container_id
WHERE s.object_id = OBJECT_ID('dbo.ProductSparse', 'U')
    OR s.object_id = OBJECT_ID('dbo.ProductNotSparse', 'U');

If we execute this query we will see a very minimal difference in storage, only 2 more pages for the non NULL values.  However, if we alter our GO loop to 100,000 we see a difference of roughly 28 pages. Now that's some savings! 

As you look for ways to manage your growing data, keep sparse columns in mind.  In the right situation they can be a really solid choice.

In my next post I will show how you can work with these sparse columns through other new technologies, namely filtered indexes and column sets.

-W

Posted: Tuesday, March 11, 2008 7:17 PM by WhitneyW

Comments

Whitney Weaver - The W Blog said:

After a brief blog delay we're back and continuing our example from last time around sparse columns and

# March 25, 2008 3:55 PM

The W Blog said:

We're back on the 2008 topics here at the W, talking about column sets.  Column sets extend sparse

# May 21, 2008 9:59 PM
Anonymous comments are disabled