Table Partitioning in SQL Server 2005
I'm going to spend the next few posts discussing how you can implement partitioning in SQL Server 2005 to improve the performance and maintenance time of very large tables.
In previous versions of SQL Server it was possible to partition data by physically separating tables or by creating views. Both of these approaches had flaws in performance or maintainability. Luckily, SQL Server 2005 solved this problem by implementing true partitioning where we have the ability to manage where data resides yet maintain a single object.
At a high level there are only a few items needed to implement partitioning, these are:
- A partitioning key
- A filegroup
- A partition function
- A partition scheme
Partition Key
The selection of a partition key is one of the most important decisions you will make in your implementation. This column will drive where data is divided along your partition boundaries. Typically a partition key is either date based or an ordered grouping of data.
It is crucial to determine the access pattern for the data you will be storing. You will want to ensure that your partitioning key is restricted in important queries’ WHERE clause. This will allow partition elimination to take place (which will be discussed in a later post).
Filegroup
Filegroups can be employed to help separate your data which will improve performance and maintainability. It is generally best to have the same number of filegroups as partitions. Filegroups may have one or more files, but each partition must map to a filegroup. For more granular backup control you should design your partitioned tables so that related data resides in the same filegroup. The syntax creating the filegroup and file is:
ALTER DATABASE PartitionDB ADD FILEGROUP [FG200801];
GO
ALTER DATABASE PartitionDB ADD FILE
(
NAME = N'PartitionData200801'
,FILENAME = N'D:\SQLData\PartitionData200801.ndf'
,SIZE = 5MB
,FILEGROWTH = 5MB
) TO FILEGROUP [FG200801];
GO
Partition Function
The partitioning function defines the algorithm that is used to map rows of a table or index into partitions. The scope of the partition function is limited to the database in which it is created. The maximum number of partitions allowed in a partition function is 1000. The general syntax to create a partition function is:
CREATE PARTITION FUNCTION partition_function_name ( input_parameter_type )
AS RANGE [ LEFT | RIGHT ]
FOR VALUES ( [ boundary_value [ ,...n ] ] )
[ ; ]
All data types are valid for input_parameter_type, except text, ntext, image, xml, timestamp, varchar(max), nvarchar(max), varbinary(max), alias data types, or CLR user-defined data types.
The LEFT or RIGHT designation specifies which side of each boundary value the boundary_value belongs. If not specified LEFT is the default.
If we wanted to create a range function to partition our data in a month/year form we could execute the statement below.
CREATE PARTITION FUNCTION DateRangePFN (DATETIME)
AS RANGE RIGHT FOR VALUES('20080101', '20080201', '20080301', '20080401'
,'20080501', '20080601', '20080701', '20080801', '20080901', '20081001'
,'20081101', '20081201');
Upon executing this statement we would create 12 boundaries, with data partitioned in the following manner.
| Partition |
1 |
2 |
... |
11 |
12 |
| Values |
< Jan 1 2008 |
>= Jan 1 2008
< Feb 1 2008 |
|
>= Nov 1 2008
< Dec 1 2008 |
>= Dec 1 2008 |
Partition Scheme
The partition scheme associates the partitions with their physical location in a filegroup. The partition scheme is created referencing the partition function to employ. The general syntax to create a partition function is:
CREATE PARTITION SCHEME partition_scheme_name
AS PARTITION partition_function_name
[ ALL ] TO ( { file_group_name | [ PRIMARY ] } [ ,...n ] )
[ ; ]
If [ALL] is specified only one file_group_name can be specified. Partitions are assigned to filegroups, starting with partition 1, in the order they are listed in [,...n].
Going back to our prior partition function example, we could create a partition scheme mapping the partitions to multiple filegroups by executing the statement below.
CREATE PARTITION SCHEME DateRangePS
AS PARTITION DateRangePFN
TO ('FG1', 'FG2', 'FG3', 'FG4', 'FG5', 'FG6', 'FG7', 'FG8', 'FG9', 'FG10'
,'FG11', 'FG12');
Upon executing the statement we would have associated filegroups and partitions in the following manner.
| Filegroup |
FG1 |
FG2 |
... |
FG11 |
FG12 |
| Partition |
1 |
2 |
|
11 |
12 |
| Values |
< Jan 1 2008 |
>= Jan 1 2008
< Feb 1 2008 |
|
>= Nov 1 2008
< Dec 1 2008 |
>= Dec 1 2008 |
At this point we have the objects necessary to create a table and partition data. The create statement for our table will reference the partition scheme as well as the column used by the partition function. The following code creates a table using our partition scheme and function.
CREATE TABLE dbo.Invoice
(
InvoiceID INT NOT NULL
,CustomerID INT NOT NULL
,InvoiceDate DATETIME NOT NULL
,Amount MONEY NOT NULL
,StatusID TINYINT NOT NULL
,ShipDate DATETIME NOT NULL
)
ON DateRangePS(InvoiceDate);
At this point any data inserted into the Invoice table will be partitioned by the specified boundaries in placed in their associated filegroups. We have ourselves a partitioned table!
In my next post I will discuss indexing, back up, and other maintenance activities related to partitioning as well as some handy functions that have been added to T-SQL.