After a couple of weeks going back and forth I decided it was the right time to move to my own site. The new site is http://whitneyweaver.com and will be the space to keep up with me. I will mirror the new blog back to the main Magenic site so if you are a general subscriber to it you will still see my content.
One of the more important (and contentious) structures for database applications are indexes. When used well they can aid an application greatly, but when used poorly they can cause greater overhead than if no index existed at all.
There are two scenarios I will be discussing today that make use of the Dynamic Management View (DMV from here) sys.dm_db_index_usage_stats. This DMV records counts of index operations by type. It is important to note that the counters of this DMV are emptied with each restart of the SQL Server service.
The first scenario is a good introduction to this DMV if you are unfamiliar with it. We simply want to find our most active indexes. We can use the following query to return this detail:
--Get most used indexes Insert/Update/Delete and Selects
SELECT SCHEMA_NAME(o.[schema_id]) AS SchemaName
,o.name AS ObjectName
,i.name AS IndexName
,i.type_desc AS IndexType
,(COALESCE(us.user_seeks, 0) +
COALESCE(us.user_scans, 0) +
COALESCE(us.user_lookups, 0) +
COALESCE(us.user_updates, 0)
) AS Activity
FROM sys.objects AS o
INNER JOIN sys.indexes AS i
ON i.[object_id] = o.[object_id]
LEFT JOIN sys.dm_db_index_usage_stats AS us
ON us.[object_id] = o.[object_id]
AND us.index_id = i.index_id
WHERE o.is_ms_shipped = 0
ORDER BY Activity DESC;
In sys.dm_db_index_usage_stats we get counters for each seek, scan, and lookup by a user query. Each “update” query also gets a counter. This can be misleading at first as you might think the name is exact to its usage. This isn’t the case. In actuality, INSERT/UPDATE/DELETE activities all get recorded under this column. So in the above query we are able to find all activity against an index and rank them accordingly. This can then be factored in to your overall maintenance strategy.
Next, we will look at a slightly different type of analysis. One of the hidden costs of indexing is maintenance. I’ve seen this blown off during modeling/development and assumed as a zero cost item. Unfortunately, everything in life has a cost. With data (and data access) growing, it’s vitally important that we don’t waste time on operations that add no value. To that point, I will use the same DMV but this time finding indexes with a higher maintenance cost than usage.
--Indexes with more Insert/Update/Delete operations than Selects
;WITH IndexAnalysis AS
(
SELECT SCHEMA_NAME(o.[schema_id]) AS SchemaName
,o.name AS ObjectName
,i.name AS IndexName
,i.type_desc AS IndexType
,(COALESCE(us.user_seeks, 0) +
COALESCE(us.user_scans, 0) +
COALESCE(us.user_lookups, 0)
) AS QueryActivity
,COALESCE(us.user_updates, 0) AS MaintenanceActivity
FROM sys.objects AS o
INNER JOIN sys.indexes AS i
ON i.[object_id] = o.[object_id]
LEFT JOIN sys.dm_db_index_usage_stats AS us
ON us.[object_id] = o.[object_id]
AND us.index_id = i.index_id
WHERE o.is_ms_shipped = 0
)
SELECT *
FROM IndexAnalysis
WHERE MaintenanceActivity > QueryActivity
ORDER BY MaintenanceActivity DESC;
By combining the query activities and then comparing against maintenance cost we can clearly state the “value” of an index. From here it is simply a matter of pruning the indexes that have a greater overhead. For large or older implementations this can usually net quite a boost in resource time as maintenance activities decrease.
I’m going to continue with the this general topic in following posts as there are many other DMVs that can help us make the most of our indexes.
I am happy to report that I’m writing this on a snappy new install of Windows 7. I installed on a Dell Latitude D620, with a dual core and 3GB of RAM. This isn’t exactly a cutting edge device so I hope my experiences will keep you from rolling your eyes and thinking “yeah, right…”.
So here is a quick list of thoughts from this install and other conversations during the Windows 7 beta process:
1. Use Live Mesh. I can’t express how much effort this takes off your process. When your data is replicated elsewhere you really save some time. Of course, this does have a 5GB limit, but that leads me to my second point.
2. Off-load the non day to day stuff to an external drive. Unless you are an incredible pack rat you should be able to pick up a drive under $125 in a range of sizes. In the metro Atlanta area you can now get 160 gigs for under $70.
3. Install from USB. I was able to go from boot to completed setup in 14 minutes today. No joke. Again, this was not some NEC sponsored craziness…this was a $12 no name 8 gig flash drive I got last night as Sam’s. Here’s a link for the setup of the drive, I don’t wish to rehash.
4. Do a clean install. For the sake of all of us, format the drive first.
5. Investigate the power options. Especially if you are coming from XP and/or installing on a laptop, have a look at the settings here. You can set a number of options of how your device will behave. My personal favorite is “Choose what closing the lid does”.
6. Don’t skip the desktop. Have a look at the preloaded desktop options, someone in the development team obviously spent a little time here. And boy did they have an odd sense of humor.
Those are my quick list. I would be curious to hear other people’s experiences. As I come across others I will post again. Until then, happy installing!
One of the attendees of my session on metadata last Saturday asked about how to get at this information. We were already time constrained at 50 minutes per session so I was not able to fit that query in with the remaining items I wanted to show.
The following query starts with the catalog view sys.partitions. This might throw people as they would assume that partitioning must be implemented to make use of this view. One of the architectural changes made in SQL Server 2005 is that all tables are technically on a partition (the value being 1 if non-partitioned). We also make use of sys.destination_data_spaces, sys.allocation_units, and a few other general catalog views.
SELECT sc.name AS SchemaName
,t.name AS ObjectName
,i.name AS IndexName
,i.type_desc AS IndexType
,f.name AS FileGroupName
,[rows] AS NumberOfRows
,au.total_pages AS TotalPages
FROM sys.partitions AS p
INNER JOIN sys.tables AS t ON t.[object_id] = p.[object_id]
INNER JOIN sys.schemas AS sc ON sc.[schema_id] = t.[schema_id]
LEFT JOIN sys.indexes AS i ON p.[object_id] = i.[object_id]
AND p.index_id = i.index_id
LEFT JOIN sys.destination_data_spaces AS dds ON
dds.partition_scheme_id = i.data_space_id
AND dds.destination_id = p.partition_number
LEFT JOIN sys.filegroups AS f ON f.data_space_id = i.data_space_id
LEFT JOIN (
SELECT container_id, SUM(total_pages) AS total_pages
FROM sys.allocation_units
GROUP BY container_id
) AS au ON au.container_id = p.partition_id
WHERE i.index_id > 0
ORDER BY SchemaName, ObjectName, IndexName;
This query also returns the filegroup the particular index uses. If you aren’t currently utilizing multiple filegroups for your database I would strongly recommend investigating that option.
Let me know if you have questions (or requests). Enjoy…
I would like to thank all the great people that came out today, it was an awesome day. I really enjoyed meeting a lot of new people and hopefully spreading a little excitement about partitioning and meta data. A special thanks to all the folks that attended my sessions.
The slide decks and code samples for both of my sessions can be found below. If you have any questions please contact me directly at my Magenic address or through the Email section of this blog.
This came up at the SQLSaturday speaker dinner tonight and I had missed Bob Ward’s post about it. Definitely good news for all those customers on Standard edition.
The cumulative update for SQL Server 2005 should be arriving in June for SP3. SQL Server 2008 will be getting CU2 for SP1 in May.
Always cool to see the SQL team get it right.
Several outlets reporting that Windows 7 RC will be available on April 30th to MSDN/Technet subscribers and May 5th for general public. I’m just happy this is dropping before I go on vacation. :)
My word, I’m such as geek…back to SQLSaturday prep.
I was helping a colleague out with some data migration today and got started discussing methods of finding data that would cause problems. In today’s case he needed to verify the phone numbers contained in a database table didn’t contain a couple values that would break later application formatting. Here is a simple example of the data (note the row constructor if trying this in pre SQL 2008 versions):
CREATE TABLE #PhoneNumbers
(
PhoneNumber nvarchar(30)
);
INSERT #PhoneNumbers (PhoneNumber)
VALUES ('11122223333')
,('111.222.3333')
,('111-222-3333')
,('111 222 3333')
,('111/222/3333');
Now we could simply apply this query to test for any character data in our phone number:
SELECT *
FROM #PhoneNumbers
WHERE ISNUMERIC(PhoneNumber) = 1;
But this doesn’t really get at what we want. In this case, we only care about certain characters. To our aid comes the trusty PATINDEX() function, allowing us to write this query:
SELECT *
FROM #PhoneNumbers
WHERE PATINDEX('%[. /]%', PhoneNumber) > 0;
This allows us to ignore phone numbers with “-“ as they don’t offend our application’s sensibilities. The same query can be used later with a REPLACE() as the basis for an update to remove the invalid characters. Good times all around.
I’ve been doing quite a bit of modeling lately and needed to go back and verify that I had the various default constraints as I wanted them on my tables. In this case, I was looking for tables with constraint value using the getdate() system function.
Here is the T-SQL to get at the default constraint values:
SELECT SCHEMA_NAME(t.[schema_id]) AS SchemaName
,t.name AS TableName
,c.name AS ColumnName
,dc.[definition]
FROM sys.tables AS t
INNER JOIN sys.columns AS c
ON c.[object_id] = t.[object_id]
INNER JOIN sys.default_constraints AS dc
ON dc.[parent_object_id] = c.[object_id]
AND dc.parent_column_id = c.column_id
WHERE t.is_ms_shipped = 0
AND dc.definition = '(GETDATE())'
ORDER BY SchemaName
,TableName
,ColumnName;
As an aside, I’ve been using the super handy new datetimeoffset data type everywhere I had normally used datetime. The default constraint you would want to implement with datetimeoffset is sysdatetimeoffset() instead of getdate().
I really wish this had been implemented back in the 2005 product, I would have written a lot less code to do time zone to time zone conversion. Oh well, progress is good whenever it comes.
Enjoy!
I will be speaking at SQLSaturday on the 25th at the Microsoft office in Alpharetta. I’m doing two sessions, A Lap Around SQL Server 2008 Partitioning and Solving Real World Problems With Meta Data. The partitioning talk I’ve done a couple times in the past but the meta data one is new. If you can’t get in next Saturday (there is a waiting list it seems, wowza) I will have slides and code samples posted soon after. Hope to see you there!
The Data Platform Insider has the good news that SP1 is available. Go get your download on…
I helped a fellow Magenicon with this detail today and thought it might be helpful to others. Note, this will only work in instances that are SQL 2005 or greater.
SELECT SCHEMA_NAME(o.[schema_id]) AS SchemaName
,o.name AS ObjectName
,REPLACE(o.type_desc, '_', ' ') AS ObjectType
,ap.parameter_id AS ParameterNumber
,ap.name AS ParameterName
,CASE
WHEN t.name IN ( 'char', 'nchar', 'varchar'
, 'nvarchar', 'binary', 'image'
, 'varbinary' ) THEN t.name
+ ' (' + CONVERT(varchar(10), ap.max_length) + ')'
WHEN t.name IN ( 'bigint', 'bit', 'int', 'money'
, 'smallint', 'smallmoney', 'tinyint'
, 'float', 'real', 'date', 'datetime2'
, 'datetime', 'smalldatetime', 'time'
, 'cursor', 'hierarchyid', 'sql_variant'
, 'table', 'timestamp', 'uniqueidentifier'
, 'xml', 'sysname', 'text', 'ntext' ) THEN t.name
WHEN t.name = 'datetimeoffset' THEN t.name
+ ' (' + CONVERT(varchar(10), ap.scale) + ')'
WHEN t.name IN ('decimal', 'numeric') THEN t.name
+ ' (' + CONVERT(varchar(10), ap.[precision])
+ ', ' + CONVERT(varchar(10), ap.scale) + ')'
ELSE 'Unknown'
END AS DataType
,CASE
WHEN ap.is_output = 1 THEN 'Yes'
ELSE 'No'
END AS IsOutputParameter
,COALESCE(ap.default_value, 'N/A') AS DefaultValue
FROM sys.all_parameters AS ap
INNER JOIN sys.objects AS o ON o.[object_id] = ap.[object_id]
INNER JOIN sys.types AS t ON t.system_type_id = ap.system_type_id
ORDER BY SchemaName
,ObjectName
,ParameterNumber;
Not much I can add to James’ post other than to say he constantly makes me smile at work. Check it out and leave him a comment if we missed one…
Bart Duncan has a great post today about the new datetimeoffset feature in SQL Server 2008. Add it to the growing list of reasons to migrate…
Over the weekend I had a terrible experience with the external hard drive I use. The drive completely crapped out leaving me no options other than banging it on my head. This drive held everything in my day to day work at Magenic, so a lot of content. And I know, the irony of the data guy being burned by not having a recent backup is truly fabulous. Fortunately, I started using Live Mesh several months ago and had all project folders from Visual Studio and SSMS pointed there. As painful as it was to lose several weeks worth of client documents and some presentation slide decks the code survives. At the end of the day that makes Live Mesh my favorite Microsoft product and I would strongly recommend it to anyone that cares about keeping code or other content.