Thursday, January 14, 2010

DCM and Transaction Rollbacks

Over the past week, I performed a large update on a 16 million row table within a test database. This update did nothing more than flipped a bit on the table from 0 to 1 but due to its size was an intensive process. Being that I was testing, I had this wrapped in a transaction and rolled it back. Thinking nothing was affected I went on with my work for the day.

In the morning I came in to work to find what was generally a 100mb differential backup file exceeding 14gb. This caught me completely off guard since i know i did not make any significant data changes, so I began to really dig. I knew the only thing that i had done that could have affected 14gb of my 33gb test database was a massive operation on its largest table. "But I rolled that transaction back, nothing changed!" I thought to myself.

As you may or may not know, there are Differential Changed Map (DCM) pages that contain bits corresponding to each extent in the database and whether or not they have changed since the last full backup. When data is inserted or updated on a page, the DCM bit for the extent that the page exists on is flipped from 0 to 1 and only reverted to 0 upon a full backup.

Knowing this, I decided to try a few tests.

I created a database with a single table, and populated that table with some data by using the following script

CREATE DATABASE DiffBackupTest
GO
USE DiffBackupTest
GO

CREATE TABLE dbo.TabA (ID INT IDENTITY(1,1),
ColA UNIQUEIDENTIFIER,
Row ROWVERSION)
GO

INSERT dbo.TabA (ColA) VALUES (NEWID())
GO 100000

Now I have a database with some data in it, so to reset the DCM, I preform a full backup.

BACKUP DATABASE [DiffBackupTest]
TO DISK = N'C:\Program Files\Microsoft SQL Server\MSSQL10.MSSQLSERVER\MSSQL\Backup\DiffBackupTest.bak'

Processed 648 pages for database 'DiffBackupTest', file 'DiffBackupTest' on file 10.
Processed 2 pages for database 'DiffBackupTest', file 'DiffBackupTest_log' on file 10.
BACKUP DATABASE successfully processed 650 pages in 0.392 seconds (12.939 MB/sec).

This is followed by a differential backup.

BACKUP DATABASE [DiffBackupTest]
TO DISK = N'C:\Program Files\Microsoft SQL Server\MSSQL10.MSSQLSERVER\MSSQL\Backup\DiffBackupTest.bak' WITH DIFFERENTIAL

Processed 56 pages for database 'DiffBackupTest', file 'DiffBackupTest' on file 11.
Processed 1 pages for database 'DiffBackupTest', file 'DiffBackupTest_log' on file 11.
BACKUP DATABASE WITH DIFFERENTIAL successfully processed 49 pages in 0.026 seconds (14.704 MB/sec).

Note that 56 pages were backed up on the differential backups.

Now since we know that 56 pages, or 7 extents are backed up in this differential backup I am preforming an update within a transaction that I will then roll back.

SET NOCOUNT ON
SELECT TOP 10 ID,row FROM dbo.TabA ORDER BY ID DESC
BEGIN TRAN
UPDATE dbo.TabA
SET ColA = NEWID()
ROLLBACK TRAN
SELECT TOP 10 ID,row FROM dbo.TabA ORDER BY ID DESC

After the rollback, the data still matches what was inserted before the backups since we see the rowversion has not changed. At this time, we will preform another differential backup and look at the number of pages backed up.

BACKUP DATABASE [DiffBackupTest]
TO DISK = N'C:\Program Files\Microsoft SQL Server\MSSQL10.MSSQLSERVER\MSSQL\Backup\DiffBackupTest.bak' WITH DIFFERENTIAL

This time we have 552 pages, or 69 extents, backed up even though no row changes were committed.

This shows that even if changes are rolled back, SQL Server does not rollback the change to the DCM pages. This caught me off guard on our test database, and has made me more cautious when doing large data changes in databases with differential backups scheduled. From here on out, I will be sure to preform full backups after any significant data changes, I sure have learned my lesson.

Tuesday, January 12, 2010

Automated Index Maintenance

Click here to download my code samples from the Jan. 12 OPass meeting.

Far too often the IT department has a server administrator, a software developer, or even a member of management playing the role of, what Paul Randal would call, the accidental DBA. This role often has little to no database administration experience. They might know how to create a backup maintenance plan through a wizard, create logins and databases to allow the rest of the team to function. While in this role, one of the most frequently forgotten aspects of database administration is that of proper indexing and maintenance.

How can an administrator properly maintain existing indexes without existing knowledge of the indexing strategies in place and application usage? Using system views and DMVs, one can eaisly inventory current indexes and implement a system to intellegently maintain them.

Why not use maintenance plans?
SQL Server maintenance plans can be created to manage indexes but it has many limitations. The largest limitation is the lack of logging. When using a maintenance plan, there is no way to monitor the fragmentation before maintenance, and the time each index took to process. Another limitation is found when you wish to reorganize or rebuild only those indexes that excede a fragmentation threshold. These maintenance plans can be quick to set up but lack critical settings when dealing with high transaction count database systems.

How to automate the maintenance of indexes
In the system described here, we will set up automated index maintenance on the database and all of its indexes. Instead of running the query ALTER INDEX IDX_TableA_Index1 ON dbo.TableA REORGANIZE for each index in the database we will perform this based on data in a table. We create tables by running the following scripts.

1_dbo.IndexLog.sql
1_dbo.IndexMaintenance.sql
1_dbo.IndexStatus.sql

After creating the tables, Run the simple data population script located in 3_PopulateIndexStatus.sql.

At this point, we have three tables set up titled dbo.IndexLog which holds logging information about the index rebuild and reorganization procedures, dbo.IndexMaintenance which contains a list of all indexes to be maintained and their options, and dbo.IndexStatus which is used as a lookup used by the table dbo.InsertLog.

The core of the data used in the process lies within the dbo.IndexMatintnace table. This table contains columns that will be used by future jobs. In the table below you will find the purpose of each column.

SchemaName - The schema the table is contained in
TableName - The table that index belongs to
IndexName - The index to be maintained
OptionStatisticsNorecompute - Sets the STATISTICS_NORECOMPUTE flag to ON during an index rebuild
OptionAllowRowLocks - Sets the ALLOW_ROW_LOCKS flag to ON during an index rebuild
OptionAllowPageLocks - Sets the ALLOW_PAGE_LOCKS flag to ON during an index rebuild
OptionSortInTempdbSets the SORT_IN_TEMPDB flag to ON during an index rebuild
OptionOnlineRebuild - Sets the ONLINE flag to ON during an index rebuild
KeepMaintained - Determines whether or not the automated process should include this index.
Fill_Factor - Sets the FILLFACTOR flag to the specified value during an index rebuild

Next step is to populate the dbo.IndexMatintnace table with all of the indexes to be maintained. This is a simple task by use of the system object views sys.indexes, sys.columns, and sys.tables, which are all, provided by SQL Server. The query located in the script file 3_PopulateIndexMaintenance.sql will insert all data required by the automated index maintenance process into this table. Please note that the options for Statistics Norecompute, Allow Row Locks, Allow Page Locks, and Sort in TempDB are all defaulted whereas the option for Online Index rebuild is determined by the presence of LOB data on the table the index is placed on.

Now, with the dbo.IndexMaintnance and the dbo.IndexStatus tables both populated, it is time to create the jobs to rebuild and reorganize all indexes within the database. To do this, we will need to add stored procedures located in 2_dbo.job_RebuildAllIndexes.sql and 2_dbo.job_ReorganizeIndexes.

As you can see, the stored procedure dbo.job_RebuildAllIndexes has parameters that allow you to run the proc in such a way to perform only online rebuilds, offline rebuilds, or both. This allows you to schedule jobs to do online rebuilds during times not included in a maintenance window. The proc dbo.job_ReorganizeIndexes will perform a reorganize on all indexes that are marked to be kept maintained and meet the thresholds defined in the proc parameters.

At this point you have created and populated the tables dbo.IndexLog, dbo.IndexMaintenance, and dbo.IndexStatus as well as the user stored procedures dbo.job_ReorganizeIndexes and dbo.job_RebuildAllIndexes and have been created. At this point all we have left to do is create jobs to call these stored procedures on a schedule.

At this time everything is setup and ready to run. By invoking this reorganize job, it will run a reorganize on all table based on the fragmentation and pagecount threshold.

Try this out on some of your development systems and let me know your thoughts. This is still very much a work in progress so feel free to share your experiences with the rest of the community.