How to Find the Amount of Fragmentation on Your SQL Server Instance

Need to find the amount of fragmentation in all your indexes in all databases for a given instance of SQL Server 2005 or later? You can use the undocumented/unsupported sp_MSforeachdb to loop through all the databases. For more information on fragmentation see my previous blog http://blogs.msdn.com/cindygross/archive/2009/11/20/sql-server-and-fragmentation.aspx.

 

— Cindy Gross 2009

— find fragmentation on all indexes in all databases on this instances

— to find fragmentation on just one db comment out the exec master… line and the last quote

— you must use the db_id() rather than NULL as the first parameter of the DMV or it will try to do a join across all dbs

SELECT @@SERVERNAME, @@VERSION, GETDATE() as BatchStartTime

exec master.sys.sp_MSforeachdb ‘ USE [?];

 

DECLARE @starttime datetime, @endtime datetime

SELECT @starttime = GETDATE()

SELECT db_name() as CurrentDB, @starttime as DBStartTime

SELECT db_name() as DatabaseName, OBJECT_NAME (a.object_id, DB_ID()) as ObjectName,

a.index_id, b.name as IndexName,

avg_fragmentation_in_percent, page_count, index_depth, index_type_desc, alloc_unit_type_desc

— , record_count, avg_page_space_used_in_percent –(null in limited)

FROM sys.dm_db_index_physical_stats (db_id(), NULL, NULL, NULL, NULL) AS a

JOIN sys.indexes AS b

ON a.object_id = b.object_id AND a.index_id = b.index_id

–WHERE index_id > 0 — exclude heaps

ORDER BY DatabaseName asc, avg_fragmentation_in_percent desc, ObjectName asc, IndexName asc

SELECT @endtime = GETDATE()

SELECT @starttime as StartTime, @endtime as EndTime, DATEDIFF(MINUTE,@starttime,@endtime) as TotalMinutes

How and Why to Enable Instant File Initialization

See my new blog post (written with Denzil Ribeiro) about “How and Why to Enable Instant File Initialization” on our PFE blog. Keep an eye on the PFE blog for more posts from my team in the near future.

Professional SQL Server 2008 Internals and Troubleshooting

Our new book, Professional SQL Server 2008 Internals and Troubleshooting, will be shipping soon! Order now! 🙂 Christian Bolton, Justin Langford, Brent Ozar, and James Rowland-Jones have each written several chapters in this book. Steven Wort, Jonathan Kehayias and I each contributed a chapter as well. The 1st half of the book introduces you to how things work within SQL Server at a level that will make it easier to understand the rest of the book. The 2nd half of the book focuses on troubleshooting common SQL Server problems.

Download the first chapter and find out more about the book here: http://sqlservertroubleshooting.com/.

Chapters include:

  1. SQL Server Architecture
  2. Understanding Memory
  3. SQL Server Waits and Extended Events
  4. Working with Storage
  5. CPU and Query Processing
  6. Locking and Latches
  7. Knowing Tempdb
  8. Defining Your Approach to Troubleshooting
  9. Viewing Server Performance with PerfMon and the PAL Tool
  10. Tracing SQL Server with SQL Trace and Profiler
  11. Consolidating Data Collection with SQLDiag and the PerfStats Script
  12. Introducing RML Utilities for Stress Testing and Trace File Analysis
  13. Bringing It All Together with SQL Nexus
  14. Using Management Studio Reports and the Performance Dashboard
  15. Using SQL Server Management Data Warehouse
  16. Shortcuts to Efficient Data Collection and Quick Analysis

http://rcm.amazon.com/e/cm?lt1=_blank&bc1=000000&IS2=1&bg1=FFFFFF&fc1=000000&lc1=319D3A&t=cinisthou-20&o=1&p=8&l=as1&m=amazon&f=ifr&asins=0470484284

SQL Server’s Default Trace

Are you familiar with SQL Server’s default trace setting? It can be helpful with finding basic who/when type information on major events. For example, you may want to know who was creating and dropping databases on a given instance.

 

SQL Server has a couple of options that might help you find out more about when/by who the database is being created and dropped. One is Policy Based Management but you would need to configure it ahead of time. Another option is to run a profiler trace that captures information such as CREATE, ALTER, DROP DATABASE. Some of the DMVs might have the execution information if you capture it fast enough after it happens. XEvents can be used in SQL 2008 to find all sorts of information. However, the one that might be most appropriate in this case is the Default Trace.

 

1)      Make sure the default trace is enabled in your configuration options for this instance. If it is not enabled, you can enable it through the sp_configure settings.

— Check to see if the default trace is enabled (0=off, 1=on)

EXEC sp_configure ‘default trace enabled’

GO

— Enabled the default trace

EXEC sp_configure ‘default trace enabled’, 1

GO

RECONFIGURE

2)      The trace files will eventually overwrite themselves, so check for the output soon after the problem occurs (perhaps make periodic copies of the files). They will be under the log directory where SQL Server is installed. For example, for my SQL 2008 instance named WASH the output files are in C:Program FilesMicrosoft SQL ServerMSSQL10.WASHMSSQLLog. The files will be named log_xxx.trc and there will be up to 5 of them.  

3)      Find the trace which covers the time period when the database was created or dropped. You can either open it in the Profiler GUI or you can use the query below to pull out the appropriate data. Look for the create and/or drop events and see who executed them from what workstation and at what time. Some applications will send their “application name” so you may be able to tell that as well.

 

Key Points:

·         You cannot control what is captured by the default trace, how many files it captures before rolling over, or any other options. Your only option is to turn it on or off. If you want a similar trace that differs in any way you can create your own and configure it to start when SQL Server starts (or whatever time period is appropriate).

·         The trace file name/number will continue to increase until you delete the files.

·         The trace does NOT capture all events, it is very lightweight.

 

References:

·         SQL Server 2008 Internals – Chapter 1 page 73

·         Searching for a Trace – Solving the mystery of SQL Server 2005’s default trace enabled option http://www.sqlmag.com/Articles/ArticleID/48939/pg/1/1.html

·         SQL Server Default Trace http://blogs.technet.com/beatrice/archive/2008/04/29/sql-server-default-trace.aspx

·         Default Trace in SQL Server 2005 http://blogs.technet.com/vipulshah/archive/2007/04/16/default-trace-in-sql-server-2005.aspx

·         Default Trace in SQL Server 2005 http://www.mssqltips.com/tip.asp?tip=1111

 

Query:

— Example of using the default trace to find out more about who/when/why a database is dropped or created

 

— Get current file name for existing traces

SELECT * FROM ::fn_trace_getinfo(0)

 

— CHANGE THIS VALUE to the current file name

DECLARE @Path nvarchar(2000)

SELECT  @Path = ‘C:Program FilesMicrosoft SQL ServerMSSQL10.WASHMSSQLLoglog_120.trc’

 

— Get information most relevant to CREATE/DROP database

SELECT SPID, LoginName, NTUserName, NTDomainName, HostName, ApplicationName, StartTime, ServerName, DatabaseName

      ,CASE EventClass

            WHEN 46 THEN ‘CREATE’

            WHEN 47 THEN ‘DROP’

            ELSE ‘OTHER’

       END AS EventClass

      , CASE ObjectType

            WHEN 16964 THEN ‘DATABASE’

            ELSE ‘OTHER’

       END AS ObjectType

      –,*

FROM fn_trace_gettable

(@Path, default)

WHERE ObjectType = 16964 /* Database */ AND EventSubClass = 1 /* Committed */

ORDER BY StartTime

GO

 

/* BOL

 

== Event Class

46 Object:Created

 Indicates that an object has been created, such as for CREATE INDEX, CREATE TABLE, and CREATE DATABASE statements.

47 Object:Deleted

 Indicates that an object has been deleted, such as in DROP INDEX and DROP TABLE statements.

 

 == Object Type

 16964 Database

 

== EventSubClass

 int Type of event subclass.

0=Begin

1=Commit

2=Rollback

 */

 

Backing up a corrupted SQL Server database

I had a question about how to do a backup and skip a corrupted block of data. First, DO NOT DO IT unless you absolutely have to, such as when you are taking a backup prior to trying to fix the corruption (which means you should be on the phone with Microsoft PSS). If you do skip corrupted data you have to consider the backup to be very suspect.

 

Do not ever ignore any indication of data inconsistency in the database. If you have corrupted data it is almost certainly a problem caused by something below the SQL Server level. If it happened once, chances are it will happen again… and again…. and again until the source of the problem is fixed. This means the instant you have any indication of a corrupt SQL Server database you should immediately ask for low-level hardware diagnostics and a thorough review of all logs (event viewer, SQL, hardware, etc.). Double check that if write caching is enabled on the hardware that it is battery backed and the battery is healthy. Double check that all firmware is up to date. Run a DBCC CHECKDB WITH ALL_ERRORMSGS and pay very close attention to the output. Find the source of your corruption and fix it.

 

There is a parameter CONTINUE_AFTER_ERROR for BACKUP and RESTORE, but it is a last ditch command that should only be used as a last resort. One example would be if it’s the only way to get a backup before you attempt to repair the corruption. It does not always work, it depends on what the error is. If you actually have to restore a database backup taken with this option, then you MUST fix the corruption before allowing users, applications, or other production processes back into the database. From BOL:

“We strongly recommend that you reserve using the CONTINUE_AFTER_ERROR option until you have exhausted all alternatives.”

“At the end of a restore sequence that continues despite errors, you may be able to repair the database with DBCC CHECKDB. For CHECKDB to run most consistently after using RESTORE CONTINUE_AFTER_ERROR, we recommend that you use the WITH TABLOCK option in your DBCC CHECKDB command.”

“Use NO_TRUNCATE or CONTINUE_AFTER_ERROR only if you are backing up the tail of a damaged database.”

 

Some suggestions:

·         For every 2005/2008 database, SET PAGE_VERIFY=CHECKSUM (in 2005 this cannot be turned on for TempDB, but it can be turned on for TempDB in 2008). For SQL Server 2000 set TORN_PAGE_DETECTION=ON. When upgrading from 2000 to newer versions set TORN_PAGE_DETECTION=OFF and SET PAGE_VERIFY=CHECKSUM.

·         For databases with CHECKSUM enabled, use the WITH CHECKSUM command on all backups.

·         Implement a “standards” or “best practices” document to handle corruption on each version of SQL Server.

·         Review your disaster recovery plans and upcoming testing. Testing of a full recovery of various scenarios should be done periodically. Some people think once a year is enough, others say monthly or quarterly is often enough. Having backups is not good enough, we have to know that they can be restored. There are also scenarios where backups are not the best way to recover from a problem.

 

Some great info from the person who wrote CHECKDB:

http://sqlskills.com/blogs/paul/post/Example-20002005-corrupt-databases-and-some-more-info-on-backup-restore-page-checksums-and-IO-errors.aspx

http://sqlskills.com/BLOGS/PAUL/category/Corruption.aspx

Compilation of SQL Server TempDB IO Best Practices

It is important to optimize TempDB for good performance. In particular, I am focusing on how to allocate files.

 

TempDB is a unique database in several ways. The ones most relevant to this discussion are:

·         It is often one of the busiest databases on an instance. This means the performance of TempDB is critical to your instance’s overall performance.

·         It is recreated as a copy of model each time SQL Server starts, taking all the properties of model except for the location, number, and size of its data and log files.

·         TempDB has a very high rate of create/drop object activity. This means the system metadata related to object creation/deletion is heavily used.

·         Slightly different logging and latching behavior.

 

General recommendations:

·         Pre-size TempDB appropriately. Leave autogrow on with instant file initialization enabled, but try to configure the database so that it never hits an autogrow event. Make sure the autogrow growth increment is appropriate.

·         Follow general IO recommendations for fast IO.

·         If your TempDB experiences metadata contention (waitresource = 2:1:1 or 2:1:3), you should split out your data onto multiple files. Generally you will want somewhere between 1/4 and 1 file per physical core. If you don’t want to wait to see if any metadata contention occurs you may want to start out with around 1/4 to 1/2 the number of data files as CPUs up to about 8 files. If you think you might need more than 8 files we should do some testing first to see what the impact is. For example, if you have 8 physical CPUs you may want to start with 2-4 data files and monitor for metadata contention.

·         All TempDB data files should be of equal size.

·         As with any database, your TempDB performance may improve if you spread it out over multiple drives. This only helps if each drive or mount point is truly a separate IO path. Whether each TempDB will have a measurable improvement from using multiple drives depends on the specific system.

·         In general you only need one log file. If you need to have multiple log files because you don’t have enough disk space on one drive that is fine, but there is no direct benefit from having the log on multiple files or drives.

·         On SQL Server 2000 and more rarely on SQL Server 2005 or later you may want to enable trace flag -T1118.

·         Avoid shrinking TempDB (or any database) files unless you are very certain you will never need the space again.

 

References:

·         Working with tempdb in SQL Server 2005 http://technet.microsoft.com/en-us/library/cc966545.aspx

o   “Divide tempdb into multiple data files of equal size. These multiple files don’t necessarily be on different disks/spindles unless you are also encountering I/O bottlenecks as well. The general recommendation is to have one file per CPU because only one thread is active per CPU at one time.”

o   “Having too many files increases the cost of file switching, requires more IAM pages, and increases the manageability overhead.”

·         How many files should a database have? – Part 1: OLAP workloads http://sqlcat.com/technicalnotes/archive/2008/03/07/How-many-files-should-a-database-have-part-1-olap-workloads.aspx

o   If you have too many files you can end up with smaller IO block sizes and decreased performance under extremely heavy load.

o   If you have too few files you can end up with decreased performance to GAM/SGAM contention (generally the problem you see in TempDB) or PFS contention (extremely heavy inserts).

o   The more files you have per database the longer it takes to do database recovery (bringing a database online, such as during SQL Server startup). This can become a problem with hundreds of files.

·         SQL Server Urban Legends Discussed http://blogs.msdn.com/psssql/archive/2007/02/21/sql-server-urban-legends-discussed.aspx

o   ” SQL Server uses asynchronous I/O allowing any worker to issue an I/O requests regardless of the number and size of the database files or what scheduler is involved.”

o   ” Tempdb is the database with the highest level of create and drop actions and under high stress the allocation pages, syscolumns and sysobjects can become bottlenecks.   SQL Server 2005 reduces contention with the ‘cached temp table’ feature and allocation contention skip ahead actions.”

·         Concurrency enhancements for the tempdb database http://support.microsoft.com/kb/328551

o   Note that this was originally written for SQL Server 2000 (the applies to section only lists 2000) and there are some tweaks/considerations for later versions that are not covered completely in this article. For example, -T1118 is not only much less necessary on SQL Server 2005+, it can in some cases cause problems.

·         FIX: Blocking and performance problems may occur when you enable trace flag 1118 in SQL Server 2005 if the temporary table creation workload is high http://support.microsoft.com/default.aspx?scid=kb;EN-US;936185

o   If you have SP2 based CU2 or later you will not see the problems described in this article. Also, on SP2 based CU2 or higher you are much less likely to even need -T1118 on a heavily used TempDB.

o   ” This hotfix significantly reduces the need to force uniform allocations by using trace flag 1118. If you apply the fix and are still encountering TEMPDB contention, consider also turning on trace flag 1118.”

·         Misconceptions around TF 1118 http://sqlskills.com/BLOGS/PAUL/post/Misconceptions-around-TF-1118.aspx

o   ” turn on TF1118, which makes the first 8 data pages in the temp table come from a dedicated extent “

o   “Instead of a 1-1 mapping between processor cores and tempdb data files (*IF* there’s latch contention), now you don’t need so many – so the recommendation from the SQL team is the number of data files should be 1/4 to 1/2 the number of processor cores (again, only *IF* you have latch contention). The SQL CAT team has also found that in 2005 and 2008, there’s usually no gain from having more than 8 tempdb data files, even for systems with larger numbers of processor cores. Warning: generalization – your mileage may vary – don’t post a comment saying this is wrong because your system benefits from 12 data files. It’s a generalization, to which there are always exceptions.”

·         Storage Top 10 Best Practices http://sqlcat.com/top10lists/archive/2007/11/21/storage-top-10-best-practices.aspx  

o   “Make sure to move TEMPDB to adequate storage and pre-size after installing SQL Server. “

o   “Performance may benefit if TEMPDB is placed on RAID 1+0 (dependent on TEMPDB usage). “

o   “This is especially true for TEMPDB where the recommendation is 1 data file per CPU. “

o   “Dual core counts as 2 CPUs; logical procs (hyperthreading) do not. “

o   “Data files should be of equal size – SQL Server uses a proportional fill algorithm that favors allocations in files with more free space.

o   “Pre-size data and log files. “

o   “Do not rely on AUTOGROW, instead manage the growth of these files manually. You may leave AUTOGROW ON for safety reasons, but you should proactively manage the growth of the data files. “

Optimizing tempdb Performance http://msdn.microsoft.com/en-us/library/ms175527.aspx

SQL Server and Fragmentation

There are several types of fragmentation that affect SQL Server, all of which have the potential to cause performance problems.

 

1)      At the disk/OS level individual files can become fragmented due to growth over time. This can be addressed by a defragmentation tool at the SAN or Windows level. Ask your storage team about your options there.

2)      Within a SQL Server database individual pages can become internally fragmented so that many pages are less full than expected. This means more pages are needed to store the same amount of data which means more reads to satisfy each query and less total data that can be kept in memory. This can decrease performance.

3)      Within a SQL Server database the index pages and extents can become less sequential. This can make scans slower. Though hopefully you are doing more seeks than scans this can still be an important factor in performance.

Sharepoint:

If you are using Sharepoint, you need to be aware that you cannot change the schema at all, including adding or modifying indexes. Here is some information on Sharepoint support:

·         943345  How to defragment Windows SharePoint Services 3.0 databases and SharePoint Server 2007 databases http://support.microsoft.com/default.aspx?scid=kb;EN-US;943345

·         841057  Support for changes to the databases that are used by Office server products and by Windows SharePoint Services http://support.microsoft.com/default.aspx?scid=kb;EN-US;841057

·         968785  After you install Windows SharePoint Services 3.0 Service Pack 2, the Windows SharePoint Services 3.0 database can be auto-defragmented http://support.microsoft.com/default.aspx?scid=kb;EN-US;968785

·         Note that with Sharepoint, the recommended break-point where you change from REORGANIZE to REBUILD is 70% instead of the more standard 30%. Your maintenance window may be a bit longer but concurrency will increase for your users on the tables with BLOBs.

Disk/OS level fragmentation:

In a SAN environment, sometimes you cannot use the windows tools to tell you how fragmented a volume is since some characteristics may be hidden from Windows and known only to the SAN. If your storage team has SAN tools to tell them about fragmentation, then use that. Windows can see a SAN volume as fragmented when the SAN has already taken care of the fragmentation.

·         This SQL IO basics document has some details on disk defragmentation http://download.microsoft.com/download/4/7/a/47a548b9-249e-484c-abd7-29f31282b04d/SQLIOBasicsCh2.doc

Database level fragmentation:

·         sys.dm_db_index_physical_stats (Transact-SQL) http://msdn.microsoft.com/en-us/library/ms188917.aspx

·         Reorganizing and Rebuilding Indexes http://msdn.microsoft.com/en-us/library/ms189858.aspx

·         Guidelines for Performing Online Index Operations http://msdn.microsoft.com/en-us/library/ms190981.aspx

·         Rebuilding Indexes and Updating Statistics http://sqlskills.com/blogs/paul/post/Search-Engine-QA-10-Rebuilding-Indexes-and-Updating-Statistics.aspx

 

When you REORGANIZE (in older versions of SQL Server this was DBCC INDEXDEFRAG) it compares two pages at a time and works its way through the index. If it is killed before it stops no work done is lost. However, for a very defragmented index this operation will take much longer than REBUILD. The more fragmentation you have the longer this operation takes to complete. There may still be some fragmentation left after this operation completes.

 

When you REBUILD (DBCC DBREINDEX in earlier versions) a copy is made of the index and the new version is entirely sequential and compact. It is faster to use this method when the fragmentation level is high. REBUILD also does the equivalent of an UPDATE STATISTICS WITH FULLSCAN which gives very accurate statistics. The amount of time to complete has little to do with the amount of fragmentation and instead depends on the total amount of data, the data types, the index structure, etc. Internally this is pretty much equivalent to creating a new index (without the overhead of a sort) then dropping the old one within a transaction. For some indexes it is possible to do a minimally logged operation if you are not using the FULL recovery mode. Not using FULL recovery mode means you have a greater exposure to possible data loss in the case of certain types of system failures, this is not directly related to the REBUILD operation but only to the recovery mode. Extra precautions such as a log backup immediately before and after the change of recovery mode. For non-clustered indexes you can sometimes use the SORT_IN_TEMPDB option to avoid a sort. For tables with no XML or spatial indexes you may be able to set ONLINE=ON in Enterprise Edition so that users can continue to access the index while it is being rebuilt (there are several restrictions on when this is allowed). If there is little free space in the database the new index may not be completely defragmented.

FILLFACTOR:

The FILLFACTOR setting is related to fragmentation and may be changed during a REBUILD. You can set the FILLFACTOR to a higher or lower number to adjust how much free space is left on each page. The tradeoff is in space used vs. the cost of future page splits as data is added or rows get bigger (variable length or nullable columns get more data). The FILLFACTOR does not prevent a page from filling, it sets the amount of free space only at the time the index is created or rebuilt. By default the pages are created as full as possible so unless you have explicitly set your FILLFACTOR to a lower amount for one or more indexes then you won’t be able to save space that way. If you have to defragment an index more often than you would like, and the fragmentation is due mostly to page splits, then you might want to set a lower FILLFACTOR. If you rarely have to do defragmentation and you have a lot of empty space per page, you might want to increase the FILLFACTOR. If you want to guess at a “best” value on each index you need to know the average width of the rows, the clustered index key values (if you’re setting it for a non-clustered index), the expected amount of inserts and whether they will cause a lot of page splits, are inserts on a monotonically increasing key like IDENTITY, whether updates will cause a lot of row movement or changes in row size, whether deletes are likely to happen in random spots or at the “beginning” or “end”, the volume of data modifications compared to total rows and whether the data is compressed. All of which lead back to how much fragmentation do you expect (and does that fragmentation cause any perceivable performance problems) and how often can/will you defragment the indexes. So if you see a lot of fragmentation and when you defragment your performance improves (not always the case), have a lot of page splits (PerfMon: SQL Server Access Methods.Page Splits/sec), and have the space available to rebuild your index then a lower FILLFACTOR is one possible solution. When you rebuild the indexes it will often change the total space allocated to the table, partly due to FILLFACTOR and partly due to the fragmentation cleanup.

 

References:

·         BOL Fill Factor http://msdn.microsoft.com/en-us/library/ms177459.aspx

·         BOL Create Index http://msdn.microsoft.com/en-us/library/ms188783.aspx

·         Database Maintenance Best Practices Part II – Setting FILLFACTOR

http://www.sqlskills.com/BLOGS/KIMBERLY/post/Database-Maintenance-Best-Practices-Part-II-e28093-the-most-important-setting-FILLFACTOR.aspx

·         Who Cares about FillFactor? http://www.sqlservercentral.com/articles/Performance+Tuning/whocaresaboutfillfactor/1196/

·         Kimberly Tripp on FILLFACTOR http://www.sqlskills.com/BLOGS/KIMBERLY/category/FILLFACTOR.aspx

·         Top Tips for Effective Database Maintenance http://technet.microsoft.com/en-us/magazine/2008.08.database.aspx

SQL Server 2008 setup may report a Windows 2008 cluster validation failure

You may receive the error “The cluster either has not been verified or there are errors or failures in the verification report” while installing a SQL Server clustered installation on Windows 2008. When see this error you must fix the underlying error that caused Windows validation to fail. Even though in some cases you could avoid the error and continue SQL Server setup with the undocumented SkipRules setting, this is not advisable and will leave you in an unsupported state (unless you are using a geo cluster with no shared storage).

 

The Microsoft Support Policy for Windows Server 2008 Failover Clusters

http://support.microsoft.com/kb/943984

For the Windows Server 2008 Failover Clustering solution to be considered an officially supported solution by Microsoft Customer Support Services (CSS), the solution must meet the following criteria:

·         All hardware and software components must meet the qualifications to receive a “Certified for Windows Server 2008” logo.

·         The fully configured solution must pass the Validate test in the Failover Clusters Management snap-in.

 

Failover Cluster Step-by-Step Guide: Validating Hardware for a Failover Cluster

http://technet.microsoft.com/en-us/library/cc732035(WS.10).aspx

What to do if validation tests fail

In most cases, if any tests in the cluster validation wizard fail, then Microsoft does not consider the solution to be supported. There are exceptions to this rule, such as the case with multi-site (geographically dispersed) clusters where there is no shared storage. In this scenario the expected result of the validation wizard is that the storage tests will fail. This is still a supported solution if the remainder of the tests complete successfully.

 

Error message when you install SQL Server 2008 on a Windows Server 2008-based cluster: “The cluster either has not been verified or there are errors or failures in the verification report. Refer to KB953748 or SQL server books online”

http://support.microsoft.com/kb/953748

What to do if validation tests fail

In most cases, if any tests in the cluster validation rule fail, Microsoft does not consider the solution to be supported. There are exceptions to this rule, such as the case with multiple-site (geographically dispersed) clusters where there is no shared storage. In this scenario, the expected result of the cluster validation wizard is that the storage tests will fail. This is still a supported solution if the rest of the tests finish successfully.

How to tell what edition your SQL Server setup files are

The setup files for SQL Server look pretty much the same for all editions. If you saved the files somewhere but didn’t indicate in your directory name or a readme file which edition they are for, how do you tell the edition? Running setup doesn’t necessarily help, the initial setup screens for SQL Server 2005 Enterprise and Standard both show the same EULA that includes “MICROSOFT SQL SERVER 2005 STANDARD AND ENTERPRISE EDITIONS”. To see the edition of the setup files, look in the Servers directory of your installation files from the setup CD (there are two high level directories: Servers and Tools). In Servers there is a file called default.htm and when you open it, you will see the edition of your SQL Server setup files.

How to use PsGetSid for SQL Server

To translate a SID to a name or vice versa, you can use this tool from sysinternals:

 

 

PsGetSid v1.43

http://technet.microsoft.com/en-us/sysinternals/bb897417.aspx

 

You can use it to translate the SID in these registry keys into the SQL Domain Groups or local groups used for SQL Server 2005/2008 instance permissions. The SID is stored in this key:

HKEY_LOCAL_MACHINESOFTWAREMicrosoftMicrosoft SQL ServerMSSQL.XSetup

 

AGTGroup (Setup grants this domain group the permissions needed by the SQL Server Agent service account)

SQLGroup (Setup grants this domain group the permissions needed by the SQL Server service account)

FTSGroup (Setup grants this domain group the permissions needed by the Full Text Search service account)

ASGroup (Setup grants this domain group the permissions needed by the Analysis Services service account)

 

If you’re not sure what MSSQL.X value to use, look up your instance name in

HKEY_LOCAL_MACHINESOFTWAREMicrosoftMicrosoft SQL ServerInstance NamesSQL

 

SQL Server creates these keys and populates them with SID values. The service accounts you specify during setup are added to the groups.

 

On a standalone system those SIDs map to local groups created by setup including:

SQLServerMSSQLUser$ComputerName$InstanceName

SQLServerSQLAgentUser$ComputerName$InstanceName

SQLServerMSOLAPUser$ComputerName$InstanceName

SQLServerDTSUser$ComputerName

SQLServerSQLBrowserUser$ComputerName

SQLServerMSSQLServerADHelperUser$ComputerName

 

On a cluster, the cluster aware components SQL, AS, and Agent don’t have local groups created. Instead you pre-create global groups (for SQL 2008 on Windows 2008 you have the option to use Service SIDs instead) and specify the group names during setup. Then setup grants the permissions to the global groups. Since often the account you are using for setup doesn’t have permissions to add accounts to groups you may want to have someone with appropriate permissions pre-add the service accounts to the groups.

 

If you ever need to find out what the name of the local or domain groups are, you can use the above PsGetSid tool to find that. For example, you may want to move a system to a new piece of hardware configured exactly like the old one, then you will remove the old system. You will need to know what groups were used for that instance of SQL Server (you are following good security practices and using a unique group for each instance and each service within each instance to reduce your vulnerabilty across systems if one instance is hacked, right?) so that you can reuse that group on the new system. A disaster recovery scenario is another situation where you will want to know how to map the SIDs to the names. You may want to back up these keys.

 

915846  Best practices that you can use to set up domain groups and solutions to problems that may occur when you set up a domain group when you install a SQL Server 2005 failover cluster http://support.microsoft.com/default.aspx?scid=kb;EN-US;915846

%d bloggers like this: