Cindy Gross | Befriending Dragons

Category: Small Bites of Big Data

Technology, Tech, Big Data, SQL Server, Azure, cloud

SQL Server’s Default Trace

Are you familiar with SQL Server’s default trace setting? It can be helpful with finding basic who/when type information on major events. For example, you may want to know who was creating and dropping databases on a given instance.

SQL Server has a couple of options that might help you find out more about when/by who the database is being created and dropped. One is Policy Based Management but you would need to configure it ahead of time. Another option is to run a profiler trace that captures information such as CREATE, ALTER, DROP DATABASE. Some of the DMVs might have the execution information if you capture it fast enough after it happens. XEvents can be used in SQL 2008 to find all sorts of information. However, the one that might be most appropriate in this case is the Default Trace.

1)      Make sure the default trace is enabled in your configuration options for this instance. If it is not enabled, you can enable it through the sp_configure settings.

— Check to see if the default trace is enabled (0=off, 1=on)

EXEC sp_configure ‘default trace enabled’

GO

— Enabled the default trace

EXEC sp_configure ‘default trace enabled’, 1

GO

RECONFIGURE

2)      The trace files will eventually overwrite themselves, so check for the output soon after the problem occurs (perhaps make periodic copies of the files). They will be under the log directory where SQL Server is installed. For example, for my SQL 2008 instance named WASH the output files are in C:Program FilesMicrosoft SQL ServerMSSQL10.WASHMSSQLLog. The files will be named log_xxx.trc and there will be up to 5 of them.

3)      Find the trace which covers the time period when the database was created or dropped. You can either open it in the Profiler GUI or you can use the query below to pull out the appropriate data. Look for the create and/or drop events and see who executed them from what workstation and at what time. Some applications will send their “application name” so you may be able to tell that as well.

Key Points:

·         You cannot control what is captured by the default trace, how many files it captures before rolling over, or any other options. Your only option is to turn it on or off. If you want a similar trace that differs in any way you can create your own and configure it to start when SQL Server starts (or whatever time period is appropriate).

·         The trace file name/number will continue to increase until you delete the files.

·         The trace does NOT capture all events, it is very lightweight.

References:

·         SQL Server 2008 Internals – Chapter 1 page 73

·         Searching for a Trace – Solving the mystery of SQL Server 2005’s default trace enabled option http://www.sqlmag.com/Articles/ArticleID/48939/pg/1/1.html

·         SQL Server Default Trace http://blogs.technet.com/beatrice/archive/2008/04/29/sql-server-default-trace.aspx

·         Default Trace in SQL Server 2005 http://blogs.technet.com/vipulshah/archive/2007/04/16/default-trace-in-sql-server-2005.aspx

·         Default Trace in SQL Server 2005 http://www.mssqltips.com/tip.asp?tip=1111

Query:

— Example of using the default trace to find out more about who/when/why a database is dropped or created

— Get current file name for existing traces

SELECT * FROM ::fn_trace_getinfo(0)

— CHANGE THIS VALUE to the current file name

DECLARE @Path nvarchar(2000)

SELECT @Path = ‘C:Program FilesMicrosoft SQL ServerMSSQL10.WASHMSSQLLoglog_120.trc’

— Get information most relevant to CREATE/DROP database

SELECT SPID, LoginName, NTUserName, NTDomainName, HostName, ApplicationName, StartTime, ServerName, DatabaseName

      ,CASE EventClass

            WHEN 46 THEN ‘CREATE’

            WHEN 47 THEN ‘DROP’

            ELSE ‘OTHER’

      END AS EventClass

      , CASE ObjectType

            WHEN 16964 THEN ‘DATABASE’

            ELSE ‘OTHER’

      END AS ObjectType

      –,*

FROM fn_trace_gettable

(@Path, default)

WHERE ObjectType = 16964 /* Database */ AND EventSubClass = 1 /* Committed */

ORDER BY StartTime

GO

/* BOL

== Event Class

46 Object:Created

Indicates that an object has been created, such as for CREATE INDEX, CREATE TABLE, and CREATE DATABASE statements.

47 Object:Deleted

Indicates that an object has been deleted, such as in DROP INDEX and DROP TABLE statements.

== Object Type

16964 Database

== EventSubClass

int Type of event subclass.

0=Begin

1=Commit

2=Rollback

*/

December 16, 2009
Backing up a corrupted SQL Server database

I had a question about how to do a backup and skip a corrupted block of data. First, DO NOT DO IT unless you absolutely have to, such as when you are taking a backup prior to trying to fix the corruption (which means you should be on the phone with Microsoft PSS). If you do skip corrupted data you have to consider the backup to be very suspect.

Do not ever ignore any indication of data inconsistency in the database. If you have corrupted data it is almost certainly a problem caused by something below the SQL Server level. If it happened once, chances are it will happen again… and again…. and again until the source of the problem is fixed. This means the instant you have any indication of a corrupt SQL Server database you should immediately ask for low-level hardware diagnostics and a thorough review of all logs (event viewer, SQL, hardware, etc.). Double check that if write caching is enabled on the hardware that it is battery backed and the battery is healthy. Double check that all firmware is up to date. Run a DBCC CHECKDB WITH ALL_ERRORMSGS and pay very close attention to the output. Find the source of your corruption and fix it.

There is a parameter CONTINUE_AFTER_ERROR for BACKUP and RESTORE, but it is a last ditch command that should only be used as a last resort. One example would be if it’s the only way to get a backup before you attempt to repair the corruption. It does not always work, it depends on what the error is. If you actually have to restore a database backup taken with this option, then you MUST fix the corruption before allowing users, applications, or other production processes back into the database. From BOL:

“We strongly recommend that you reserve using the CONTINUE_AFTER_ERROR option until you have exhausted all alternatives.”

“At the end of a restore sequence that continues despite errors, you may be able to repair the database with DBCC CHECKDB. For CHECKDB to run most consistently after using RESTORE CONTINUE_AFTER_ERROR, we recommend that you use the WITH TABLOCK option in your DBCC CHECKDB command.”

“Use NO_TRUNCATE or CONTINUE_AFTER_ERROR only if you are backing up the tail of a damaged database.”

Some suggestions:

·         For every 2005/2008 database, SET PAGE_VERIFY=CHECKSUM (in 2005 this cannot be turned on for TempDB, but it can be turned on for TempDB in 2008). For SQL Server 2000 set TORN_PAGE_DETECTION=ON. When upgrading from 2000 to newer versions set TORN_PAGE_DETECTION=OFF and SET PAGE_VERIFY=CHECKSUM.

·         For databases with CHECKSUM enabled, use the WITH CHECKSUM command on all backups.

·         Implement a “standards” or “best practices” document to handle corruption on each version of SQL Server.

·         Review your disaster recovery plans and upcoming testing. Testing of a full recovery of various scenarios should be done periodically. Some people think once a year is enough, others say monthly or quarterly is often enough. Having backups is not good enough, we have to know that they can be restored. There are also scenarios where backups are not the best way to recover from a problem.

Some great info from the person who wrote CHECKDB:

http://sqlskills.com/blogs/paul/post/Example-20002005-corrupt-databases-and-some-more-info-on-backup-restore-page-checksums-and-IO-errors.aspx

http://sqlskills.com/BLOGS/PAUL/category/Corruption.aspx

December 2, 2009
Compilation of SQL Server TempDB IO Best Practices

It is important to optimize TempDB for good performance. In particular, I am focusing on how to allocate files.

TempDB is a unique database in several ways. The ones most relevant to this discussion are:

·         It is often one of the busiest databases on an instance. This means the performance of TempDB is critical to your instance’s overall performance.

·         It is recreated as a copy of model each time SQL Server starts, taking all the properties of model except for the location, number, and size of its data and log files.

·         TempDB has a very high rate of create/drop object activity. This means the system metadata related to object creation/deletion is heavily used.

·         Slightly different logging and latching behavior.

General recommendations:

·         Pre-size TempDB appropriately. Leave autogrow on with instant file initialization enabled, but try to configure the database so that it never hits an autogrow event. Make sure the autogrow growth increment is appropriate.

·         Follow general IO recommendations for fast IO.

·         If your TempDB experiences metadata contention (waitresource = 2:1:1 or 2:1:3), you should split out your data onto multiple files. Generally you will want somewhere between 1/4 and 1 file per physical core. If you don’t want to wait to see if any metadata contention occurs you may want to start out with around 1/4 to 1/2 the number of data files as CPUs up to about 8 files. If you think you might need more than 8 files we should do some testing first to see what the impact is. For example, if you have 8 physical CPUs you may want to start with 2-4 data files and monitor for metadata contention.

·         All TempDB data files should be of equal size.

·         As with any database, your TempDB performance may improve if you spread it out over multiple drives. This only helps if each drive or mount point is truly a separate IO path. Whether each TempDB will have a measurable improvement from using multiple drives depends on the specific system.

·         In general you only need one log file. If you need to have multiple log files because you don’t have enough disk space on one drive that is fine, but there is no direct benefit from having the log on multiple files or drives.

·         On SQL Server 2000 and more rarely on SQL Server 2005 or later you may want to enable trace flag -T1118.

·         Avoid shrinking TempDB (or any database) files unless you are very certain you will never need the space again.

References:

·         Working with tempdb in SQL Server 2005 http://technet.microsoft.com/en-us/library/cc966545.aspx

o   “Divide tempdb into multiple data files of equal size. These multiple files don’t necessarily be on different disks/spindles unless you are also encountering I/O bottlenecks as well. The general recommendation is to have one file per CPU because only one thread is active per CPU at one time.”

o   “Having too many files increases the cost of file switching, requires more IAM pages, and increases the manageability overhead.”

·         How many files should a database have? – Part 1: OLAP workloads http://sqlcat.com/technicalnotes/archive/2008/03/07/How-many-files-should-a-database-have-part-1-olap-workloads.aspx

o   If you have too many files you can end up with smaller IO block sizes and decreased performance under extremely heavy load.

o   If you have too few files you can end up with decreased performance to GAM/SGAM contention (generally the problem you see in TempDB) or PFS contention (extremely heavy inserts).

o   The more files you have per database the longer it takes to do database recovery (bringing a database online, such as during SQL Server startup). This can become a problem with hundreds of files.

·         SQL Server Urban Legends Discussed http://blogs.msdn.com/psssql/archive/2007/02/21/sql-server-urban-legends-discussed.aspx

o   ” SQL Server uses asynchronous I/O allowing any worker to issue an I/O requests regardless of the number and size of the database files or what scheduler is involved.”

o   ” Tempdb is the database with the highest level of create and drop actions and under high stress the allocation pages, syscolumns and sysobjects can become bottlenecks.   SQL Server 2005 reduces contention with the ‘cached temp table’ feature and allocation contention skip ahead actions.”

·         Concurrency enhancements for the tempdb database http://support.microsoft.com/kb/328551

o   Note that this was originally written for SQL Server 2000 (the applies to section only lists 2000) and there are some tweaks/considerations for later versions that are not covered completely in this article. For example, -T1118 is not only much less necessary on SQL Server 2005+, it can in some cases cause problems.

·         FIX: Blocking and performance problems may occur when you enable trace flag 1118 in SQL Server 2005 if the temporary table creation workload is high http://support.microsoft.com/default.aspx?scid=kb;EN-US;936185

o   If you have SP2 based CU2 or later you will not see the problems described in this article. Also, on SP2 based CU2 or higher you are much less likely to even need -T1118 on a heavily used TempDB.

o   ” This hotfix significantly reduces the need to force uniform allocations by using trace flag 1118. If you apply the fix and are still encountering TEMPDB contention, consider also turning on trace flag 1118.”

·         Misconceptions around TF 1118 http://sqlskills.com/BLOGS/PAUL/post/Misconceptions-around-TF-1118.aspx

o   ” turn on TF1118, which makes the first 8 data pages in the temp table come from a dedicated extent “

o   “Instead of a 1-1 mapping between processor cores and tempdb data files (*IF* there’s latch contention), now you don’t need so many – so the recommendation from the SQL team is the number of data files should be 1/4 to 1/2 the number of processor cores (again, only *IF* you have latch contention). The SQL CAT team has also found that in 2005 and 2008, there’s usually no gain from having more than 8 tempdb data files, even for systems with larger numbers of processor cores. Warning: generalization – your mileage may vary – don’t post a comment saying this is wrong because your system benefits from 12 data files. It’s a generalization, to which there are always exceptions.”

·         Storage Top 10 Best Practices http://sqlcat.com/top10lists/archive/2007/11/21/storage-top-10-best-practices.aspx

o   “Make sure to move TEMPDB to adequate storage and pre-size after installing SQL Server. “

o   “Performance may benefit if TEMPDB is placed on RAID 1+0 (dependent on TEMPDB usage). “

o   “This is especially true for TEMPDB where the recommendation is 1 data file per CPU. “

o   “Dual core counts as 2 CPUs; logical procs (hyperthreading) do not. “

o   “Data files should be of equal size – SQL Server uses a proportional fill algorithm that favors allocations in files with more free space.

o   “Pre-size data and log files. “

o   “Do not rely on AUTOGROW, instead manage the growth of these files manually. You may leave AUTOGROW ON for safety reasons, but you should proactively manage the growth of the data files. “

Optimizing tempdb Performance http://msdn.microsoft.com/en-us/library/ms175527.aspx

November 20, 2009
SQL Server and Fragmentation

There are several types of fragmentation that affect SQL Server, all of which have the potential to cause performance problems.

1)      At the disk/OS level individual files can become fragmented due to growth over time. This can be addressed by a defragmentation tool at the SAN or Windows level. Ask your storage team about your options there.

2)      Within a SQL Server database individual pages can become internally fragmented so that many pages are less full than expected. This means more pages are needed to store the same amount of data which means more reads to satisfy each query and less total data that can be kept in memory. This can decrease performance.

3)      Within a SQL Server database the index pages and extents can become less sequential. This can make scans slower. Though hopefully you are doing more seeks than scans this can still be an important factor in performance.

Sharepoint:

If you are using Sharepoint, you need to be aware that you cannot change the schema at all, including adding or modifying indexes. Here is some information on Sharepoint support:

·         943345 How to defragment Windows SharePoint Services 3.0 databases and SharePoint Server 2007 databases http://support.microsoft.com/default.aspx?scid=kb;EN-US;943345

·         841057 Support for changes to the databases that are used by Office server products and by Windows SharePoint Services http://support.microsoft.com/default.aspx?scid=kb;EN-US;841057

·         968785 After you install Windows SharePoint Services 3.0 Service Pack 2, the Windows SharePoint Services 3.0 database can be auto-defragmented http://support.microsoft.com/default.aspx?scid=kb;EN-US;968785

·         Note that with Sharepoint, the recommended break-point where you change from REORGANIZE to REBUILD is 70% instead of the more standard 30%. Your maintenance window may be a bit longer but concurrency will increase for your users on the tables with BLOBs.

Disk/OS level fragmentation:

In a SAN environment, sometimes you cannot use the windows tools to tell you how fragmented a volume is since some characteristics may be hidden from Windows and known only to the SAN. If your storage team has SAN tools to tell them about fragmentation, then use that. Windows can see a SAN volume as fragmented when the SAN has already taken care of the fragmentation.

·         This SQL IO basics document has some details on disk defragmentation http://download.microsoft.com/download/4/7/a/47a548b9-249e-484c-abd7-29f31282b04d/SQLIOBasicsCh2.doc

Database level fragmentation:

·         sys.dm_db_index_physical_stats (Transact-SQL) http://msdn.microsoft.com/en-us/library/ms188917.aspx

·         Reorganizing and Rebuilding Indexes http://msdn.microsoft.com/en-us/library/ms189858.aspx

·         Guidelines for Performing Online Index Operations http://msdn.microsoft.com/en-us/library/ms190981.aspx

·         Rebuilding Indexes and Updating Statistics http://sqlskills.com/blogs/paul/post/Search-Engine-QA-10-Rebuilding-Indexes-and-Updating-Statistics.aspx

When you REORGANIZE (in older versions of SQL Server this was DBCC INDEXDEFRAG) it compares two pages at a time and works its way through the index. If it is killed before it stops no work done is lost. However, for a very defragmented index this operation will take much longer than REBUILD. The more fragmentation you have the longer this operation takes to complete. There may still be some fragmentation left after this operation completes.

When you REBUILD (DBCC DBREINDEX in earlier versions) a copy is made of the index and the new version is entirely sequential and compact. It is faster to use this method when the fragmentation level is high. REBUILD also does the equivalent of an UPDATE STATISTICS WITH FULLSCAN which gives very accurate statistics. The amount of time to complete has little to do with the amount of fragmentation and instead depends on the total amount of data, the data types, the index structure, etc. Internally this is pretty much equivalent to creating a new index (without the overhead of a sort) then dropping the old one within a transaction. For some indexes it is possible to do a minimally logged operation if you are not using the FULL recovery mode. Not using FULL recovery mode means you have a greater exposure to possible data loss in the case of certain types of system failures, this is not directly related to the REBUILD operation but only to the recovery mode. Extra precautions such as a log backup immediately before and after the change of recovery mode. For non-clustered indexes you can sometimes use the SORT_IN_TEMPDB option to avoid a sort. For tables with no XML or spatial indexes you may be able to set ONLINE=ON in Enterprise Edition so that users can continue to access the index while it is being rebuilt (there are several restrictions on when this is allowed). If there is little free space in the database the new index may not be completely defragmented.

FILLFACTOR:

The FILLFACTOR setting is related to fragmentation and may be changed during a REBUILD. You can set the FILLFACTOR to a higher or lower number to adjust how much free space is left on each page. The tradeoff is in space used vs. the cost of future page splits as data is added or rows get bigger (variable length or nullable columns get more data). The FILLFACTOR does not prevent a page from filling, it sets the amount of free space only at the time the index is created or rebuilt. By default the pages are created as full as possible so unless you have explicitly set your FILLFACTOR to a lower amount for one or more indexes then you won’t be able to save space that way. If you have to defragment an index more often than you would like, and the fragmentation is due mostly to page splits, then you might want to set a lower FILLFACTOR. If you rarely have to do defragmentation and you have a lot of empty space per page, you might want to increase the FILLFACTOR. If you want to guess at a “best” value on each index you need to know the average width of the rows, the clustered index key values (if you’re setting it for a non-clustered index), the expected amount of inserts and whether they will cause a lot of page splits, are inserts on a monotonically increasing key like IDENTITY, whether updates will cause a lot of row movement or changes in row size, whether deletes are likely to happen in random spots or at the “beginning” or “end”, the volume of data modifications compared to total rows and whether the data is compressed. All of which lead back to how much fragmentation do you expect (and does that fragmentation cause any perceivable performance problems) and how often can/will you defragment the indexes. So if you see a lot of fragmentation and when you defragment your performance improves (not always the case), have a lot of page splits (PerfMon: SQL Server Access Methods.Page Splits/sec), and have the space available to rebuild your index then a lower FILLFACTOR is one possible solution. When you rebuild the indexes it will often change the total space allocated to the table, partly due to FILLFACTOR and partly due to the fragmentation cleanup.

References:

·         BOL Fill Factor http://msdn.microsoft.com/en-us/library/ms177459.aspx

·         BOL Create Index http://msdn.microsoft.com/en-us/library/ms188783.aspx

·         Database Maintenance Best Practices Part II – Setting FILLFACTOR

http://www.sqlskills.com/BLOGS/KIMBERLY/post/Database-Maintenance-Best-Practices-Part-II-e28093-the-most-important-setting-FILLFACTOR.aspx

·         Who Cares about FillFactor? http://www.sqlservercentral.com/articles/Performance+Tuning/whocaresaboutfillfactor/1196/

·         Kimberly Tripp on FILLFACTOR http://www.sqlskills.com/BLOGS/KIMBERLY/category/FILLFACTOR.aspx

·         Top Tips for Effective Database Maintenance http://technet.microsoft.com/en-us/magazine/2008.08.database.aspx

November 20, 2009
SQL Server 2008 setup may report a Windows 2008 cluster validation failure

You may receive the error “The cluster either has not been verified or there are errors or failures in the verification report” while installing a SQL Server clustered installation on Windows 2008. When see this error you must fix the underlying error that caused Windows validation to fail. Even though in some cases you could avoid the error and continue SQL Server setup with the undocumented SkipRules setting, this is not advisable and will leave you in an unsupported state (unless you are using a geo cluster with no shared storage).

The Microsoft Support Policy for Windows Server 2008 Failover Clusters

http://support.microsoft.com/kb/943984

For the Windows Server 2008 Failover Clustering solution to be considered an officially supported solution by Microsoft Customer Support Services (CSS), the solution must meet the following criteria:

· All hardware and software components must meet the qualifications to receive a “Certified for Windows Server 2008” logo.

· The fully configured solution must pass the Validate test in the Failover Clusters Management snap-in.

Failover Cluster Step-by-Step Guide: Validating Hardware for a Failover Cluster

http://technet.microsoft.com/en-us/library/cc732035(WS.10).aspx

What to do if validation tests fail

In most cases, if any tests in the cluster validation wizard fail, then Microsoft does not consider the solution to be supported. There are exceptions to this rule, such as the case with multi-site (geographically dispersed) clusters where there is no shared storage. In this scenario the expected result of the validation wizard is that the storage tests will fail. This is still a supported solution if the remainder of the tests complete successfully.

Error message when you install SQL Server 2008 on a Windows Server 2008-based cluster: “The cluster either has not been verified or there are errors or failures in the verification report. Refer to KB953748 or SQL server books online”

http://support.microsoft.com/kb/953748

What to do if validation tests fail

In most cases, if any tests in the cluster validation rule fail, Microsoft does not consider the solution to be supported. There are exceptions to this rule, such as the case with multiple-site (geographically dispersed) clusters where there is no shared storage. In this scenario, the expected result of the cluster validation wizard is that the storage tests will fail. This is still a supported solution if the rest of the tests finish successfully.

November 20, 2009
How to tell what edition your SQL Server setup files are

The setup files for SQL Server look pretty much the same for all editions. If you saved the files somewhere but didn’t indicate in your directory name or a readme file which edition they are for, how do you tell the edition? Running setup doesn’t necessarily help, the initial setup screens for SQL Server 2005 Enterprise and Standard both show the same EULA that includes “MICROSOFT SQL SERVER 2005 STANDARD AND ENTERPRISE EDITIONS”. To see the edition of the setup files, look in the Servers directory of your installation files from the setup CD (there are two high level directories: Servers and Tools). In Servers there is a file called default.htm and when you open it, you will see the edition of your SQL Server setup files.

November 20, 2009
How to use PsGetSid for SQL Server

To translate a SID to a name or vice versa, you can use this tool from sysinternals:

PsGetSid v1.43

http://technet.microsoft.com/en-us/sysinternals/bb897417.aspx

You can use it to translate the SID in these registry keys into the SQL Domain Groups or local groups used for SQL Server 2005/2008 instance permissions. The SID is stored in this key:

HKEY_LOCAL_MACHINESOFTWAREMicrosoftMicrosoft SQL ServerMSSQL.XSetup

AGTGroup (Setup grants this domain group the permissions needed by the SQL Server Agent service account)

SQLGroup (Setup grants this domain group the permissions needed by the SQL Server service account)

FTSGroup (Setup grants this domain group the permissions needed by the Full Text Search service account)

ASGroup (Setup grants this domain group the permissions needed by the Analysis Services service account)

If you’re not sure what MSSQL.X value to use, look up your instance name in

HKEY_LOCAL_MACHINESOFTWAREMicrosoftMicrosoft SQL ServerInstance NamesSQL

SQL Server creates these keys and populates them with SID values. The service accounts you specify during setup are added to the groups.

On a standalone system those SIDs map to local groups created by setup including:

SQLServerMSSQLUser$ComputerName$InstanceName

SQLServerSQLAgentUser$ComputerName$InstanceName

SQLServerMSOLAPUser$ComputerName$InstanceName

SQLServerDTSUser$ComputerName

SQLServerSQLBrowserUser$ComputerName

SQLServerMSSQLServerADHelperUser$ComputerName

On a cluster, the cluster aware components SQL, AS, and Agent don’t have local groups created. Instead you pre-create global groups (for SQL 2008 on Windows 2008 you have the option to use Service SIDs instead) and specify the group names during setup. Then setup grants the permissions to the global groups. Since often the account you are using for setup doesn’t have permissions to add accounts to groups you may want to have someone with appropriate permissions pre-add the service accounts to the groups.

If you ever need to find out what the name of the local or domain groups are, you can use the above PsGetSid tool to find that. For example, you may want to move a system to a new piece of hardware configured exactly like the old one, then you will remove the old system. You will need to know what groups were used for that instance of SQL Server (you are following good security practices and using a unique group for each instance and each service within each instance to reduce your vulnerabilty across systems if one instance is hacked, right?) so that you can reuse that group on the new system. A disaster recovery scenario is another situation where you will want to know how to map the SIDs to the names. You may want to back up these keys.

915846 Best practices that you can use to set up domain groups and solutions to problems that may occur when you set up a domain group when you install a SQL Server 2005 failover cluster http://support.microsoft.com/default.aspx?scid=kb;EN-US;915846

October 29, 2009
Checklist for upgrading to a new version of SQL Server
Here is the basic process for a full version upgrade (such as 2000 to 2008) of the SQL Server engine. The steps will be different for RS or AS. I did not include all the details of variations such as replication, mirroring, log shipping, full-text search, or clustering. They are covered in the white paper in the first reference at the bottom.

Look for potential problems:

·         Run Upgrade Advisor from your new version of SQL Server and address any issues.

·         Run SQL Server 2008 Upgrade Assistant (I don’t have any details on this – it’s a 3^rd party tool).

·         If upgrading to SQL Server 2008: Run the System Configuration Checker

·         Verify you are not using any discontinued features.

·         Read up on the “breaking changes” section of BOL.

·         Test not only the application but also all maintenance activities including disaster recovery.

·         Look for any hints/options or plan guides used in the system and test removing them.

·         On a development or test system – uncover and address issues caused by the upgrade and/or change in hardware

o   Install the new version and load copies of the databases.

o   Look for potential errors, changes in functionality, and performance changes.

o   Make sure you cover typical, peak, and periodic (such as month end) activity. Consider tracing high visibility activity as well, such as the functionality used by an executive or vocal user.

o   The more your test system looks like your production box (hardware, OS, configuration, SQL data) the more accurate your testing will be.

o   Do a profiler/readtrace replay: Capture at least one profiler trace using the replay template on production. Then use the replay functionality of either the RML tools or Profiler itself to replay the same traces on test. Consider that data modifications may fail depending on the state of the backups compared to the state of the database at the time the traces were taken.

o   Have testers hit the test system under load: typical and peak.

o   Put a full application load on the system and test all functionality. Tools like Profiler replay, Visual Studio Database Edition, RML, sample code such as what you might find on codeplex, and 3rd party products can all help with this process.

o   Consider using a tool such as Visual Studio Database Edition to add more test data to the system to simulate future usage.

o   If your IO setup will changed and the test system will be similar to production, consider running SQLIO (performance) and/or SQLIOSim and its new parser (correctness).

·         Test the upgrade process itself on a test system.

·         If doing an upgrade in place make sure the old system is at a minimum supported version level and edition for the upgrade (including the service pack). You cannot upgrade in-place from 32bit to 64bit. Check that your combination of old and new versions, editions, and platforms is supported.

Planning:

·         Start a document/checklist with all the steps you plan to follow, who is responsible for each, and the expected time for the process.

·         Will you change the OS, hardware, firmware, disk configuration, and SQL Server version all at the same time? If so, you can reduce your overall downtime but you also make it harder to isolate problems that occur after the changes.

·         Decide on a physical implementation vs. a virtual machine. If you will use a virtual machine make sure it is fully supported by Microsoft and it is setup per virtualization best practices.

·         If you are moving to a new box:
- - Follow best practices for configuration – if your memory, number of CPUs, or number of instances on the box/cluster changed some configuration options will need to be adjusted.
- - Run SQLIO (performance) and SQLIOSim (IO integrity) and verify the system is “fast enough”
- - Usually you will want the drives laid out the same, if not plan for the steps to change the location: 314546 HOW TO: Move Databases Between Computers That Are Running SQL Server http://support.microsoft.com/?id=314546
·         Do you have special functionality such as replication, mirroring, or clustering? If so, consider whether that changes any of the steps.

·         Will your upgrade be in-place, side-by-side on the same server, or side-by-side on a different server?

·         Will the new instance have exactly the same server name and instance name (i.e. Server1 or VirtualServer1InstanceA)? If not, how will you accommodate components such as replication and mirroring that assume the server name never changes?

·         Note that all instance-aware components within a given instance are upgraded together. Common components such as SSIS and the client tools are upgraded the first time a newer version is installed on a box.

·         Have a rollback/contingency plan that includes SLAs, how you will know when you must implement the contingency plan, who will make the decisions, etc.

·         What version will you upgrade to? What service pack/CU level will you be at?

·         Will you upgrade/move all user databases at once?

·         Will you move system databases and/or information from the system databases? This could include logins/passwords, linked server definitions, configuration settings, jobs, DTS/SSIS packages, etc. Note that system databases can only be restored to the EXACT same version. User databases can be restored/attached to a newer version but not back down to a lower major version.

·         Will you do a detach/attach or a backup/restore? Attach/detach is the simplest but generally has a longer user downtime. With backup/restore you can do a full (and possibly some incremental transaction log or differential) backup/restore before-hand and only have downtime for the latest transaction log backup/restore time period. This only works if the database is in full or bulk-logged recovery mode (and for bulk-logged you need to be sure you aren’t doing any bulk operations before the downtime).

·         How long do you expect the upgrade to take including time for potential troubleshooting, breaks, and interruptions? Add more time to that for the scheduled downtime.

·         Make sure you have enough disk space for the entire upgrade process including extra copies of your backups.

Pre-downtime steps

·         If upgrading to SQL Server 2008, consider applying .NET 3.5 SP1 and the latest Windows installer ahead of time as they require a reboot. You may also want to install the SQL Native Client ahead of time.

·         If you have a version of Visual Studio 2008 that is not patched to at least SP1, patch it to at least SP1 prior to installing SQL Server.

·         For SQL Server 2008 if you are not planning to install SP1, install RTM based CU4 (to fix known and resolved setup issues), then apply RTM, then reapply CU4. Or you can slipstream SP1+SP1 based CUx (or whatever the latest SP is at the time) and do a single install that includes RTM, SP1, CUx. For clusters, do each step across all nodes before going to the next step.

·         Take a baseline of the system and note the circumstances (busy vs. slow day, certain time of day/week/month/year, anything unusual).

·         If moving to a new server, install drivers, supporting applications/utilities, etc. on the new system.

·         Make sure OS level settings such as whether or not “Lock Pages in Memory” is enabled for the SQL Server account are the same on the old and new hardware.

·         If necessary, install the latest backward compatibility files on the SQL Server box and any client boxes (which may include other servers).

·         If moving from DTS to SSIS, consider upgrading the packages ahead of time. If you do this make sure no changes are made to the existing production DTS packages unless you also apply the changes to the converted SSIS packages.

·         Install the new version of the client tools (including service packs), samples, and sample databases on DBA and developer desktops.

·         Become familiar with the new interfaces, functionality, and how your features work in the new version.

·         Understand the upgrade process and how to troubleshoot failures in each step.

·         Test security and connectivity, such as whether linked servers work as expected.

·         Consider pre-growing system databases if doing an in-place upgrade.

Just prior to the upgrade:

·         Backup everything (system dbs, user dbs, registry, sys db settings, config options, encryption keys, etc) and save off the backups.

·         Run DBCC CHECKDB on all system and user databases – address any errors that are found.

·         If you are scripting out data from the system databases make sure you have everything you need.

·         If you are keeping the same server name on new hardware, make sure the old box is shut down and the new server and SQL Server (sp_dropserver + sp_addserver with the “local” parameter) have been changed to the old server name BEFORE you enable replication or anything else which relies on the server/instance name.

·         If doing a backup/restore on a full or bulk-logged recovery mode database, take the full backup and restore it to the new instance. Optionally you may have one or more tran and/or differential backups to restore as well.

At Downtime:

·         Document the process, such as actual time to complete each step, any changes you had to make, etc.

·         Make sure all users are out of the system and do whatever is necessary to keep them out until the upgrade is complete.

·         Disable startup processes, SQL Server trace flags, services that access SQL Server, etc.

·         Make sure all replicated data has hardened on all systems and stop replication.

·         Stop any non-essential services and applications, especially any that might have SQL Server related files open. Examples of common problem areas are IIS, backup software, virus/malware scanners, and applications that connect to SQL Server.

·         For an upgrade in place, start the upgrade.

·         For backup/restore, take the final transactional log backup, copy it to the new location (possibly using a SAN clone) and restore it to the new instance.

·         For attach/detach, detach the user databases from the original server, copy them to the new location (or use a SAN clone), and attach them.

·         Take new backups (after upgrade, before any changes) and save them off (where they will not be overwritten until you are ready for them to be).

·         If the system databases were not restored, copy their data and then sync the logins with the users.

·         If the system databases were restored for a pure migration as opposed to an upgrade, tweak configuration options as needed (for example, MAXDOP or the number of TempDB files may change if the number of cores has changed).

·         If you have DTS/SSIS packages or other supporting files stored in the file system, copy them to the new system and disable access to them on the old system.

·         Update statistics (essential for 2000 to 2005/8, nice for 2005 to 2008) on all user databases.

·         Set the compatibility mode to the current version for each user database (by default an attach or restore leaves it at the level of the source instance).

·         Take new backups (after changes) and save them off (where they will not be overwritten until you are ready for them to be).

·         If using Full-Text update your catalogs.

·         Perform another DBCC CHECKDB on each database – optional. If possible use the WITH DATA_PURITY option (this option is implied for databases created in 2005 or later and for any database that has once had that option explicitly stated).

·         Perform DBCC UPDATEUSAGE on each database.

·         Take new backups (after changes) and save them off (where they will not be overwritten until you are ready for them to be).

·         If the instance name changed, point the users/application(s) to the new instance name.

·         Test basic maintenance operations.

·         Test whatever application level items you can do yourself (perhaps with a replay of only SELECT statements) or test with a few trusted users.

·         Let the users back in.

·         Test the application(s) – have all users do at least basic operations.

·         Compare the current performance to your baseline, taking into account the circumstances such as the number of users on the system.

·         Evaluate the criteria to see if you need to apply your rollback/contingency plan.

·         Make sure your new backup strategy is implemented and takes into account the backups you took during the upgrade process.

·         Restart any services, applications, etc. stopped just for the upgrade process.

Later:

·         If you enabled autogrow for system databases, consider disabling it.

·         Re-enable startup processes, SQL Server trace flags, services that access SQL Server, etc.

·         Restart any replication.

·         Update your upgrade checklist/documentation based on your most recent experience. Many of the changes may also apply to your disaster recovery docs.

·         Review your documentation, best practices, troubleshooting scripts, profiler templates, etc. for changes that should be made. For example, in 2005 and later the process to rebuild master and the other system databases uses a different utility than 2000 did.

·         Consider turning on new options available in your new version such as database checksum instead of just torn page detection.

·         Take advantage of the more granular security options in the new versions. This includes better job/replication/SSIS security as well as more role options.

·         With 2008 and later set up a Central Management Server so everyone in a group (such as all production DBAs who support one sector of the business) has the same list of SQL Servers available to them in SSMS.

·         Start using new features/functionality in the new version.

·         Keep up to date with service packs and cumulative updates.

·         Rewrite anything using deprecated features. You can use SQL Profiler to capture an event every time a deprecated feature from the SQL engine is used. Starting with SQL Server 2008 there is also a perfmon counter for this.

·         Capture a new performance baseline/benchmark on the system and update the baseline periodically.

·         For a side-by-side upgrade, determine when and how you will decommission the old instance.

·         Migrate maintenance plans created in 2000 to the new SSIS format.

References:

·         Ultimate guide to upgrading to SQL Server 2008 http://sqlcat.com/msdnmirror/archive/2008/12/07/ultimate-guide-to-upgrading-to-sql-server-2008.aspx

·         The application may have additional steps when you upgrade or move a database. For Sharepoint those steps are outlined here: http://mickey4nay.spaces.live.com/blog/cns!8983677073B13695!1484.entry

·         Upgrading a SQL Server 2008 Failover Cluster

If you are using replication, Analysis Services (AS), or DTS/SSIS, see these links:

Considerations for Upgrading Replicated Databases

Considerations for Upgrading Analysis Services
Considerations for Upgrading Integration Services
Considerations for Upgrading Data Transformation Services
October 28, 2009
How People Abuse Their SQL Server Transaction Log – Things NOT to Do

Care and Feeding of the Transaction Log

Want to learn more about how the transaction log works? Kalen Delaney, celebrated author of SQL Server 2008 Internals and the Inside SQL Server series as well as a world class trainer and performance tuner, is giving a pre-conference talk at SQL PASS next week. The pre and post conference sessions are full day sessions you can purchase as an add-on to the regular conference (which runs Tuesday-Thursday). Kalen’s pre-conference session “Care and Feeding of the Transaction Log” is scheduled for Monday, November 2, 2009. She will be raffling off copies of her book and will have copies of her DVD to give away. If you are guilty of any of the transaction log practices below, or have colleagues who are and you need ammunition in your fight for change, Kalen’s session will be a great help to you! Note that she will not necessarily be covering all the items below, they are my own list and not hers. 🙂

How People Abuse Their SQL Server Transaction Log – Things NOT to Do

·         Rely on autogrow – this causes fragmentation, poorly allocated VLFs, a performance hit for duration of grow, etc.

·         Don’t take backups at all or do them on a schedule that doesn’t meet SLAs or doesn’t keep the log “small enough”.

·         Use simple recovery mode in the belief it will improve performance when there are no bulk inserts or something else that actually benefits from minimal logging.

·         Want to “turn off logging”.

·         “Shrink” the log by deleting the LDF file – can result in corruption and an unusable database.

·         Shrink the log file when it will just grow again.

·         Add multiple LDF files due to the mistaken impression that this will improve performance (multiple threads myth or thinking it will split IO over multiple files when logs are mostly sequential). This is particularly a problem with TempDB where we recommend 1/4 to 1 file per core and don’t always make it clear we mean only the data files and not the log file.

·         Only backing up the log files when they reach a certain percentage full to reduce the number of backups occurring at any one time – this can result in not being able to meet your recovery SLAs.

·         Enable instant file initialization and expect it to help with log growth.

·         Create one extremely large log file without regard to # of VLFs created.

·         Sharepoint documents have recommended simple recovery mode to “keep the log from filling” but don’t explain the tradeoffs for recoverability such as losing Point in Time recovery options.

·         Don’t understand the difference between truncating and shrinking the log.

·         Don’t realize that in simple mode you still have to do a full backup first.

·         Don’t realize they need to do a log backup after switching from simple or bulk logged to full.

·         Think DBCC CHECKDB checks for inconsistencies in the log

·         Put data and log on same drive (mix random and mostly sequential), put lots of logs on one drive (defeats purpose of having a log/sequential writes on own IO path), put output of profiler or other activity on same drive as a log

·         Don’t put log on fast enough/properly configured drive (RAID 10, disk partition alignment, separate from file server/other SQL/db data, disk allocation unit = 64k, HBA queue depth 64+, anything that gets writes to 3ms or less, etc.)

·         Ignore corruption messages/event log messages about bad IO

·         Don’t secure the directories where the MDF/LDF and backup files reside

·         Ignore the fact that Tran replication affects log size (can’t truncate until log reader has read data from log)

·         Do many transaction log backups between full backups (with few or no differentials) and don’t consider what happens if one of the files is corrupted/missing or how long it will take to do the restores

·         Don’t test the DR strategy, or at least have it documented and planned/thought out

·         Don’t exclude the MDF/LDF files from virus scanning software

Instead of abusing your transaction log, pay attention to Pond’s Twelfth Law: Don’t practice in front of the CIO. A professional prepares ahead of time. To help you prepare, attend Kalen’s pre-conference session!

October 28, 2009
Considerations for installing SQL Server on VMWare

You still need to follow the standard SQL Server best practices, security, and preparation when you install on a VMware VM. Below are some additional considerations with regards to VMware.

Key points:

·         You must double check that your combination of VMWare, Windows, and SQL will be supported.

o   By Microsoft: http://windowsservercatalog.com/svvp.aspx?svvppage=svvpwizard.htm, http://support.microsoft.com/?id=956893

o   By VMware: http://pubs.vmware.com/vsp40/upgrade/wwhelp/wwhimpl/common/html/wwhelp.htm#href=c_database_scenarios.html&single=true

·         Just as with a physical system you must provide a well-configured IO subsystem. Follow standard SQL Server IO best practices. Use separate VMware VMFS partitions on independent disk arrays.

·         Again, just like with a physical machine, you must use hardware with sufficient resources.

·         Do not overcommit resources to the virtual machine (VM). For example, if the physical box has 4 cores and is running 4 VMs, do not assign 2 cores to each VM as this would overcommit the cores. The same applies to memory and disk.

·         VMWare recommends using ESX Server for production virtual machines if you want the best performance.

·         ” If SQL Server’s “lock pages in memory” parameter has been set, provide set the VM’s reservations to the amount of memory in the VM. This setting can adversely interfere with ESX Server’s balloon driver. Setting reservations will stop the balloon driver from inflating into the VM’s memory space.”

·         Consider configuring “large pages” for some types of workloads.

References from VMware:

·         Best Practices for SQL Server http://communities.vmware.com/docs/DOC-8964

·         Performance and Scalability of Microsoft SQL Server on VMware vSphere 4 http://www.vmware.com/files/pdf/perf_vsphere_sql_scalability.pdf

·         Database Scenarios http://pubs.vmware.com/vsp40/upgrade/wwhelp/wwhimpl/common/html/wwhelp.htm#href=c_database_scenarios.html&single=true

·         VirtualCenter Database Performance for Microsoft SQL Server 2005 http://www.vmware.com/resources/techresources/1050

·         SQL Server Workload Consolidation (VMware ESX 3.5) http://www.vmware.com/pdf/SQL_Server_consolidation.pdf

·         SQL Server Performance in a WMware Infrastructure 3 Environment http://www.vmware.com/files/pdf/SQLServerWorkloads.pdf

October 23, 2009