Befriending Dragons

Intersectional Coaching Journeys: Reframe, Unblock, Move Forward


SQLPASS: Are You Smarter Than an MCM? VLF Demos

Are You Smarter Than an MCM?

 I had a great time on Wednesday co-presenting with some great SQL peeps! We dressed up, had a lot of fun, and shared a few technical tips along the way. My demo was on VLFs, an often forgotten and/or misunderstood part of the transaction log technology. The demo files are attached below. Thanks to everyone who came to the talk! #SQLPASS

The Team for DBA-414-M

Our fabulous “Smarter” team is


Question: Cindy has ADHD.  In the past she we able to keep up with multiple tasks at once, but lately she has slowed down.  Why? 

Answer: Excessive VLFs

 A VLF is the boundary within your log file(s) for log operations such as scans (replication, recovery, mirroring, CDC, etc.) and log clearing/truncating.

  • 100s of VLFs might be ok, 1000s is probably bad, 10s of thousands is almost certainly bad. Bad means much slower operations for activities that operate within VLFs boundaries.
  • DBCC LOGINFO is undocumented and unsupported but is the only way to see your VLF distribution. It returns one row per VLF.
  • Often you get too many VLFs from autogrow/small growths.

 VLF Lessons Learned:

  • Avoid autogrow by pre-sizing your data and log files appropriately. This includes alerting when your free space starts to decrease significantly and planning for data growth over the lifecycle of the system. Make sure you check your system as well as your user databases.
  • Set autogrow values to reasonably large amounts to avoid a series of small autogrows in an unexpectedly high volume period, but not so big the growths themselves slow down the system.
  • Alert on autogrows and check (and possibly correct) VLFs afterwards.
  • Check your VLFs periodically and during performance troubleshooting and consider shrinking/regrowing properly (with the proper presizing) to reduce the total number of VLFs when the number gets “too high”.
  • Never grow (manually or via autogrow settings) in increments of exactly 4GB. Values such as 4000MB or 8000MB are good, but there is a bug with multiples of exactly 4GB.
  • Do NOT shrink your data or log files on a regular basis. Shrinking should be extremely rare and in response to a specific issue.
  • And because it can’t be said too often, pre-size your databases to avoid autogrow.
  • Revisit the sizes periodically.

 Thanks everyone for coming to the talk and for the great feedback afterwards! You did fill out your evaluation forms, right? 🙂


Leave a comment

SQL PASS: All the Magic Knobs


All the Magic Knobs – Low Effort, High Return Tuning

Key points covered:

  • Power Savings = High Performance
  • Smart Virtualization
  • Enough Hardware
  • Control other apps, filter drivers
  • Optimize for ad hoc workloads = ON
  • Compression = ON
  • Set LPIM + Max Server Memory
  • Pre-size files, avoid shrink and autogrow
  • Fast Tempdb
  • Proper Maintenance

My presentation from 10/13/11 is attached.



24HOP: SQL Server Performance Tools

Thanks to everyone who attended my session today! I think the 740 or so of you overloaded the audio. 🙂 Sorry everything wasn’t clear on the voice delivery, hopefully the slides (below) provide enough information for you to know what to download. Then you can load your own SQLDiag data into SQL Nexus and see all the stuff I showed in the demos! Remember that I was showing you the “near future” version 4.0 of SQL Nexus, what you can download today from CodePlex is version 3.0. I’ll tweet when the new version is released.


There are still many sessions left in this spring 2011 24 Hours of PASS, attend as many as you can today and tomorrow! Look for the recordings to be available soon.


My slides are below, the abstract is: SQL Server Performance Tools: Troubleshooting performance issues with SQL Server can be confusing and time consuming. I want to make sure you are familiar with some of the common tools used to help troubleshoot performance issues and when to use each. I will provide best practices around troubleshooting methodology, SQLDiag, PSSDiag, SQL Nexus, Profiler / SQL Trace, and PerfMon. After this presentation you will be better able to narrow down a performance problem and focus in on where to spend your time and effort.

Please: Complete Session Evaluation


Some of the key references:


·          SQL Server 2008 Internals and Troubleshooting – Chapter 8 Defining Your Approach to Troubleshooting

·          PerfStats – Download

·          SQL Nexus on Codeplex – Download

·          RML/Readtrace – Download

·          SQLDiag – Parameters/Usage

·          Waits and Queues – White Paper

·          New Opportunities to be a SQL Master – Changes to the program



1 Comment

Power Saving Options on SQL Server

Power Saving Options on SQL Server


Windows 2008 and Windows 2008 R2 have settings for “Power Options”. Windows 2008 R2 has additional power improvements related to “core parking” (temporarily suspending certain cores) and “tick skipping” (extended idle and sleep states). The default power setting for Windows 2008 and 2008 R2 is “Balanced Power” which means that when the system isn’t “busy” various components, such as the CPU and storage, are throttled back. This only happens when the box has CPUs and a BIOS that supports the ACPI specification for power that allows these options to take effect. Newer model CPUs are more likely to support this specification. Database systems that don’t have high sustained CPU usage are probably going to perform better with “High Performance”.  Most people size their SQL Server boxes so that the CPUs run at a fairly lower percentage of total available so it’s pretty easy to drop down to the slower CPU speeds and get slow, inconsistent performance. There are many documented cases of “Balanced Power” hampering SQL Server performance.  For a SQL Server box if you want consistent, predictable, high performance you either need to fine tune the power setting parameters for each individual workload and/or for different times of day or just set the power options to high performance. If power savings are more important than fast, predictable performance then you can leave it at “Balanced Power”. As a side note that isn’t directly related to SQL Server but may affect virtualized SQL Servers and/or application servers that connect to SQL Server, VMWare has a doc on the need for “High Performance” power settings on the VM host.



·         SQL Server on Power-Saving CPUs? Not So Fast.

·         Are your CPUs running slowly? Tool tip and survey

·         Windows OS Power Saving Feature Directly Affects CPU Performance



·         Poor virtual machine application performance may be caused by processor power management settings



·         Degraded overall performance on Windows Server 2008 R2
applications or scripts that intensively use resources (primarily processor and memory) may exhibit the problem”

·         Improve Energy Efficiency and Manage Power Consumption with Windows Server 2008 R2
“The “High Performance” mode may be appropriate for servers that run at very high utilization and need to provide maximum performance, regardless of power cost.”

·         Performance Tuning Guidelines for Windows Server 2008

“However, Balanced might not be appropriate for all customers. For example, some applications require very low response times or very high throughput at high load. Other applications might have sensitive timing or synchronization requirements that cannot tolerate changes in processor clock frequency. In such cases, changing the power plan to High Performance might help you to achieve your business goals. Note that the power consumption and operating cost of your server might increase significantly if you select the High Performance plan.”

To view/set

·         Powercfg.cpl

·         Powercfg.exe – Configure Power Plans

·         Ensuring Maximum CPU Performance via Group Policy
PBM Policy / WMI query to see which CPUs are in power-saving mode

·         From Powershell: gwmi -class win32_Processor | SELECT CurrentClockSpeed, MaxClockSpeed


The ins and outs of MAXDOP

Configuring MAXDOP


One of the few sp_configure options that good DBAs regularly change is “max degree of parallelism” or MAXDOP. So of course everyone is looking for the one best value to use for that setting. As with most things in the database world, the answer is “it depends.” I’m going to start with some background to help you understand some of the variables that affect how you should set that value on any given instance of SQL Server.

What is parallelism?

For the purposes of this discussion we can think of SQL Server schedulers as 1 visible, non-DAC scheduler per logical processor as seen by the operating system and made available to SQL Server. The terms processor, core, CPU, thread, and scheduler are often interchanged (sometimes correctly, sometimes not) throughout various documents and not all have just one meaning. I’ll try to stick to the SQL scheduler terminology to avoid confusion with logical/physical processors. For this discussion today I will assume you have not turned on CPU affinity options, you are not using WSRM, and your login is not limited by MAX_DOP settings in a Resource Governor resource pool.


As an example, say you have 4 quad cores with hyperthreading on. That comes to 4x4x2=32 visible schedulers available for non-DAC SQL Server queries.


You can see the number of visible schedulers with this query:

SELECT * FROM sys.dm_os_schedulers

How does a query use parallelism?

When a query is compiled it is broken down into steps/operations internally. When the query runs, each step of the query plan that is eligible to go parallel can be optimized as using 1 or more schedulers. So if the plan has 10 steps, maybe step 1 gets 1 scheduler, step 2 gets 12 schedulers, step 3 gets 4 schedulers, step 4 gets 20 schedulers, etc. The total for the query can add up to more than the MAXDOP because the MAXDOP applies per step. The number of schedulers used per step is not saved in the query plan; it is chosen each time based on availability. For any step that uses more than one processor SQL will wait on CXPackets as it synchronizes the activity across schedulers (exchanges data across threads) for that step. As with other waits in SQL Server, a wait on CXPacket in and of itself is not a problem but rather a normal and expected part of how the system does its work. If the waits become excessive, then you can start looking at parallelism as a potential problem.


You can set the “max degree of parallelism” for the entire instance. This means that any given step in any given query will not be assigned more schedulers than the value set in that configuration option. The setting can be overridden (higher or lower) for individual TSQL queries with the OPTION MAXDOP query hint. Some DDL operations such as ALTER INDEX have MAXDOP hints available as well.


So say a query executes and one or more of its steps “go parallel”. This means that the query optimizer determines the query will finish faster overall when it uses parallel threads for one or more steps, even though it uses more CPU to do so. Partly it’s that the CPU time is more concentrated as more threads are active at once and partly that there is extra overhead in managing the context switches (CXPacket). So when you see CXPackets in use and even when you see waits on CXPackets it can be normal and expected.

Is parallelism good?

Parallelism is very useful when you have a small number of very long running queries. Each query executes faster in parallel and since there are few queries at any given point in time there is likely little blocking or competition for resources. However, if you have many small, fast queries few will “go parallel” because the overhead of managing the data exchanges between threads is greater than the gain from executing in parallel. When you start mixing the types of queries it is harder for the optimizer to choose the right parallelism levels for the overall workload. One slow query may start at a time when things are relatively idle and it may use many threads. Then several small, fast queries come in while the first is still running and the CXPacket waits (number and duration) start to increase making all of the queries go slower. SQL’s CPU usage may go up significantly.


Another problem can arise when the statistics are out of date or very skewed. That means that the optimizer may choose a plan based on the expected number of rows (cardinality estimate) that doesn’t quite balance out as expected because the actual number of rows is so far off. In that case updating statistics (perhaps WITH FULLSCAN if the data is skewed) may help. In many cases adding or altering indexes can help as well since quite often fewer rows are touched when sufficient indexes exist and therefore the need for parallel data access decreases.


On a system where you expect all of the queries to be short/fast, setting the “max degree of parallelism” to 1 (never go parallel) basically gives priority to those short/fast queries and any big/slow queries just take longer to execute but don’t hold up other things as much. Leaving it at 0 (use any #) helps on systems where there are only a few queries running but all are slow/expensive. This is because SQL Server has the most flexibility with a setting of 0 and can choose to use as many schedulers as needed. Anything in between 1 and 0 is an attempt to manually balance those things if you feel SQL isn’t doing it the way you want it to. The only way to really know for sure what the best setting is on your system is to test average and peak workloads with different “max degree of parallelism” settings. You should keep in mind that as the data, number of users, other activity, etc. changes the “best” value may change as well.


What value do I use for “max degree of parallelism”

Now we make it to the question of what value to use for “max degree of parallelism” on your particular instance. Keep in mind that these are general guidelines for OLTP systems. Warehouse/OLAP loads, edge cases, huge systems, many SQL instances, odd configurations, unusual workloads and other factors can outweigh the reasoning behind some of these recommendations. If you’re using hyper-threading, then that is really already doing context switches of a sort and you basically have two SQL schedulers mapped to one physical core. That can cause problems in the balancing equation, so if you have hyper-threading enabled you need to make sure “max degree of parallelism” is not 0 and is not greater than half the number of visible schedulers. If you use CPU affinity mask or WSRM to limit the number of cores available to SQL Server, then your “max degree of parallelism” should be no more than the number of cores available to that instance of SQL Server (if necessary divided by half for hyper-threading). If you are using hard NUMA, keep the value no larger than the number of cores in a NUMA node as this helps reduce costly foreign memory access. As a starting point for an average workload you probably don’t want to use a value greater than 8, though this is not a hard limit but rather a starting point/guideline. Start with the smallest value out of all the rules above. You may need to adjust the value based on your testing and/or as the workload and other factors change over time.

Do I have a parallelism problem?

Next there is the question of how to know if the current “max degree of parallelism” is contributing substantially to a current performance or resource constraint problem. If your Process.SQLServr.%Processor Time/# of logical processors value is much higher than normal/expected AND wait stats shows more than 5% of your waits are on CXPackets, you may want to test lower (or non-zero) values of “max degree of parallelism”. You find connections that have currently running parallel queries by looking for SPIDs with multiple ECIDs in sys.sysprocesses. Query plans that are eligible for parallelism will have the Parallelism operator. If you decide your current degree of parallelism might be negatively impacting performance, use the information above to estimate what may be a good starting point for your particular instance and test/baseline/tune from there. No restart is required after you change the value; new queries automatically use the new value (unless overridden by a hint in the query).



·         Case Study: Part 1: CXPACKET Wait Stats & ‘max degree of parallelism’ Option: Introduction to Using Wait Stats to Identify & Remediate Query Parallelism Bottlenecks //

·         SQL Server 2005 Waits and Queues

·         Recommendations and Guidelines for ‘max degree of parallelism’ configuration option

·         Problem fixed in SQL Server 2008 SP1 CU3 and higher: 970399     FIX: The MAXDOP option for a running query or the max degree of parallelism option for the sp_configure stored procedure does not work in SQL Server 2008;EN-US;970399

·         SQL University: Parallelism Week – Introduction

·         Maximum Number of Processors Supported by the Editions of SQL Server

·         Windows 2008 R2 – Groups, Processors, Sockets, Cores Threads, NUMA nodes what is all this?

·         Why Do I see more threads per SPID in sysprocesses than MAXDOP?


Note: This information was verified for SQL Server 2008 R2 and with very minor variations will apply also to SQL Server 2008 and SQL Server 2005.



Online learning with Kalen Delaney – SQL Server 2008 Indexes – Internals and Best Practices

Kalen always does a great job of explaining the guts of SQL Server. She has a deep, intuitive grasp of index internals. Even better, she can take that knowledge and share it so everyone can learn something, whether you’re a relative newbie or a veteran. This particular workshop is a deep-dive with advanced topics so if you have a good grasp of the basics and want to hone your index skills, sign up now for Friday’s online workshop for only $99 and learn all about index internals and best practices. You can even get a DVD as part of the registration (but it costs extra).

Workshop: SQL Server 2008 Indexes – Internals and Best Practices or

Date: Friday, July 23, 2010 at 9:00a Pacific

And learn more about Kalen Delaney and her SQL Server 2008 Internals book here:

Leave a comment

Windows storport enhancement to help troubleshoot IO issues

For Windows 2008 and Windows 2008 R2 you can download a Windows storport enhancement (packaged as a hotfix). This enhancement can lead to faster root cause analysis for slow IO issues. Once you apply this Windows hotfix you can use Event Tracing for Windows (ETW) via perfmon or xperf to capture more detailed IO information that you can share with your storage team. This could be very useful for your SQL Server installations or any system where you suspect IO issues.


Enhancement information and background topics:

·         Storport ETW Logging to Measure Requests Made to a Disk Unit

·         Diagnose Slow I/O on Windows With Xperf (from the MSFT guys who wrote it – based on a pre-release version but the basics still apply)

·         xPerf: Once I Was Blind and Now I See

·         Storport Driver

The hotfix downloads are available from the top of each article and instructions are included in the body of each KB:

·         KB 979764 (Windows Server 2008 hotfix)

·         KB 978000 (Windows Server 2008 R2 hotfix)

· (xperf downloads)

1 Comment

SQL Server Performance Tools – Boise Code Camp Presentation

Today I am presenting about SQL Server Performance Tools at the Boise Code Camp. You can download the slides and supporting files here on this blog (at the bottom it says Attachment(s): ). The basic agenda of items covered is:


¢  Methodology

¢  SQLDiag

¢  PSSDiag

¢  SQLNexus

¢  Profiler

¢  PerfMon

¢  References

The perfstats script I discussed can be found at:
And the perfstats analysis tools are at:

Leave a comment

How and Why to Enable Instant File Initialization

See my new blog post (written with Denzil Ribeiro) about “How and Why to Enable Instant File Initialization” on our PFE blog. Keep an eye on the PFE blog for more posts from my team in the near future.

Leave a comment

SQL Server and Fragmentation

There are several types of fragmentation that affect SQL Server, all of which have the potential to cause performance problems.


1)      At the disk/OS level individual files can become fragmented due to growth over time. This can be addressed by a defragmentation tool at the SAN or Windows level. Ask your storage team about your options there.

2)      Within a SQL Server database individual pages can become internally fragmented so that many pages are less full than expected. This means more pages are needed to store the same amount of data which means more reads to satisfy each query and less total data that can be kept in memory. This can decrease performance.

3)      Within a SQL Server database the index pages and extents can become less sequential. This can make scans slower. Though hopefully you are doing more seeks than scans this can still be an important factor in performance.


If you are using Sharepoint, you need to be aware that you cannot change the schema at all, including adding or modifying indexes. Here is some information on Sharepoint support:

·         943345  How to defragment Windows SharePoint Services 3.0 databases and SharePoint Server 2007 databases;EN-US;943345

·         841057  Support for changes to the databases that are used by Office server products and by Windows SharePoint Services;EN-US;841057

·         968785  After you install Windows SharePoint Services 3.0 Service Pack 2, the Windows SharePoint Services 3.0 database can be auto-defragmented;EN-US;968785

·         Note that with Sharepoint, the recommended break-point where you change from REORGANIZE to REBUILD is 70% instead of the more standard 30%. Your maintenance window may be a bit longer but concurrency will increase for your users on the tables with BLOBs.

Disk/OS level fragmentation:

In a SAN environment, sometimes you cannot use the windows tools to tell you how fragmented a volume is since some characteristics may be hidden from Windows and known only to the SAN. If your storage team has SAN tools to tell them about fragmentation, then use that. Windows can see a SAN volume as fragmented when the SAN has already taken care of the fragmentation.

·         This SQL IO basics document has some details on disk defragmentation

Database level fragmentation:

·         sys.dm_db_index_physical_stats (Transact-SQL)

·         Reorganizing and Rebuilding Indexes

·         Guidelines for Performing Online Index Operations

·         Rebuilding Indexes and Updating Statistics


When you REORGANIZE (in older versions of SQL Server this was DBCC INDEXDEFRAG) it compares two pages at a time and works its way through the index. If it is killed before it stops no work done is lost. However, for a very defragmented index this operation will take much longer than REBUILD. The more fragmentation you have the longer this operation takes to complete. There may still be some fragmentation left after this operation completes.


When you REBUILD (DBCC DBREINDEX in earlier versions) a copy is made of the index and the new version is entirely sequential and compact. It is faster to use this method when the fragmentation level is high. REBUILD also does the equivalent of an UPDATE STATISTICS WITH FULLSCAN which gives very accurate statistics. The amount of time to complete has little to do with the amount of fragmentation and instead depends on the total amount of data, the data types, the index structure, etc. Internally this is pretty much equivalent to creating a new index (without the overhead of a sort) then dropping the old one within a transaction. For some indexes it is possible to do a minimally logged operation if you are not using the FULL recovery mode. Not using FULL recovery mode means you have a greater exposure to possible data loss in the case of certain types of system failures, this is not directly related to the REBUILD operation but only to the recovery mode. Extra precautions such as a log backup immediately before and after the change of recovery mode. For non-clustered indexes you can sometimes use the SORT_IN_TEMPDB option to avoid a sort. For tables with no XML or spatial indexes you may be able to set ONLINE=ON in Enterprise Edition so that users can continue to access the index while it is being rebuilt (there are several restrictions on when this is allowed). If there is little free space in the database the new index may not be completely defragmented.


The FILLFACTOR setting is related to fragmentation and may be changed during a REBUILD. You can set the FILLFACTOR to a higher or lower number to adjust how much free space is left on each page. The tradeoff is in space used vs. the cost of future page splits as data is added or rows get bigger (variable length or nullable columns get more data). The FILLFACTOR does not prevent a page from filling, it sets the amount of free space only at the time the index is created or rebuilt. By default the pages are created as full as possible so unless you have explicitly set your FILLFACTOR to a lower amount for one or more indexes then you won’t be able to save space that way. If you have to defragment an index more often than you would like, and the fragmentation is due mostly to page splits, then you might want to set a lower FILLFACTOR. If you rarely have to do defragmentation and you have a lot of empty space per page, you might want to increase the FILLFACTOR. If you want to guess at a “best” value on each index you need to know the average width of the rows, the clustered index key values (if you’re setting it for a non-clustered index), the expected amount of inserts and whether they will cause a lot of page splits, are inserts on a monotonically increasing key like IDENTITY, whether updates will cause a lot of row movement or changes in row size, whether deletes are likely to happen in random spots or at the “beginning” or “end”, the volume of data modifications compared to total rows and whether the data is compressed. All of which lead back to how much fragmentation do you expect (and does that fragmentation cause any perceivable performance problems) and how often can/will you defragment the indexes. So if you see a lot of fragmentation and when you defragment your performance improves (not always the case), have a lot of page splits (PerfMon: SQL Server Access Methods.Page Splits/sec), and have the space available to rebuild your index then a lower FILLFACTOR is one possible solution. When you rebuild the indexes it will often change the total space allocated to the table, partly due to FILLFACTOR and partly due to the fragmentation cleanup.



·         BOL Fill Factor

·         BOL Create Index

·         Database Maintenance Best Practices Part II – Setting FILLFACTOR

·         Who Cares about FillFactor?

·         Kimberly Tripp on FILLFACTOR

·         Top Tips for Effective Database Maintenance