Category: SQL & Big Data

  • Save the SQL error logs….

     

    Many people use sp_cycle_errorlog to roll over to a new SQL error log periodically. Make sure this is necessary and that you are not doing it too frequently. The more often you roll over the log, the greater the chance you will not have troubleshooting or historical information available, especially after a series of unexpected sequential restarts. If you are cycling the log, you probably want to keep more than the default of 7 logs so you’re less likely to lose valuable troubleshooting information.
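    If you do cycle the log, the two pieces fit together something like this sketch. The retention value of 30 is just an example, and the registry write is the same setting the SSMS "Configure SQL Server Error Logs" dialog changes; test on a non-production instance first.

    ```sql
    -- Cycle to a new error log on demand (what a scheduled job would run):
    EXEC sp_cycle_errorlog;

    -- Keep more than the default 7 logs, e.g. 30. The path is the
    -- instance-relative default; 30 is a placeholder, not a recommendation:
    EXEC xp_instance_regwrite
         N'HKEY_LOCAL_MACHINE',
         N'Software\Microsoft\MSSQLServer\MSSQLServer',
         N'NumErrorLogs',
         REG_DWORD,
         30;
    ```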

     

    2199578 How to manage the SQL Server errorlog

    http://support.microsoft.com/default.aspx?scid=kb;en-US;2199578

    Configure SQL Server Error Logs (General Page)

    http://msdn.microsoft.com/en-us/library/ms177285.aspx

    “In Object Explorer, expand your server, expand Management, right-click SQL Server Logs, and then click Configure.”

     

  • 24HOP: SQL Server Performance Tools

    Thanks to everyone who attended my session today! I think the 740 or so of you overloaded the audio. 🙂 Sorry everything wasn’t clear on the voice delivery, hopefully the slides (below) provide enough information for you to know what to download. Then you can load your own SQLDiag data into SQL Nexus and see all the stuff I showed in the demos! Remember that I was showing you the “near future” version 4.0 of SQL Nexus, what you can download today from CodePlex is version 3.0. I’ll tweet when the new version is released.

     

    There are still many sessions left in this spring 2011 24 Hours of PASS, attend as many as you can today and tomorrow! Look for the recordings to be available soon.

     

    My slides are below, the abstract is: SQL Server Performance Tools: Troubleshooting performance issues with SQL Server can be confusing and time consuming. I want to make sure you are familiar with some of the common tools used to help troubleshoot performance issues and when to use each. I will provide best practices around troubleshooting methodology, SQLDiag, PSSDiag, SQL Nexus, Profiler / SQL Trace, and PerfMon. After this presentation you will be better able to narrow down a performance problem and focus in on where to spend your time and effort.

    Please: Complete Session Evaluation

     

    Some of the key references:

     

    ·          SQL Server 2008 Internals and Troubleshooting – Chapter 8 Defining Your Approach to Troubleshooting

    ·          PerfStats – Download

    ·          SQL Nexus on Codeplex – Download

    ·          RML/Readtrace – Download

    ·          SQLDiag – Parameters/Usage

    ·          Waits and Queues – White Paper

    ·          New Opportunities to be a SQL Master – Changes to the program

    #24HOP

    PerformanceTools_24HOP_CindyGross_Spring2011.pptx

  • General Hardware/OS/Network Guidelines for a SQL Box

    I have put together some general guidelines for how you want a server to be delivered to the DBA team for a new SQL Server install. You won’t necessarily use all of them, but consider it a starting point for your SQL Server install standards. Places where it may be common to change the statements are in [red]. Always run the SQL Server Best Practices Analyzer or an equivalent tool to check for the latest recommendations before releasing the system to production. I’m sure some of you will disagree with some of the points for various reasons, and I’m ok with that because I am very fond of saying “it depends”. 🙂 Still, I’ve found them to be a good baseline for discussion and a great starting point for standards documents.

    The below is specific to SQL Server 2008/2008 R2 on Windows 2008/2008 R2.

    OS Specifications (things often controlled by a Windows team)

    ·         Power saving features: For a SQL Server box if you want consistent, predictable, high performance you either need to fine tune the power setting parameters for each individual workload and/or for different times of day or just set the power options to high performance. Databases are harder to fit into the normal power saving profile so they don’t fit as well into the default power saving settings. [If your environment requires that you favor power savings over performance change this statement and be aware of the impact.]

    ·         You should double check that your virus scanner is certified for Windows 2008 R2. Older scanners use the TDI model; on the newer OSs you need scanners built on the WFP model to work properly. The older type of anti-virus scanners can cause serious IO problems.
    981889 A Windows Filtering Platform (WFP) driver hotfix rollup package is available for Windows Vista, Windows Server 2008, Windows 7, and Windows Server 2008 R2
    http://support.microsoft.com/default.aspx?scid=kb;en-US;981889
    979278 Using two Windows Filtering Platform (WFP) drivers causes a computer to crash when the computer is running Windows Vista, Windows 7, or Windows Server 2008
    http://support.microsoft.com/default.aspx?scid=kb;EN-US;979278
    979223 A nonpaged pool memory leak occurs when you use a WFP callout driver in Windows Vista, Windows 7, Windows Server 2008, or in Windows Server 2008 R2
    http://support.microsoft.com/default.aspx?scid=kb;EN-US;979223
    976759 WFP drivers may cause a failure to disconnect the RDP connection to a multiprocessor computer that is running Windows Vista, Windows Server 2008, windows 7 or Windows Server 2008 R2
    http://support.microsoft.com/default.aspx?scid=kb;EN-US;976759
    Windows Filtering Platform
    http://www.microsoft.com/whdc/device/network/WFP.mspx

    ·         Virus scanners and spyware detection should not scan SQL Server data and log files (usually mdf/ldf/ndf extensions) and other SQL related files because the scanning significantly degrades performance. [Note that this is a tradeoff with security and you must decide on performance vs. security based on your own security guidelines.]
    REASON: Performance, smoother setup. See 309422 Guidelines for choosing antivirus software to run on the computers that are running SQL Server
    http://support.microsoft.com/default.aspx?scid=kb;EN-US;309422

    ·         Firmware, BIOS, network adapter drivers, storport drivers, etc. will be at their most recent, stable versions before the server is released to the DBAs.
    REASON: There are common SQL Server performance, usability, and supportability problems caused by older firmware, BIOS, network adapter drivers, etc.

    ·         For Windows 2008 and Windows 2008 R2 you can download a Windows storport enhancement (packaged as a hotfix). This enhancement can lead to faster root cause analysis for slow IO issues. Once you apply this Windows hotfix you can use Event Tracing for Windows (ETW) via perfmon or xperf to capture more detailed IO information that you can share with your storage team.  http://blogs.msdn.com/b/cindygross/archive/2010/07/13/windows-storport-enhancement-to-help-troubleshoot-io-issues.aspx

    ·         Do not install SQL Server on a domain controller (DC).
    REASON:  A busy DC can take resources away from SQL Server. There are also negative security implications from installing SQL Server on a DC.

    ·         Grant SE_MANAGE_VOLUME_NAME to the SQL Server group to allow instant file initialization of data (but not log) files. There is a small security risk associated with this but it can greatly improve the performance of CREATE/ALTER data (but not log) files. [Decide as a company whether this performance enhancement is worth the small risk] http://blogs.msdn.com/b/sql_pfe_blog/archive/2009/12/23/how-and-why-to-enable-instant-file-initialization.aspx
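    You can verify whether instant file initialization is actually in effect with the trace flag technique from the referenced blog post. This is a sketch for a test instance only; the database name IFI_Test is made up:

    ```sql
    -- Trace flags 3004/3605 write file zeroing messages to the error log:
    DBCC TRACEON(3004, 3605, -1);
    CREATE DATABASE IFI_Test;
    GO
    -- If "Zeroing completed" appears for the .mdf, instant file
    -- initialization is NOT working; only the .ldf should be zeroed:
    EXEC sp_readerrorlog 0, 1, N'Zeroing';
    DROP DATABASE IFI_Test;
    DBCC TRACEOFF(3004, 3605, -1);
    ```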

    ·         Critical updates for Windows will be tested and applied ASAP after their release.
    REASON: Security that affects Windows often affects SQL Server as well.

    ·         Resource intensive screensavers will be disabled and replaced with a low resource consumption option (such as a blank, password-protected screensaver) to lock the consoles.
    REASON: Performance – Resource intensive screen savers can steal resources from SQL Server.

    ·         Files will be secured: All copies of the data and log files as well as all copies of the backup files will be secured with access given only to those documented in the SQL Server Disaster Recovery plan.
    REASON: Data and log files can be copied and attached to another instance of SQL Server, thereby exposing the information to the sysadmins of the new instance. Therefore access to these files must be very limited. However, enough access must be granted to allow for recovery.

    ·         EFS: SQL Server will not be installed on disk that is encrypted with EFS.
    REASON: 922121 You may experience decreased performance in some features of SQL Server 2005 when you use EFS to encrypt database files
    http://support.microsoft.com/default.aspx?scid=kb;EN-US;922121

    Storage Specifications (often configured by a Windows and/or storage team)

    ·         Battery backup must be enabled for all controllers or storage media which do write caching.
    REASON: This is required by the WAL protocol to ensure stable media for SQL Server. See
    http://www.microsoft.com/technet/prodtechnol/sql/2000/maintain/sqliobasics.mspx#EYCAE

    ·         For SQL Server disks, performance is more important than conserving space. This means there may be what would be considered “wasted space” on a file server and that the overall cost per stored MB will be higher for a database system than for a file server. [This is a general guideline, if your environment prefers costs savings and space usage maximization over performance change this statement.]
    REASON: High performance is generally a major requirement of a database system, and is much more important than on most file systems. Higher performance requires that disk be laid out, configured, and managed in particular ways.

    ·         Disk alignment must be done to a multiple of 64KB. Some vendors may express a preference for a particular value, but most mainstream hardware vendors have agreed that 1024KB is acceptable. That is the default for Windows 2008+. If you use dynamic disks it is difficult to see the alignment from Windows.
    REASON: If the disk is not aligned, performance can suffer as much as 30-40% because some read/write activity may be to/from two blocks instead of one. See 929491 Disk performance may be slower than expected when you use multiple disks in Windows Server 2003, in Windows XP, and in Windows 2000 http://support.microsoft.com/default.aspx?scid=kb;EN-US;929491 and Disk Partition Alignment Best Practices for SQL Server http://msdn.microsoft.com/en-us/library/dd758814.aspx

    ·         Disk allocation unit should be 64KB for SQL Server boxes.
    REASON: See Predeployment I/O Best Practices http://www.microsoft.com/technet/prodtechnol/sql/bestpractice/pdpliobp.mspx

    NTFS Allocation Unit Size

    When formatting the partition that will be used for SQL Server data files, it is recommended that you use a 64-KB allocation unit size for data, logs, and tempdb. Be aware, however, that using allocation unit sizes greater than 4 KB results in the inability to use NTFS compression on the volume. Although it is not recommended, SQL Server does support read-only data on compressed volumes.

    ·         Drive Compression: Drives will not be compressed.
    REASON: Compression has a big negative performance impact on SQL Server.

    ·         NTFS file system will be used instead of FAT or Raw partitions.
    REASON: NTFS allows features such as database snapshots, online DBCC checks, instant file initialization, mount points, and additional security. It also has a far larger file size limit (16 exabytes) than FAT32 (4 GB). Raw partitions limit your recoverability options.

    ·         Often you will need one or more of these to achieve optimal performance for a database. [Decide which of these you will deploy for each tier of storage and whether each can be requested by a DBA at server configuration time.]

    1.       HBA queue depth for SQL Server is often best at 64 or 128; testing will determine the optimal value.
    REASON: See Predeployment I/O Best Practices http://www.microsoft.com/technet/prodtechnol/sql/bestpractice/pdpliobp.mspx
    HBA Queue Depth Settings

    When configuring HBAs on the host, ensure that the Queue Depth is set to an optimal value. SQL Server applications are generally I/O-intensive, with many concurrent outstanding I/O requests. As a result, the default values for Queue Depth on HBAs are usually not high enough to support optimal performance. Currently the default value for the major HBA vendors is in the range of 8 to 32.

    In our SQL Server testing, we have seen substantial gains in I/O performance when increasing this to 64 or even higher. It is worth noting that in these tests SQL Server was usually the only application using the storage array. It is important to discuss with your storage administrator the appropriate values for this setting, as the setting may affect other applications in sharing the same storage environment. When Queue Depth is set too low, a common symptom is increasing latency and less-than-expected throughput given the bandwidth between host/storage and the number of spindles in a particular configuration.

    2.       RAID 10 or its equivalent will be used for the highest performance and best recoverability. Read-only data (no updates from users, replication, batch jobs, or anything else) can see acceptable performance on RAID 5. RAID 5 systems will have slower write performance and less recoverability but might be allowed for lower tiered systems with a signoff that high performance is not guaranteed.
    REASON: RAID 10 is the fastest disk for SQL Server data and logs. It also provides the best recoverability options.

    o   See Physical Database Storage Design http://www.microsoft.com/technet/prodtechnol/sql/2005/physdbstor.mspx
    “For excellent performance and high reliability of both read and write data patterns, use RAID10.”

    o   “RAID10 (stripe of mirrors): RAID10 is essentially many sets of RAID1 or mirrored drives in a RAID0 configuration. This configuration combines the best attributes of striping and mirroring: high performance and good fault tolerance. For these reasons, we recommend using this RAID level. However, the high performance and reliability level is the trade-off for storage capacity.”

    o   RAID 10 is recommended for “Data requiring high performance for both read and write and excellent reliability while trading off storage efficiency and cost.“

    3.       Follow hardware vendor recommendations for configuring the storage for a database, often this is very different than configuring for other non-database systems.

    4.       Keep the physical disks no more than about 80% full (avoid full stroking, get closer to short stroking). Some SAN configurations may make this difficult to determine from Windows with concepts such as thin provisioning.

    5.       Use multiple HBA controllers with a high throughput capacity. The same applies for other components such as switch ports, NICs, Fibre Channel array ports, storage array service processors, etc.

    6.       Favor (battery backed) write cache over read cache for an OLTP system. Often 80/20 or 90/10 in favor of writes is beneficial. It is relatively easy for a busy SQL Server to flood the cache.

    o   Log writes have the lowest allowable latency of any SQL activity on an OLTP system.

    o   Write cache can help absorb checkpoint bursts as they write data to the disks.

    o   Maintenance operations can be write intensive and long running.

    o   SQL Server’s internal data organization rarely matches the physical layout of the data on disk so IO subsystem level read ahead through the read cache is rarely effective for a database.

    7.       Performance will be more predictable, IO troubleshooting will be easier, and in many cases overall performance can be higher if SQL Server is isolated to an IO path not shared with other systems. If it does share with others (which is very common), it is better to share with other databases than with file servers or other systems that have different needs.

    ·         Data and log files will not go on the same drive (they can have the same drive letter if on different mount points). SANs often hide the physical layer behind the drive letter/mount point by mixing data on the back end but it is still important to keep them separate in case you later move to totally separate IO paths for each.

    o   Where cost/benefit analysis allows, each database’s transaction log file(s) will get a separate drive.  For maximum performance this would be a truly separate IO path.
    REASON: Writes to the transaction log are sequential (even if there are multiple log files for a given database, only one is written to at a time). If multiple databases’ log files, or a mix of data and log files, share one drive, the disk head moves between them and performance decreases.

    o   Where cost/benefit analysis allows it, multiple, separate IO paths may be requested for each data file.
    REASON: The more drives you can use for data files, the more the IO can be spread out to increase performance. For certain types of activities guaranteeing that the IO does not overlap for two different data files can be advantageous.

    ·         Before putting each SQL box into production, run SQLIO under various loads and gather data to see if the IO subsystem is likely to be capable of the expected load. If you have numbers for a typical and current peak load, use those numbers and then add some headroom (perhaps test things like 150% of the expected peak batch cycle).

    ·         Before putting each SQL box into production, run SQLIOSim to test for data correctness.

    ·         Disk IO Subsystem Performance Requirements:

    o   Log writes will not exceed [put your values here] ms per write under the expected peak load (typical is 1-5ms for log writes)

    o   Non-log writes will not exceed [put your values here] ms per write under the expected peak load (typical is 1-20ms for OLTP and 25-30ms for warehouses)

    o   Reads will not exceed [put your values here] ms per read under the expected peak load (typical is 5-20ms for OLTP and 25-30ms for warehouses)

    o   These are general standards; some systems may require faster IO subsystems and some may be ok with slower IO subsystems.

    o   Measures of read and write speed are from the Performance Monitor counters Avg. Disk sec/Read and Avg. Disk sec/Write and/or from SQLIO as these are the latencies seen by SQL Server.
    REASON: Databases require fast disk access in order to deliver high performance. Many DBA and application developer hours can be spent troubleshooting performance problems that are later tracked to slow IO.
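    Once the box is in production you can check the same latencies from inside SQL Server. This is a sketch against the sys.dm_io_virtual_file_stats DMV (SQL 2005+); the values are cumulative since instance start, so capture the query twice and difference the results for a point-in-time view:

    ```sql
    SELECT DB_NAME(vfs.database_id)                              AS database_name,
           mf.physical_name,
           vfs.io_stall_read_ms  / NULLIF(vfs.num_of_reads, 0)  AS avg_read_ms,
           vfs.io_stall_write_ms / NULLIF(vfs.num_of_writes, 0) AS avg_write_ms
    FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
    JOIN sys.master_files AS mf
      ON mf.database_id = vfs.database_id
     AND mf.file_id     = vfs.file_id
    ORDER BY avg_write_ms DESC;
    ```

    Compare the averages against the thresholds you filled in above.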

     

    Network Specifications

    ·         SNP/TCP Chimney settings will depend on whether your NIC vendor supports it. If you have NICs/drivers that support it, turn it on. Otherwise disable it to avoid known problems with SQL Server and other products.
    REASON: Performance and usability. When TCP Chimney is enabled on a NIC that doesn’t support it, you will often see failed connectivity to SQL Server and/or
    dropped packets and connections that affect SQL server. See Information about the TCP Chimney Offload, Receive Side Scaling, and Network Direct Memory Access features in Windows Server 2008 http://support.microsoft.com/kb/951037  and 942861 Error message when an application connects to SQL Server on a server that is running Windows Server 2003: “General Network error,” “Communication link failure,” or “A transport-level error” http://support.microsoft.com/default.aspx?scid=kb;EN-US;942861

    ·         Put a firewall between your server and the internet. Block TCP port 1433, UDP port 1434, and static ports used by named instances on the perimeter firewalls but not necessarily on individual internal servers. Be careful which ports you block in the other firewalls; SQL Server will use various ports to communicate.
    REASON: Security – Hackers may try to use well known ports to attack SQL Server.

    ·         Open ports used by SQL Server in the individual server firewalls. [If you define a specific port range for all instances include it here.]
    REASON: The DBAs have defined this range of ports as what each SQL Server uses for incoming connections.

    Windows Cluster

    ·         Identical hardware: While the nodes of a Windows 2008 cluster are not required to be identical, it is recommended that they be configured as close to the same as possible so that behavior is predictable no matter which node owns each SQL Server group.

    ·         Windows policies and rights: Windows policies and rights should be the same on all nodes.
    REASON: The behavior of SQL Server must be the same on all nodes. Policies can change SQL Server behavior.

    ·         Mount points: Do not install SQL Server 2000 on any Windows cluster with mount points.  The mount points must have an associated drive letter and must be cluster resources in the group where SQL Server will reside. SQL Server must “depend on” all mount points that it uses.
    REASON: SQL Server 2005+ supports mount points but SQL Server 2000 setup, including service packs and hotfixes, will fail when it tries to enumerate the mount points, even if they are not in the SQL Server 2000 group. For 2005+ instances, the mount points must be in the SQL Server group in order for SQL Server to access them.

    ·         The Cluster service account must be a login in the SQL Server instance and a simple user in the master database, but should NOT be a sysadmin role member.
    REASON: Avoid elevated privileges.

    ·         MS DTC: Choose a consistent standard for how you configure DTC. On Windows 2008+ clusters you can have more than one DTC per cluster, and there are pros/cons to the various configurations of how SQL Server uses one or more of those DTCs. http://blogs.msdn.com/cindygross/archive/2009/02/22/how-to-configure-dtc-for-sql-server-in-a-windows-2008-cluster.aspx DTC must be enabled for network access. If you choose to put DTC in a group with SQL Server and choose to have a DTC failure cause the group to fail, be aware that DTC can then cause a SQL Server failover. This may be appropriate in some environments.

    ·         Cluster Group Names will not contain any special characters such as <, >, ‘, “, &
    REASON: Special characters in any group name may cause SQL Server setup to fail.

    ·         NIC names will not have any special characters or trailing spaces.
    REASON: Special characters in any network name may cause SQL Server setup to fail.

    ·         Auto start must NOT be on for any clustered resource in a Windows cluster.
    REASON: The cluster administrator needs to bring the resources online (start the services). If Windows startup has already started the service the cluster service cannot bring it online which results in errors.

    ·         Use service SIDs in a Windows cluster. If you choose not to use service SIDs, create unique domain groups: each individual service installed with SQL Server needs a unique domain group created for it unless you choose the default of service SIDs. The following naming standard will be used: [put your company standard here. Examples: myserver1_instance1_SQLServer, myserver1_instance1_SQLAgent]

    http://technet.microsoft.com/en-us/library/ms143504.aspx

    Best practices that you can use to set up domain groups and solutions to problems that may occur when you set up a domain group when you install a SQL Server 2005 failover cluster http://support.microsoft.com/default.aspx?scid=kb;EN-US;915846
    REASON: On a cluster domain groups are used to manage permissions for the SQL Server components. Each service needs a unique group to reduce the attack surface.

    Remote Control

    ·         Remote control to the server will only be done when absolutely necessary. No SQL Server tools will be opened on the production server itself unless there is no other way to access the server.
    REASON: It adds overhead to the server and can cause performance problems. Most access will be done from client tools installed on desktops. The overhead of the GUI interfaces is not acceptable on a production server. Some SQL Server client tools are not available on 64-bit systems and all servers will be x64 going forward.

    Hardware Specifications

    ·         Expected Life: Servers are spec’d with an expectation of being in service for up to [put your policy here, 3-4 is common] years unless otherwise stated.
    REASON: We need to know in advance how long the hardware is expected to stay in service so we can predict the resources needed for that time period. An uncertainty factor will be considered as well, so systems with high uncertainty in the predictions may need more expansion opportunity (have the ability to add more hardware resources as needed).

    ·         Architecture: All new servers will be based on x64 hardware with an x64 version/edition of Windows.
    REASON: 64-bit systems allow much more memory to be used than 32-bit systems.

     Cindy Gross, Microsoft Dedicated Support Engineer for SQL Server and Microsoft Certified Master : SQL Server 2008

  • Power Saving Options on SQL Server


     

    Windows 2008 and Windows 2008 R2 have settings for “Power Options”. Windows 2008 R2 has additional power improvements related to “core parking” (temporarily suspending certain cores) and “tick skipping” (extended idle and sleep states). The default power setting for Windows 2008 and 2008 R2 is “Balanced”, which means that when the system isn’t “busy” various components, such as the CPU and storage, are throttled back. This only happens when the box has CPUs and a BIOS that support the ACPI specification for power that allows these options to take effect; newer model CPUs are more likely to support it.

    Database systems that don’t have high sustained CPU usage are probably going to perform better with “High Performance”. Most people size their SQL Server boxes so that the CPUs run at a fairly low percentage of total available, so it’s pretty easy for the box to drop down to the slower CPU speeds and deliver slow, inconsistent performance. There are many documented cases of balanced power settings hampering SQL Server performance. For a SQL Server box, if you want consistent, predictable, high performance you either need to fine tune the power setting parameters for each individual workload and/or for different times of day, or just set the power options to high performance. If power savings are more important than fast, predictable performance then you can leave the default. As a side note that isn’t directly related to SQL Server but may affect virtualized SQL Servers and/or application servers that connect to SQL Server, VMware has a doc on the need for “High Performance” power settings on the VM host.

     

    SQL

    ·         SQL Server on Power-Saving CPUs? Not So Fast. http://www.brentozar.com/archive/2010/10/sql-server-on-powersaving-cpus-not-so-fast

    ·         Are your CPUs running slowly? Tool tip and survey http://www.sqlskills.com/BLOGS/PAUL/post/Are-your-CPUs-running-slowly-Tool-tip-and-survey.aspx

    ·         Windows OS Power Saving Feature Directly Affects CPU Performance http://www.bradmcgehee.com/2010/07/windows-os-power-saving-feature-directly-affects-cpu-performance/

     

    Virtualization

    ·         Poor virtual machine application performance may be caused by processor power management settings http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1018206

     

    General

    ·         Degraded overall performance on Windows Server 2008 R2  http://support.microsoft.com/kb/2207548
    “applications or scripts that intensively use resources (primarily processor and memory) may exhibit the problem”

    ·         Improve Energy Efficiency and Manage Power Consumption with Windows Server 2008 R2 http://download.microsoft.com/download/5/B/D/5BD5C253-4259-428B-A3E4-1F9C3D803074/WS08%20R2%20Power%20Savings%20White%20PaperTDM.docx
    “The “High Performance” mode may be appropriate for servers that run at very high utilization and need to provide maximum performance, regardless of power cost.”

    ·         Performance Tuning Guidelines for Windows Server 2008 http://download.microsoft.com/download/9/c/5/9c5b2167-8017-4bae-9fde-d599bac8184a/Perf-tun-srv.docx

    “However, Balanced might not be appropriate for all customers. For example, some applications require very low response times or very high throughput at high load. Other applications might have sensitive timing or synchronization requirements that cannot tolerate changes in processor clock frequency. In such cases, changing the power plan to High Performance might help you to achieve your business goals. Note that the power consumption and operating cost of your server might increase significantly if you select the High Performance plan.”

    To view/set

    ·         Powercfg.cpl

    ·         Powercfg.exe – Configure Power Plans http://technet.microsoft.com/en-us/library/dd744398(WS.10).aspx

    ·         Ensuring Maximum CPU Performance via Group Policy http://greg.blogs.sqlsentry.net/2011/01/ensuring-maximum-cpu-performance-via.html
    PBM Policy / WMI query to see which CPUs are in power-saving mode
    http://www.sqlservercentral.com/scripts/CPU/72156/

    ·         From PowerShell: gwmi -class win32_Processor | SELECT CurrentClockSpeed, MaxClockSpeed

  • My database is X GB, how big should my log be?

    My database is 10GB, how big should my log be? …. I don’t know. 🙂

     

    That’s not enough information to decide. You might find the best answer for your particular system is 10MB or 10GB or 100GB. It really depends on the amount of activity in the database, whether you tend to have transactions that prevent truncation of the log, and how frequently you do backups. The actual data size is pretty much irrelevant.

     

    Say you have a database with a single table with 1000 rows of data. Each row is updated several times per second and you take transaction log backups once a day. In that case the log will be much larger than the data. On the other hand, picture a 100GB database that is mostly read only and you take transaction log backups once every 15 minutes. The log for that system might be quite small, even only a few MB.

     

    Expecting to be able to determine the log size from the data size is unrealistic unless your databases all follow the same basic pattern. Instead use a test system to look at how much log space is used at average and peak times for various transaction log backup frequencies and then pre-size your log file with some room to spare. Do NOT rely on autogrow outside of a testing period where you are trying to size the log as you can end up with “too many” VLFs, short and long term performance hits, and disk level fragmentation.
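    On the test system, the measuring can be done with a couple of DBCC commands that exist in all the versions discussed here; this is a sketch of the approach, run periodically across your typical and peak windows:

    ```sql
    -- Log size and percent used for every database on the instance:
    DBCC SQLPERF(LOGSPACE);

    -- One row per VLF in the current database; a very high row count
    -- usually means the log grew in many small autogrow increments:
    DBCC LOGINFO;
    ```

    Track the peak percent used against the log size to pick a pre-sized value with room to spare.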

     

    Factors That Can Delay Log Truncation

    http://msdn.microsoft.com/en-us/library/ms345414.aspx

     

    Transaction Log VLFs – too many or too few?
    http://www.sqlskills.com/BLOGS/KIMBERLY/post/Transaction-Log-VLFs-too-many-or-too-few.aspx

     

    Note: Truncating/clearing the log is NOT the same as shrinking the log. Truncating/clearing is normal activity that marks existing space as available for reuse, shrinking changes the size of the actual file on disk and is something you only do in very rare, specific, planned, controlled, non-automated circumstances.
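    The two operations look like this side by side. The database name, logical log file name, and backup path are all hypothetical placeholders:

    ```sql
    -- Truncation (marking space reusable) happens as a side effect of a
    -- routine log backup in the full recovery model:
    BACKUP LOG MyDatabase TO DISK = N'X:\Backups\MyDatabase_log.trn';

    -- Shrinking physically resizes the file on disk -- rare, planned,
    -- manual. 1024 is a target size in MB, not a recommendation:
    USE MyDatabase;
    DBCC SHRINKFILE (MyDatabase_log, 1024);
    ```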

     

  • Can I Have a Single Node SQL Server Cluster

    I occasionally hear the question of whether you can create a one node SQL Server cluster. That basically means you have a virtualized instance of SQL Server with a virtual IP and virtual name that are different from the host box, but that there is only one “possible owner” of that SQL Server. Yes, this is allowed and supported. Often it is used for testing or when you have to get the system up and running now but only have one physical server available at the moment. You can add additional nodes to the Windows cluster later, then optionally add those nodes as potential owners of the SQL Server instance. See the “add node” topics in BOL, the exact steps will vary by version. Basically “add node” means that setup adds the SQL binaries and some registry keys to the new node and updates a few cluster related registry keys so that the cluster service becomes aware of the additional node as a “possible owner” of that instance of SQL Server. You still only have one copy of the SQL Server databases no matter how many nodes are possible owners.

    Choose “Other Versions” if you want to view the BOL topic for something other than the most recent version of SQL Server:
    How to: Add or Remove Nodes in a SQL Server Failover Cluster (Setup)
    http://msdn.microsoft.com/en-us/library/ms191545.aspx

  • The ins and outs of MAXDOP

    Configuring MAXDOP

     

    One of the few sp_configure options that good DBAs regularly change is “max degree of parallelism” or MAXDOP. So of course everyone is looking for the one best value to use for that setting. As with most things in the database world, the answer is “it depends.” I’m going to start with some background to help you understand some of the variables that affect how you should set that value on any given instance of SQL Server.

    What is parallelism?

    For the purposes of this discussion we can think of SQL Server schedulers as 1 visible, non-DAC scheduler per logical processor as seen by the operating system and made available to SQL Server. The terms processor, core, CPU, thread, and scheduler are often interchanged (sometimes correctly, sometimes not) throughout various documents, and not all have just one meaning. I’ll try to stick to the SQL scheduler terminology to avoid confusion with logical/physical processors. For this discussion I will assume you have not turned on CPU affinity options, you are not using WSRM, and your login is not limited by a MAX_DOP setting in a Resource Governor workload group.

     

    As an example, say you have 4 quad-core processors with hyper-threading enabled. That comes to 4 sockets x 4 cores x 2 = 32 visible schedulers available for non-DAC SQL Server queries.

     

    You can see the number of visible schedulers with this query:

    SELECT COUNT(*) AS visible_schedulers FROM sys.dm_os_schedulers WHERE [status] = 'VISIBLE ONLINE'

    How does a query use parallelism?

    When a query is compiled it is broken down into steps/operations internally. When the query runs, each step of the query plan that is eligible to go parallel can be assigned 1 or more schedulers. So if the plan has 10 steps, maybe step 1 gets 1 scheduler, step 2 gets 12 schedulers, step 3 gets 4 schedulers, step 4 gets 20 schedulers, etc. The total for the query can add up to more than the MAXDOP because the MAXDOP applies per step. The number of schedulers used per step is not saved in the query plan; it is chosen at execution time based on availability. For any step that uses more than one scheduler, SQL Server accumulates CXPACKET waits as it synchronizes the activity across schedulers (exchanges data across threads) for that step. As with other waits in SQL Server, a CXPACKET wait in and of itself is not a problem but rather a normal and expected part of how the system does its work. If the waits become excessive, then you can start looking at parallelism as a potential problem.
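    If you want to see how much of your wait time is parallelism-related, you can check the wait stats DMV (figures are cumulative since the last restart or stats clear):

```sql
SELECT wait_type, waiting_tasks_count, wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_type = 'CXPACKET';
```

    Compare the CXPACKET numbers against your total wait time to judge whether parallelism deserves a closer look.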

     

    You can set the “max degree of parallelism” for the entire instance. This means that any given step in any given query will not be assigned more schedulers than the value set in that configuration option. The setting can be overridden (higher or lower) for individual T-SQL queries with the OPTION (MAXDOP n) query hint. Some DDL operations, such as ALTER INDEX, have MAXDOP hints available as well.
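    For example (the table and index names below are hypothetical):

```sql
-- Override the instance-level setting for a single query.
SELECT CustomerID, SUM(Amount) AS Total
FROM dbo.Orders
GROUP BY CustomerID
OPTION (MAXDOP 4);

-- Some DDL operations support a MAXDOP hint as well.
ALTER INDEX IX_Orders_CustomerID ON dbo.Orders
REBUILD WITH (MAXDOP = 2);
```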

     

    So say a query executes and one or more of its steps “go parallel”. This means the query optimizer determined the query will finish faster overall when it uses parallel threads for one or more steps, even though it uses more CPU to do so. Partly it’s that the CPU time is more concentrated as more threads are active at once, and partly that there is extra overhead in managing the data exchanges between threads (CXPACKET). So when you see CXPACKET waits it can be normal and expected.

    Is parallelism good?

    Parallelism is very useful when you have a small number of very long running queries. Each query executes faster in parallel, and since there are few queries at any given point in time there is likely little blocking or competition for resources. However, if you have many small, fast queries, few will “go parallel” because the overhead of managing the data exchanges between threads is greater than the gain from executing in parallel. When you start mixing the types of queries it is harder for the optimizer to choose the right parallelism levels for the overall workload. One slow query may start at a time when things are relatively idle and use many threads. Then several small, fast queries come in while the first is still running, and the CXPACKET waits (number and duration) start to increase, making all of the queries run slower. SQL Server’s CPU usage may go up significantly.

     

    Another problem can arise when the statistics are out of date or very skewed. That means that the optimizer may choose a plan based on the expected number of rows (cardinality estimate) that doesn’t quite balance out as expected because the actual number of rows is so far off. In that case updating statistics (perhaps WITH FULLSCAN if the data is skewed) may help. In many cases adding or altering indexes can help as well since quite often fewer rows are touched when sufficient indexes exist and therefore the need for parallel data access decreases.
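    As a hypothetical example of refreshing skewed statistics (table name is illustrative):

```sql
-- FULLSCAN reads every row instead of sampling, which can help
-- when the data distribution is heavily skewed.
UPDATE STATISTICS dbo.Orders WITH FULLSCAN;
```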

     

    On a system where you expect all of the queries to be short/fast, setting the “max degree of parallelism” to 1 (never go parallel) basically gives priority to those short/fast queries and any big/slow queries just take longer to execute but don’t hold up other things as much. Leaving it at 0 (use any #) helps on systems where there are only a few queries running but all are slow/expensive. This is because SQL Server has the most flexibility with a setting of 0 and can choose to use as many schedulers as needed. Anything in between 1 and 0 is an attempt to manually balance those things if you feel SQL isn’t doing it the way you want it to. The only way to really know for sure what the best setting is on your system is to test average and peak workloads with different “max degree of parallelism” settings. You should keep in mind that as the data, number of users, other activity, etc. changes the “best” value may change as well.

     

    What value do I use for “max degree of parallelism”?

    Now we make it to the question of what value to use for “max degree of parallelism” on your particular instance. Keep in mind that these are general guidelines for OLTP systems. Warehouse/OLAP loads, edge cases, huge systems, many SQL instances, odd configurations, unusual workloads, and other factors can outweigh the reasoning behind some of these recommendations.

     

    ·         If hyper-threading is enabled, you effectively have two SQL schedulers mapped to one physical core, which is already doing context switches of a sort and can throw off the balancing equation. In that case make sure “max degree of parallelism” is not 0 and is not greater than half the number of visible schedulers.

    ·         If you use CPU affinity mask or WSRM to limit the number of cores available to SQL Server, set “max degree of parallelism” no higher than the number of cores available to that instance of SQL Server (halved for hyper-threading if applicable).

    ·         If you are using hard NUMA, keep the value no larger than the number of cores in a NUMA node, as this helps reduce costly foreign memory access.

    ·         As a starting point for an average workload, you probably don’t want a value greater than 8. This is not a hard limit, just a starting point/guideline.

     

    Start with the smallest value produced by the rules above. You may need to adjust the value based on your testing and/or as the workload and other factors change over time.
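    Whatever value your testing settles on is applied with sp_configure; the value 8 below is purely an illustration, not a recommendation:

```sql
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'max degree of parallelism', 8;
RECONFIGURE;
```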

    Do I have a parallelism problem?

    Next there is the question of how to know whether the current “max degree of parallelism” is contributing substantially to a performance or resource constraint problem. If your Process: % Processor Time (sqlservr) counter divided by the number of logical processors is much higher than normal/expected AND wait stats show more than 5% of your waits are on CXPACKET, you may want to test lower (or at least non-zero) values of “max degree of parallelism”. You can find connections that currently have parallel queries running by looking for SPIDs with multiple ECIDs in sys.sysprocesses. Query plans that are eligible for parallelism will have the Parallelism operator. If you decide your current degree of parallelism might be negatively impacting performance, use the information above to estimate a good starting point for your particular instance and test/baseline/tune from there. No restart is required after you change the value; new queries automatically use the new value (unless overridden by a hint in the query).
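    One way to spot currently running parallel queries is to look for SPIDs with more than one ECID. A quick sketch (sys.sysprocesses is a legacy view, but it is handy for this):

```sql
SELECT spid, COUNT(DISTINCT ecid) AS parallel_threads
FROM sys.sysprocesses
GROUP BY spid
HAVING COUNT(DISTINCT ecid) > 1;
```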

     

    References

    ·         Case Study: Part 1: CXPACKET Wait Stats & ‘max degree of parallelism’ Option: Introduction to Using Wait Stats to Identify & Remediate Query Parallelism Bottlenecks http://blogs.msdn.com/b/jimmymay/archive/2008/11/28/case-study-part-1-cxpacket-wait-stats-max-degree-of-parallelism-option-introduction-to-using-wait-stats-to-identify-remediate-query-parallelism-bottlenecks.aspx

    ·         Case Study: Part 2: CXPACKET Wait Stats & ‘max degree of parallelism’ Option: Suppressing Query Parallelism Eliminated CXPACKET Waits & Liberated 30% of CPU http://blogs.msdn.com/b/jimmymay/archive/2008/12/02/case-study-part-2-cxpacket-wait-stats-max-degree-of-parallelism-option-suppressing-query-parallelism-eliminated-cxpacket-waits-liberated-30-of-cpu.aspx

    ·         SQL Server 2005 Waits and Queues http://technet.microsoft.com/en-us/library/cc966413.aspx

    ·         Recommendations and Guidelines for ‘max degree of parallelism’ configuration option http://support.microsoft.com/kb/2023536

    ·         Problem fixed in SQL Server 2008 SP1 CU3 and higher: 970399 FIX: The MAXDOP option for a running query or the max degree of parallelism option for the sp_configure stored procedure does not work in SQL Server 2008 http://support.microsoft.com/default.aspx?scid=kb;EN-US;970399

    ·         SQL University: Parallelism Week – Introduction  http://sqlblog.com/blogs/adam_machanic/archive/2010/05/24/sql-university-parallelism-week-introduction.aspx

    ·         Maximum Number of Processors Supported by the Editions of SQL Server http://msdn.microsoft.com/en-us/library/ms143760(v=SQL.105).aspx

    ·         Windows 2008 R2 – Groups, Processors, Sockets, Cores Threads, NUMA nodes what is all this? http://blogs.msdn.com/b/saponsqlserver/archive/2010/09/28/windows-2008-r2-groups-processors-sockets-cores-threads-numa-nodes-what-is-all-this.aspx

    ·         Why Do I see more threads per SPID in sysprocesses than MAXDOP? http://blogs.msdn.com/b/sqlserverfaq/archive/2009/01/05/why-do-i-see-more-threads-per-spid-in-sysprocesses-than-maxdop.aspx

     

    Note: This information was verified for SQL Server 2008 R2 and with very minor variations will apply also to SQL Server 2008 and SQL Server 2005.

    MAXDOP.pdf

  • Witty WIT Speakers needed for 24 Hours of PASS – March 2011

    SQL PASS is planning another 24 Hours of PASS (24HOP). It will be March 15-16, 2011. Since that falls during Women’s History Month, all the speakers will be women. So if you’re a Woman in Tech (WIT) and know something about SQL Server, or about how Windows or storage affects SQL Server, this is a great opportunity!

     

    Email your abstract to 24hours@sqlpass.org by Jan 14, 2011. No more than 250 words for the abstract itself plus 125 words for your bio.

     

    So what is 24HOP? The format has changed a bit over time, but currently it consists of 12 hours of back-to-back one-hour sessions about SQL Server, held two days in a row. Each session is delivered by a different speaker and the topics vary quite a bit, but all relate to understanding SQL Server. This is all done over Live Meeting so you can attend from your desk, and the sessions are made available for viewing later if you can’t attend them all live. Ideally the sessions will have time at the end for questions, so you don’t need a full hour of slides and/or demos.

     

    So why should you submit an abstract? Here are a few good reasons: Community involvement, including speaking, is one of the things that can help qualify you for nomination for SQL MVP. A speaking engagement is a great item to put on your review or resume. And by speaking at smaller events like 24HOP it becomes easier to get your abstracts selected at bigger, more competitive events such as SQL PASS.

     

    I look forward to seeing all of you speaking at 24HOP, your local user groups, and SQL PASS!

     

    More SQL Women and WIT discussions:

    ·         Why So Few? http://www.aauw.org/learn/research/whysofew.cfm

    ·         Data Chicks, We Need You! Call for Speakers http://blog.infoadvisors.com/index.php/2010/12/09/data-chicks-we-need-you-call-for-speakers-24hop/

    ·         Trolling the #24HoP http://blog.infoadvisors.com/index.php/2010/12/10/trolling-the-24hop

    ·         24HOP Needs Women http://sqlserverpedia.com/blog/sql-server-bloggers/24hop-needs-women/

    ·         I Need More Women http://thomaslarock.com/2010/12/i-need-more-women

    ·         Women in Technology – Why Does it Matter? http://blogs.msdn.com/b/cindygross/archive/2010/12/09/women-in-technology-why-does-it-matter.aspx

    ·         The Next 24 Hours of PASS Event: Announcement and Call for Presenters  http://littlekendra.com/2010/12/02/the-next-24-hours-of-pass-event-announcement-and-call-for-presenters/

    ·         The Next 24 Hours of PASS http://www.jasonstrate.com/2010/12/the-next-24-hours-of-pass/

  • Women in Technology – Why Does it Matter?

     

    Women in Technology – Why Does it Matter?

     

    The yearly SQL PASS summit is always one of my favorite times of year. It’s a week packed full of great technical content and many networking opportunities. Not to mention the parties and fun. 🙂 And of course the #sqlkilts. One of the highlights of the #sqlpass conference is the Women in Technology (WIT) Luncheon. This year quite a few men (and a couple of women) wore kilts that day and several of those wore shirts designed by @MidnightDBA saying “I’m supporting Women in Tech. What are YOU doing? (Plus, I look hot in a skirt)”. They came with the rest of us to hear a panel that discussed topics such as how women influence innovation, how women affect the bottom line, and why high tech jobs benefit women. Many audience members participated with comments and questions. I was so inspired by the day that I am taking some of the key points and summarizing them.

     

    This year we had a great panel of speakers:

    Billie Jo Murray, General Manager, SQL Central Services, Microsoft

    Nora Denzel, Senior Vice President and General Manager – Employee Management Solutions, Intuit

    Michelle Ufford, Senior SQL Server DBA, GoDaddy.com

    Denise McInerney, Staff Database Administrator, Intuit

    Stacia Misner, Principal, Data Inspirations

     

    The number of men in the audience at the WIT luncheon continues to grow each year, as more people realize that supporting women in tech takes support from both men and women and that everyone benefits from the diversity. As Nora said: “Welcome to the women, and also welcome to the men, and also welcome to the men in skirts”. The number of women in technical jobs is decreasing at a faster rate than in other occupations, and the percentage of computer science graduates who are women is plummeting. Going forward it will be harder and harder for companies to recruit women into technical jobs. There doesn’t seem to be a good explanation for why this is happening. There are plenty of hypotheses, but so far there doesn’t seem to be consensus on the causes and, more importantly, on what to do about it.

     

    So why do we care? Why does it matter how many women there are in tech? Why do we need a special group, time, or event just for WIT? There were many good points given about this during the lunch, both from the panel and from the audience. As Nora and Michelle both pointed out, at a high level diversity helps teams deliver a better product and fosters innovation. When you have people from different backgrounds, they approach the problem/product/issue in varying ways. The more approaches you have during the development phase and the broader the base for feedback, the more innovative and useful the end product is. This isn’t just some people sitting around a room and complaining; studies have borne this out. Gender is only one aspect of diversity, but it’s an important one. As the panel said, diversity is a means to an end. Diversity done right attracts great talent, leads to higher ROI, and makes the workplace healthier. Diverse companies are more likely to be voted a great place to work and that higher morale can translate to a better bottom line. Denise shared a great quote from Bill Gates. BillG was giving a speech in Saudi Arabia to an audience segregated by gender. There was a question to BillG about whether Saudi Arabia could become a top competitive economy by 2010. His answer: “…if you’re not fully utilizing half the talent in the country, you’re not going to get too close to the top.”

     

    Women (yes, stereotypically and not across the board) tend to approach tech projects differently than men. Often they take the perspective that technology’s purpose is to help others and therefore they think of projects in those terms. Teams shaped by stereotypically male dominated thinking often take an approach of fixing a problem or using something just because it’s “cool”. Both approaches have their place, and when they’re combined the innovation can explode (hopefully in a good way).

     

    So what incentives are there for women to take a tech job and stay in the tech industry? I would argue that it’s a fun career, but what else is enticing? How about money and financial security? Tech jobs tend to pay well in general. The gender wage gap tends to be lower in the tech industry, so for the same job there’s a higher chance a woman will be paid as much as a man with the same skills. As Denise said, with the increased financial security from a tech job, a woman has greater control over all aspects of her life. From another perspective, as women in tech we are “thought workers”. That means we are valued for how smart we are, for our brains. In the tech world we can compete on the basis of our ideas. Despite the constant media message about women having to always compete for who looks the best, see who can dumb themselves down the most, and avoid math and science, in the tech world we can shine based on our merits.

     

    If it’s such a great career path for women, why aren’t there more women here? I already mentioned the rapidly decreasing percentage of women with computer science degrees, though I have heard that in other countries that may not be the case. But why don’t women apply for tech jobs? Is it the geek image? A lack of desire to work around all that testosterone all the time? The media-fed feeling that women just aren’t good at math, science, and the “hard stuff”? The lack of glamour or perception of long hours? A feeling that we won’t fit in? Again, there is no consensus on why. If we could figure that out, maybe it would be easier to solve the problem. As Jimmy May said, “she-geeks are cool”, and we need to communicate that to women.

     

    For women already in the tech arena, how do we approach our jobs? There seemed to be wide agreement that as women we are much more likely to seek perfection of an idea before we present it. We aren’t as likely to speak up at meetings if we don’t feel we are 100% prepared. We need to be more comfortable with being uncomfortable. It’s ok to make mistakes or to say “I’ll have to get back to you”. As Billie Jo said, women need to “sit at the table”. This applies both literally and figuratively. She talked about attending large meetings with so many attendees that there were chairs lining the walls as well as around the table. And who does she see sitting in the chairs “out of the way” around the edges instead of at the table? Women. Take a seat at the table, say something even if it’s not brilliant, and be an active participant. Despite the common perception of many women, we don’t have to know everything and do everything to be good at our jobs. Don’t wait for perfection before acting; step in and become a participant. The example was given that if a job application has ten requirements, a woman with “only” nine qualifications will not apply, but a man with only one qualification is likely to submit an application. That’s a bit exaggerated but does show a gender gap that many of us can identify with. Get over it! Move on and become a player in the tech game!

     

    Now we’re all convinced that we need to have WIT. And we also know women aren’t getting many computer science degrees. So who do we recruit? How about math or physics majors? Billie Jo expressed an appreciation for music majors because the way their minds work is similar to what is needed in a tech environment. I know very successful techies with degrees in such seemingly unrelated fields as English. So don’t limit your search to the typical candidates. If something isn’t a true, absolute requirement, word the job description so it’s clearly a “nice to have” and you may see more women apply. Try to find the essence of what you’re looking for and include that description instead of some example of how someone else typified that essence. And remember that a broad background in college and life is very helpful. Billie Jo pointed out that her experience is that women tend to have a broader background in their coursework. This makes them more flexible and often makes it easier for them to fit in and advance at work. Stacia made the point that we need to look at people in the business world, especially for business intelligence type work. Don’t think of tech vs. non-tech. If IT people rely on a business person who consistently takes the real world requirements and makes them understandable to a techie, maybe they are a candidate for an IT job. Look beyond the normal and expected and you may be surprised who you find.

     

    Life/Work balance always seems to come up when we talk about WIT. It’s not only women who need this balance, but for some reason we seem to be the ones who visibly seek it. This is a discussion for another time, but one important point an audience member made is to ask your family for help. It’s ok to tell your spouse that you’re going to SQL PASS next year and it’s the partner/spouse’s responsibility to look after the kids that week. It’s ok to ask them to do some extra housework while you prepare the presentation you’re going to give at your local user group (or at SQL PASS!). You support them and they feel good when they get to return that support. So ask for support from your friends and family and don’t feel bad about it!

     

    So what can each of us do to support and encourage WIT? Some of the ideas suggested include:

    ·         Watch the lunch panel here: PASS Summit 2010 Women in Technology Live Streaming Panel Discussion http://www.sqlpass.org/summit/na2010/LiveKeynotes/WITLuncheon.aspx

    ·         Don’t wait to be perfect or have perfect knowledge before you act.

    ·         Mentor women who would make good SQL MVPs or SQL MCMs (BJ offered to help!).

    ·         Don’t put up with a lack of WIT support at your company.

    ·         Be a peer mentor (to a man or woman) and seek out a peer mentor (man or woman).

    ·         Nominate qualified WIT peers for the MVP program.

    ·         As a WIT: submit an abstract for a conference, offer to speak at a user group or code camp, get involved in public/visible ways.

    ·         Have 1:1 conversations about tech, WIT, and/or diversity with people you can influence.

    ·         Thank the WIT in your life, starting with the SQL PASS WIT planning team!

     

    References:

    #passwit search https://twitter.com/#!/search/%23passwit

    #passwit search http://archivist.visitmix.com/adc612a0/1

    Main WIT page at SQLPASS http://wit.sqlpass.org/

    https://twitter.com/#!/sqlpass

    https://twitter.com/#!/pass_wit

     

    Blogs

    http://www.midnightdba.com/Jen/2010/11/sqlpass-day-2-wit-luncheon-live-blog/

    http://www.sqlservercentral.com/blogs/kathi_kellenberger/archive/2010/11/17/pass-summit-report-3-women-in-technology-rock.aspx

    http://nullgarity.wordpress.com/2010/11/14/the-kinetic-serendipity-of-the-written-word/

    http://wit.sqlpass.org/WITBlogs/tabid/3018/Default.aspx

  • New Opportunities to be a SQL Master

    Have you ever wanted to prove to yourself and others just how much you really know about SQL Server? The MCITP exams are a great start; they show you have a baseline level of knowledge about the product. However, sometimes it is possible to pass the MCITP level exams with little real world experience. For those of us with many years of real world experience, the next level is the Microsoft Certified Master (MCM). Before last week, to get the MCM certification for SQL Server your only choice was to buy the training bundled with the exams. The training session was three weeks of high speed, in depth schooling across the engine, delivered by some of the top experts in the industry. It was a great networking and education experience: I learned a lot and made some great friends. However, requiring candidates to take three weeks off work and pay a lot of money for the privilege of taking a set of exams that many still didn’t pass unfairly excluded some very talented people. So as of last Tuesday some great changes were announced to the SQL MCM program. The training is no longer bundled with the exams: you can now take as much or as little training as you want. There are many hours of free training already available online, and you can still choose to pay for instructor led training from industry experts. Once you feel you are prepared, you can sign up for the written exam for $500. If you pass that filter, you can sign up for the 6 hour lab exam for $2000. If you fail either exam there is a waiting period before you can retake it. The exams are not available at all Prometric locations due to concerns about security and/or bandwidth in some places, but at least testing is no longer restricted to just Redmond.

    The tests are still extremely difficult and the precautions to protect the exams against cheating are extensive. But now the MCM testing is available to a wider range of experienced, knowledgeable SQL Server experts, and I hope that if you feel you go above and beyond the average senior DBA you will consider preparing for the exams. Good luck!

     

    ·         SQL Server Masters Certification Goes Global

    ·         New Path to Microsoft Certified Master: Microsoft SQL Server 2008

    ·         Data sheet (executive summary)

    ·         SQL Server 2008 Microsoft Certified Master (MCM) Readiness Videos

    ·         Current SQL Server MCMs