Befriending Dragons

Transform Tech with Anti-bullying Cultures


2 Comments

General Hardware/OS/Network Guidelines for a SQL Box

I have put together some聽general guidelines for how you want a server to be delivered to the DBA team for a new SQL Server install. You won’t necessarily use all of them, but consider it a starting point for your SQL Server install standards. Places where it may be common to change the statements are in [red]. Always run the SQL Server Best Practices Analyzer or an equivalent tool to check for the latest recommendations before releasing the system to production. I’m sure some of you will disagree with some of the points for various reasons, but I’ve found them to be a good baseline for discussion and a great starting point for standards documents. I’m ok with that because I am very fond of saying “it depends”. 馃檪

The below is specific to SQL Server 2008/200R2 on Windows 2008/2008R2.

OS Specifications (things often controlled by a Windows team)

聽聽聽聽聽聽聽聽 Power saving features: For a SQL Server box if you want consistent, predictable, high performance you either need to fine tune the power setting parameters for each individual workload and/or for different times of day or just set the power options to high performance. Databases are harder to fit into the normal power saving profile so they don鈥檛 fit as well into the default power saving settings. [If your environment requires that you favor power savings over performance change this statement and be aware of the impact.]

聽聽聽聽聽聽聽聽 You should double check that your virus scanner is certified for Windows 2008 R2. Older scanners use TDI and you need WFP models to work properly on the newer OSs. The older type of anti-virus scanners can cause serious IO problems.
981889 A Windows Filtering Platform (WFP) driver hotfix rollup package is available for Windows Vista, Windows Server 2008, Windows 7, and Windows Server 2008 R2
http://support.microsoft.com/default.aspx?scid=kb;en-US;981889
979278 Using two Windows Filtering Platform (WFP) drivers causes a computer to crash when the computer is running Windows Vista, Windows 7, or Windows Server 2008
http://support.microsoft.com/default.aspx?scid=kb;EN-US;979278
979223 A nonpaged pool memory leak occurs when you use a WFP callout driver in Windows Vista, Windows 7, Windows Server 2008, or in Windows Server 2008 R2
http://support.microsoft.com/default.aspx?scid=kb;EN-US;979223
976759 WFP drivers may cause a failure to disconnect the RDP connection to a multiprocessor computer that is running Windows Vista, Windows Server 2008, windows 7 or Windows Server 2008 R2
http://support.microsoft.com/default.aspx?scid=kb;EN-US;976759
Windows Filtering Platform
http://www.microsoft.com/whdc/device/network/WFP.mspx

聽聽聽聽聽聽聽聽 Virus scanners and spyware detection should not scan SQL Server data and log files (usually mdf/ldf/ndf extensions) and other SQL related files because the scanning significantly degrades performance. [Note that this is a tradeoff with security and you must decide on performance vs. security based on your own security guidelines.]
REASON: Performance, smoother setup. See 309422 Guidelines for choosing antivirus software to run on the computers that are running SQL Server
http://support.microsoft.com/default.aspx?scid=kb;EN-US;309422

聽聽聽聽聽聽聽聽 Firmware, BIOS, network adapter drivers, storport drivers, etc. will be at their most recent, stable versions before the server is released to the DBAs.
REASON: There are common SQL Server performance, usability, and supportability problems caused by older firmware, BIOS, network adapter drivers, etc.

聽聽聽聽聽聽聽聽 For Windows 2008 and Windows 2008 R2 you can download a Windows storport enhancement (packaged as a hotfix). This enhancement can lead to faster root cause analysis for slow IO issues. Once you apply this Windows hotfix you can use Event Tracing for Windows (ETW) via perfmon or xperf to capture more detailed IO information that you can share with your storage team. 聽http://blogs.msdn.com/b/cindygross/archive/2010/07/13/windows-storport-enhancement-to-help-troubleshoot-io-issues.aspx

聽聽聽聽聽聽聽聽 Do not install SQL Server on a domain controller (DC).
REASON:聽 A busy DC can take resources away from SQL Server. There are also negative security implications from installing SQL Server on a DC.

聽聽聽聽聽聽聽聽 Grant SE_MANAGE_VOLUME_NAME to the SQL Server group to allow instant file initialization of data (but not log) files. There is a small security risk associated with this but it can greatly improve the performance of CREATE/ALTER data (but not log) files. [Decide as a company whether this performance enhancement is worth the small risk] http://blogs.msdn.com/b/sql_pfe_blog/archive/2009/12/23/how-and-why-to-enable-instant-file-initialization.aspx

聽聽聽聽聽聽聽聽 Critical updates for Windows will be tested and applied ASAP after their release.
REASON: Security that affects Windows often affects SQL Server as well.

聽聽聽聽聽聽聽聽 Resource intensive screensavers will be disabled and replaced with low resource consumption security to lock the consoles.
REASON: Performance 鈥 Resource intensive screen savers can steal resources from SQL Server.

聽聽聽聽聽聽聽聽 Files will be secured: All copies of the data and log files as well as all copies of the backup files will be secured with access given only to those documented in the SQL Server Disaster Recovery plan.
REASON: Data and log files can be copied and attached to another instance of SQL Server, thereby exposing the information to the sysadmins of the new instance. Therefore access to these files must be very limited. However enough access must be granted to allow for recovery.

聽聽聽聽聽聽聽聽 EFS: SQL Server will not be installed on disk that is encrypted with EFS.
REASON: 922121 You may experience decreased performance in some features of SQL Server 2005 when you use EFS to encrypt database files
http://support.microsoft.com/default.aspx?scid=kb;EN-US;922121

Storage Specifications (often configured by a Windows and/or storage team)

聽聽聽聽聽聽聽聽 Battery backup must be enabled for all controllers or storage media which do write caching.
REASON: This is required by the WAL protocol to ensure stable media for SQL Server. See
http://www.microsoft.com/technet/prodtechnol/sql/2000/maintain/sqliobasics.mspx#EYCAE

聽聽聽聽聽聽聽聽 For SQL Server disks, performance is more important than conserving space. This means there may be what would be considered 鈥渨asted space鈥 on a file server and that the overall cost per stored MB will be higher for a database system than for a file server. [This is a general guideline, if your environment prefers costs savings and space usage maximization over performance change this statement.]
REASON: High performance is generally a major requirement of a database system, and is much more important than on most file systems. Higher performance requires that disk be laid out, configured, and managed in particular ways.

  • Disk alignment must be done to a multiple of 64KB. Some vendors may express a preference for a particular value, but most mainstream hardware vendors have agreed that 1024KB is acceptable. That is the default for Windows 2008+. If you use dynamic disks it is difficult to see the alignment from Windows.
    REASON: If the disk is not aligned, performance can suffer as much as 30-40% because some read/write activity may be to/from two blocks instead of one. See 929491 Disk performance may be slower than expected when you use multiple disks in Windows Server 2003, in Windows XP, and in Windows 2000 http://support.microsoft.com/default.aspx?scid=kb;EN-US;929491 and Disk Partition Alignment Best Practices for SQL Server http://msdn.microsoft.com/en-us/library/dd758814.aspx
  • Disk allocation unit should be 64KB for SQL Server boxes.
    REASON: See
    Predeployment I/O Best Practices http://www.microsoft.com/technet/prodtechnol/sql/bestpractice/pdpliobp.mspx

NTFS Allocation Unit Size

When formatting the partition that will be used for SQL Server data files, it is recommended that you use a 64-KB allocation unit size for data, logs, and tempdb. Be aware however, that using allocation unit sizes greater than 4聽KB results in the inability to use NTFS compression on the volume. SQL Server, although it is not recommended that you use this, does support read-only data on compressed volumes.

聽聽聽聽聽聽聽聽 Drive Compression: Drives will not be compressed.
REASON: Compression has a big negative performance impact on SQL Server.

聽聽聽聽聽聽聽聽 NTFS file system will be used instead of FAT or Raw partitions.
REASON: NTFS allows features such as database snapshots, online DBCC checks, instant file initialization, mount points, and additional security. It has larger file size limits (16 exabytes) than FAT (4 GBs). Raw partitions limit your recoverability options.

聽聽聽聽聽聽聽聽 Often you will need one or more of these to achieve optimal performance for a database [Decide which of these you will deploy for each tier of storage and whether each can be requested by a DBA at server configuration time.]

1.聽聽聽聽聽聽 HBA queue depth for SQL Server is often best at 64 or 128; testing will determine the optimal value.
REASON: See Predeployment I/O Best Practices http://www.microsoft.com/technet/prodtechnol/sql/bestpractice/pdpliobp.mspx
HBA Queue Depth Settings

When configuring HBAs on the host, ensure that the Queue Depth is set to an optimal value. SQL Server applications are generally I/O-intensive, with many concurrent outstanding I/O requests. As a result, the default values for Queue Depth on HBAs are usually not high enough to support optimal performance. Currently the default value for the major HBA vendors is in the range of 8 to 32.

In our SQL Server testing, we have seen substantial gains in I/O performance when increasing this to 64 or even higher. It is worth noting that in these tests SQL Server was usually the only application using the storage array. It is important to discuss with your storage administrator the appropriate values for this setting, as the setting may affect other applications in sharing the same storage environment. When Queue Depth is set too low, a common symptom is increasing latency and less-than-expected throughput given the bandwidth between host/storage and the number of spindles in a particular configuration.

2.聽聽聽聽聽聽 RAID 10 or its equivalent will be used for the highest performance and best recoverability. Read-only data (no updates from users, replication, batch jobs, or anything else) can see acceptable performance on RAID 5. RAID 5 systems will have slower write performance and less recoverability but might be allowed for lower tiered systems with a signoff that high performance is not guaranteed.
REASON: RAID 10 is the fastest disk for SQL Server data and logs. It also provides the best recoverability options.

o聽聽 See Physical Database Storage Design http://www.microsoft.com/technet/prodtechnol/sql/2005/physdbstor.mspx
鈥淔or excellent performance and high reliability of both read and write data patterns, use RAID10.鈥

o聽聽 鈥淩AID10 (stripe of mirrors): RAID10 is essentially many sets of RAID1 or mirrored drives in a RAID0 configuration. This configuration combines the best attributes of striping and mirroring: high performance and good fault tolerance. For these reasons, we recommend using this RAID level. However, the high performance and reliability level is the trade-off for storage capacity.鈥

o聽聽 RAID 10 is recommended for 鈥淒ata requiring high performance for both read and write and excellent reliability while trading off storage efficiency and cost.鈥

3.聽聽聽聽聽聽 Follow hardware vendor recommendations for configuring the storage for a database, often this is very different than configuring for other non-database systems.

4.聽聽聽聽聽聽 Keep the physical disks no more than about 80% full (avoid full stroking, get closer to short stroking). Some SAN configurations may make this difficult to determine from Windows with concepts such as thin provisioning.

5.聽聽聽聽聽聽 Use multiple HBA controllers with a high throughput capacity. The same applies for other components such as switch ports, NICs, Fibre Channel array ports, storage array service processors, etc.

6.聽聽聽聽聽聽 Favor (battery backed) write cache over read cache for an OLTP system. Often 80/20 or 90/10 in favor of writes is beneficial. It is relatively easy for a busy SQL Server to flood the cache.

1.聽聽聽聽聽聽 Log writes have the lowest allowable latency of any SQL activity on an OLTP system.

2.聽聽聽聽聽聽 Write cache can help absorb checkpoint bursts as they write data to the disks.

3.聽聽聽聽聽聽 Maintenance operations can be write intensive and long running.

4.聽聽聽聽聽聽 SQL Server鈥檚 internal data organization rarely matches the physical layout of the data on disk so IO subsystem level read ahead through the read cache is rarely effective for a database.

7.聽聽聽聽聽聽 Performance will be more predictable, IO troubleshooting will be easier, and in many cases overall performance can be higher if SQL Server is isolated to an IO path not shared with other systems. If it does share with others (which is very common), it is better to share with other databases than with file servers or other systems that have different needs.

聽聽聽聽聽聽聽聽 Data and log files will not go on the same drive (they can have the same drive letter if on different mount points). SANs often hide the physical layer behind the drive letter/mount point by mixing data on the back end but it is still important to keep them separate in case you later move to totally separate IO paths for each.

o聽聽 Where cost/benefit analysis allows, each database鈥檚 transaction log file(s) will get a separate drive. 聽For maximum performance this would be a truly separate IO path.
REASON: Since writes to the transaction log are sequential (even if there are multiple log files for a given database, only one is written to at a time), if there are multiple database鈥檚 log files, or a mix of data and log files, on one drive the read head is moving between them and decreasing performance.

o聽聽 Where cost/benefit analysis allows it, multiple, separate IO paths may be requested for each data file.
REASON: The more drives you can use for data files, the more the IO can be spread out to increase performance. For certain types of activities guaranteeing that the IO does not overlap for two different data files can be advantageous.

聽聽聽聽聽聽聽聽 Before putting each SQL box into production, run SQLIO under various loads, gather data to see if the IO subsystem is likely to be capable of the expected load. If you have numbers for a typical and current peak load, we can use those numbers and then add some to it (perhaps test things like 150% of expected peak batch cycle).

聽聽聽聽聽聽聽聽 Before putting each SQL box into production, run SQLIOSim to test for data correctness.

聽聽聽聽聽聽聽聽 Disk IO Subsystem Performance Requirements:

o聽聽 Log writes will not exceed [put your values here] ms per write under the expected peak load (typical is 1-5ms for log writes)

o聽聽 Non-log writes will not exceed [put your values here] ms per read under the expected peak load (typical is 1-20ms for OLTP and 25-30ms for warehouses)

o聽聽 Reads will not exceed [put your values here] ms per read under the expected peak load (typical is 5-20ms for OLTP and 25-30ms for warehouses)

o聽聽 These are general standards; some systems may require faster IO subsystems and some may be ok with slower IO subsystems.

o聽聽 Measures of read and write speed are from the Performance Monitor counters reads/sec and writes/sec and/or from SQLIO as these are the speeds seen by SQL Server.
REASON: Databases require fast disk access in order to deliver high performance. Many DBA and application developer hours can be spent troubleshooting performance problems that are later tracked to slow IO.

Network Specifications

聽聽聽聽聽聽聽聽 SNP/TCP Chimney settings will depend on whether your NIC vendor supports it. If you have NICs/drivers that support it, turn it on. Otherwise disable it to avoid known problems with SQL Server and other products.
REASON: Performance and usability. When TCP Chimney is enabled it on a NIC that doesn鈥檛 support it, you will often see failed connectivity to SQL Server and/or
dropped packets and connections that affect SQL server. See Information about the TCP Chimney Offload, Receive Side Scaling, and Network Direct Memory Access features in Windows Server 2008 http://support.microsoft.com/kb/951037 聽and 942861 Error message when an application connects to SQL Server on a server that is running Windows Server 2003: “General Network error,” “Communication link failure,” or “A transport-level error” http://support.microsoft.com/default.aspx?scid=kb;EN-US;942861

聽聽聽聽聽聽聽聽 Put a firewall between your server and the internet. Block TCP port 1433, UDP port 1434, and static ports used by named instances on the perimeter firewalls but not necessarily on individual internal servers. Be careful which ports you block in the other firewalls, SQL Server will use various ports to communicate on.
REASON: Security 鈥 Hackers may try to use well known ports to attack SQL Server.

聽聽聽聽聽聽聽聽 Open ports used by SQL Server in the individual server firewalls. [If you define a specific port range for all instances include it here.]
REASON: The DBAs have defined this range of ports as what each SQL Server uses for incoming connections.

Windows Cluster

聽聽聽聽聽聽聽聽 Identical hardware: While Windows 2008 clusters are not required to be identical, to have a greater chance of predictability no matter which node owns each SQL Server group it is recommended that they be configured as close to the same as possible.

聽聽聽聽聽聽聽聽 Windows policies and rights: Windows policies and rights should be the same on all nodes.
REASON: The behavior of SQL Server must be the same on all nodes. Policies can change SQL Server behavior.

聽聽聽聽聽聽聽聽 Mount points: Do not install SQL Server 2000 on any Windows cluster with mount points.聽 The mount points must have an associated drive letter and must be cluster resources in the group where SQL Server will reside. SQL Server must 鈥渄epend on鈥 all mount points that it uses.
REASON: SQL Server 2005+ supports mount points but SQL Server 2000 setup, including service packs and hotfixes, will fail when it tries to enumerate the mount points, even if they are not in the SQL Server 2000 group. For 2005+ instances, the mount points must be in the SQL Server group in order for SQL Server to access them.

聽聽聽聽聽聽聽聽 The Cluster service account must be a login in the SQL Server instance and a simple user in the master database, but should NOT be a sysadmin role member.
REASON: Avoid elevated privileges.

聽聽聽聽聽聽聽聽 MS DTC Choose a consistent standard for how you configure DTC. On Windows 2008+ clusters you can have more than one DTC per cluster and there are pros/cons to various configuraitons on how SQL Server uses one or more of those DTCs. http://blogs.msdn.com/cindygross/archive/2009/02/22/how-to-configure-dtc-for-sql-server-in-a-windows-2008-cluster.aspx DTC must be enabled for network access. If you choose to put DTC in a group with SQL Server and choose to have a DTC failure cause the group to fail, you should be aware that DTC can cause a SQL Server failover. This may be appropriate in some environments.

聽聽聽聽聽聽聽聽 Cluster Group Names will not contain any special characters such as <, >, 鈥, 鈥, &
REASON: Special characters in any group name may cause SQL Server setup to fail.

聽聽聽聽聽聽聽聽 NIC names will not have any special characters or trailing spaces.
REASON: Special characters in any network name may cause SQL Server setup to fail.

聽聽聽聽聽聽聽聽 Auto start must NOT be on for any clustered resource in a Windows cluster.
REASON: The cluster administrator needs to bring the resources online (start the services). If Windows startup has already started the service the cluster service cannot bring it online which results in errors.

聽聽聽聽聽聽聽聽 Use SIDs in a Windows cluster. If you choose not to use Service SIDs, create unique domain groups: Each individual service installed with SQL Server needs a unique domain group created for it unless you choose the default of SIDs. The following naming standard will be used: [put your company standard here:聽 Examples: myserver1_instance1_SQLServer, myserver1_instance1_SQLAgent聽 ] 聽聽聽聽

http://technet.microsoft.com/en-us/library/ms143504.aspx

Best practices that you can use to set up domain groups and solutions to problems that may occur when you set up a domain group when you install a SQL Server 2005 failover cluster http://support.microsoft.com/default.aspx?scid=kb;EN-US;915846
REASON: On a cluster domain groups are used to manage permissions for the SQL Server components. Each service needs a unique group to reduce the attack surface.

Remote Control

聽聽聽聽聽聽聽聽 Remote control to the server will only be done when absolutely necessary. No SQL Server tools will be opened on the production server itself unless there is no other way to access the server.
REASON: It adds overhead to the server and can cause performance problems. Most access will be done from client tools installed on desktops. The overhead of the GUI interfaces is not acceptable on a production server. Some SQL Server client tools are not available on 64-bit systems and all servers will be x64 going forward.

Hardware Specifications

聽聽聽聽聽聽聽聽 Expected Life: Servers are spec鈥檇 with an expectation of being in service for up to [put your policy here, 3-4 is common] years unless otherwise stated.
REASON: We need to know in advance how long the hardware is expected to stay in service so we can predict the resources needed for that time period. An uncertainty factor will be considered as well, so systems with high uncertainty in the predictions may need more expansion opportunity (have the ability to add more hardware resources as needed).

聽聽聽聽聽聽聽聽 Architecture: All new servers will be based on x64 hardware with an x64 version/edition of Windows.
REASON: 64-bit systems allow much more memory to be used than 32-bit systems.

Cindy Gross, Microsoft Dedicated Support Engineer for SQL Server and Microsoft Certified Master : SQL Server 2008


1 Comment

Power Saving Options on SQL Server

Power Saving Options on SQL Server

Windows 2008 and Windows 2008 R2 have settings for 鈥淧ower Options鈥. Windows 2008 R2 has additional power improvements related to 鈥渃ore parking鈥 (temporarily suspending certain cores) and 鈥渢ick skipping鈥 (extended idle and sleep states). The default power setting for Windows 2008 and 2008 R2 is 鈥淏alanced Power鈥 which means that when the system isn鈥檛 鈥渂usy鈥 various components, such as the CPU and storage, are throttled back. This only happens when the box has CPUs and a BIOS that supports the ACPI specification for power that allows these options to take effect. Newer model CPUs are more likely to support this specification. Database systems that don鈥檛 have high sustained CPU usage are probably going to perform better with 鈥淗igh Performance鈥. 聽Most people size their SQL Server boxes so that the CPUs run at a fairly lower percentage of total available so it鈥檚 pretty easy to drop down to the slower CPU speeds and get slow, inconsistent performance. There are many documented cases of 鈥淏alanced Power鈥 hampering SQL Server performance.聽 For a SQL Server box if you want consistent, predictable, high performance you either need to fine tune the power setting parameters for each individual workload and/or for different times of day or just set the power options to high performance. If power savings are more important than fast, predictable performance then you can leave it at 鈥淏alanced Power鈥. As a side note that isn鈥檛 directly related to SQL Server but may affect virtualized SQL Servers and/or application servers that connect to SQL Server, VMWare has a doc on the need for 鈥淗igh Performance鈥 power settings on the VM host.

SQL

聽聽聽聽聽聽聽聽 SQL Server on Power-Saving CPUs? Not So Fast. http://www.brentozar.com/archive/2010/10/sql-server-on-powersaving-cpus-not-so-fast

聽聽聽聽聽聽聽聽 Are your CPUs running slowly? Tool tip and survey http://www.sqlskills.com/BLOGS/PAUL/post/Are-your-CPUs-running-slowly-Tool-tip-and-survey.aspx

聽聽聽聽聽聽聽聽 Windows OS Power Saving Feature Directly Affects CPU Performance http://www.bradmcgehee.com/2010/07/windows-os-power-saving-feature-directly-affects-cpu-performance/

Virtualization

聽聽聽聽聽聽聽聽 Poor virtual machine application performance may be caused by processor power management settings http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1018206

General

聽聽聽聽聽聽聽聽 Degraded overall performance on Windows Server 2008 R2 聽http://support.microsoft.com/kb/2207548
applications or scripts that intensively use resources (primarily processor and memory) may exhibit the problem鈥

聽聽聽聽聽聽聽聽 Improve Energy Efficiency and Manage Power Consumption with Windows Server 2008 R2 http://download.microsoft.com/download/5/B/D/5BD5C253-4259-428B-A3E4-1F9C3D803074/WS08%20R2%20Power%20Savings%20White%20PaperTDM.docx
鈥淭he 鈥淗igh Performance鈥 mode may be appropriate for servers that run at very high utilization and need to provide maximum performance, regardless of power cost.鈥

聽聽聽聽聽聽聽聽 Performance Tuning Guidelines for Windows Server 2008 http://download.microsoft.com/download/9/c/5/9c5b2167-8017-4bae-9fde-d599bac8184a/Perf-tun-srv.docx

鈥淗owever, Balanced might not be appropriate for all customers. For example, some applications require very low response times or very high throughput at high load. Other applications might have sensitive timing or synchronization requirements that cannot tolerate changes in processor clock frequency. In such cases, changing the power plan to High Performance might help you to achieve your business goals. Note that the power consumption and operating cost of your server might increase significantly if you select the High Performance plan.鈥

To view/set

聽聽聽聽聽聽聽聽 Powercfg.cpl

聽聽聽聽聽聽聽聽 Powercfg.exe – Configure Power Plans http://technet.microsoft.com/en-us/library/dd744398(WS.10).aspx

聽聽聽聽聽聽聽聽 Ensuring Maximum CPU Performance via Group Policy http://greg.blogs.sqlsentry.net/2011/01/ensuring-maximum-cpu-performance-via.html
PBM Policy / WMI query to see which CPUs are in power-saving mode
http://www.sqlservercentral.com/scripts/CPU/72156/

聽聽聽聽聽聽聽聽 From Powershell: gwmi -class win32_Processor | SELECT CurrentClockSpeed, MaxClockSpeed