Hadoop Likes Big Files

One of the frequently overlooked yet essential best practices for Hadoop is to prefer fewer, bigger files over more, smaller files. How small is too small and how many is too many? How do you stitch together all those small Internet of Things files into files “big enough” for Hadoop to process efficiently? The Problem…

Understanding WASB and Hadoop Storage in Azure

Yesterday we learned Why WASB Makes Hadoop on Azure So Very Cool. Now let’s dive deeper into Windows Azure storage and WASB. I’ll answer some of the common questions I get when people first try to understand how WASB is the same as and different from HDFS. What is HDFS? The Hadoop Distributed File System…

Why WASB Makes Hadoop on Azure So Very Cool

Data. It’s all about the data. We want to make more data-driven decisions. We want to keep more data so we can make better decisions. We want that data stored cheaply, easily accessible, and quickly ingested. Hadoop promises to help with all those things. However, when you deal with Hadoop on-premises you have a…

Windows storport enhancement to help troubleshoot IO issues

For Windows Server 2008 and Windows Server 2008 R2 you can download a Windows storport enhancement (packaged as a hotfix). This enhancement can lead to faster root cause analysis for slow IO issues. Once you apply this Windows hotfix, you can use Event Tracing for Windows (ETW) via perfmon or xperf to capture more detailed IO information…

What do those “IO requests taking longer than 15 seconds” messages on my SQL box mean?

You may sometimes see stuck/stalled IO messages on one or more of your SQL Server boxes. It is important to understand what these messages mean, so I am providing some background information. Here is the message you may see in the SQL error log: SQL Server has encountered xxx occurrence(s) of IO…

Compilation of SQL Server TempDB IO Best Practices

It is important to optimize TempDB for good performance. In particular, I am focusing on how to allocate files. TempDB is a unique database in several ways. The ones most relevant to this discussion are:
· It is often one of the busiest databases on an instance. This means the performance of TempDB is…

SQL Server with NetApp SAN

If you are planning to use NetApp as the SAN for your SQL Server instance(s), take a look at these documents in addition to the normal SQL Server IO planning best practices documents.
TR-3779 Sizing best practice guide. http://media.netapp.com/documents/tr-3779.pdf
TR-3696 This is for the storage layout best practices. http://www.netapp.com/us/library/technical-reports/tr-3696.html
White Paper on 1 TB DSS systems…
