Category: Small Bites of Big Data

Technology, Tech, Big Data, SQL Server, Azure, cloud

  • Hadoop Hive Error: Could not connect client socket, timed_out

    Small Bites of Big Data

    Cindy Gross, SQLCAT PM

    With the Hadoop on Azure CTP, when you create a Hadoop cluster it expires after a few days to free up the resources for other CTP users. Therefore each time I do a demo or test I am likely to create a new Hadoop cluster. There are a few settings that it’s easy to forget about. For example, today I spun up a cluster and tried to use the Hive Pane add-in from Excel. I entered the connection information and hit the “OK” button.

    clip_image002

    Instead of seeing the expected option to choose the Hive table I saw this error:

    clip_image003

    Text version:

    Error connecting to Hive server. Details:

    SQL_ERROR Failed to connect to (cgross.cloudapp.net:10000): Could not connect client socket. Details: <Host: cgross.cloudapp.net Port: 10000> endpoint: 168.62.107.55, error: timed_out

    While there are very likely many possible reasons for this error, I’ve done this enough times to immediately realize I never opened the ODBC Server port on this particular cluster. I opened port 10000 on the Hadoop cluster via the “Open Ports” tile and tried again. Success! I can now query my Hive data!

    clip_image004

    I hope you’ve enjoyed this small bite of big data! Look for more blog posts soon on the samples and other activities.

    Note: the CTP and TAP programs are available for a limited time. Details of the usage and the availability of the CTP may change rapidly

    Other Small Bites of Big Data: http://blogs.msdn.com/b/cindygross/archive/tags/hadoop/

  • Load Data from the Azure DataMarket to Hadoop on Azure – Small Bites of Big Data

    Load Data from the Azure DataMarket to Hadoop on Azure

    Small Bites of Big Data

    Cindy Gross, SQLCAT PM

    UPDATED Jun 2013: HadoopOnAzure CTP has been replaced by HDInsight Preview which has a different interface and different functionality. Getting Started with Windows Azure HDInsight Service http://www.windowsazure.com/en-us/manage/services/hdinsight/get-started-hdinsight/

     The first step in analyzing data is acquiring that data. You may have data available from logs, databases, or other internal sources. Or you may need to export data from an external source. In Hadoop on Azure we make it easy to load data from data sources such as the Azure DataMarket, Amazon S3, and the Windows Azure Blog Storage (ASV).

    Preparing Hadoop On Azure for a data load

    Find your data source

    • Find data on the Marketplace that meets your needs. For example, I looked for free demographic data.

         

    • Some of the data marked as free has limits on how much is available without payment. I subscribed to the “Gender Info 2007 – United Nations Statistics Division” because it was free, had a significant amount of data, had unlimited transactions/month, and is easily understandable.

     

    • You can click through on the data link for more information on the source, columns, and other details.

     

    • When you click on the “use” button you get a screen that lets you query the data. You will use information from this screen to load the data into Hadoop.

    • Leave this browser window open and stay logged in to the DataMarket.

    Load the Data

    • Log in to your cluster onHadoopOnAzure.com and click on the Manage Cluster tile.

    • Click on the DataMarket tile

    • Go to the DataMarket page for your data.
      • For the gender data, you need to change the “Query” value under “BUILD QUERY” from “DataSeries” to “Values”.
      • Next to “Primary Account Key” click on “Show”. This will expose a key value. Cut/paste that value into the “passkey” field of “Import from Data Market” on HadoopOnAzure.
      • Copy the “URL for current expressed query” into the “Query” field on HadoopOnAzure. Erase the end of the query ($top=100) that limits the results to the first 100 rows. For the gender data the query is https://api.datamarket.azure.com/Data.ashx/UnitedNations/GenderInfo2007/Values?$top=100, change that to https://api.datamarket.azure.com/Data.ashx/UnitedNations/GenderInfo2007/Values in the query field.
      • In HadoopOnAzure enter your DataMarket user name in the “User name” field and a unique new table name in the “Hive table name” field. If you the name of a table that already exists in this Hadoop cluster you will get an error.
      • Click on the “Import Data” button to start the data load.
      • You can now leave the screen and let the data load complete in the background.

     

    • You can check on the status of the data load by clicking on the “Job History” tile.

    • You can click through on the job to get more details.

    • Once the data load is complete, you can query the table from the Interactive Console.

    • It opens to the JavaScript console, you will need to click on the Hive button to get to the “Interactive Hive” screen.

    • Enter a SQL query in the white box at the bottom and then hit “Evaluate”. The HiveQL or HQL language is based on standard SQL and is closer to the MySQL implementation than to the SQL Server TSQL implementation. Most of the syntax is the same as TSQL but there are some differences. For example, execute: select * from GenderInfoValuesAll where year = 2000 order by countryname limit 200;. This looks like TSQL except you use limit instead of top.

    Now your data is loaded and you’ve done a basic test to verify the data looks correct. You’re ready to start using Hive on Hadoop!

    Other Microsoft Hadoop data loading links:

    I hope you’ve enjoyed this small bite of big data! Look for more blog posts soon on the samples and other activities.

    Note: the CTP and TAP programs are available for a limited time. Details of the usage and the availability of the CTP may change rapidly. Screenshots are from the May 2012 version of http://HadoopOnAzure.com.

  • 24HOP/SQLRally – Fitting Microsoft Hadoop Into Your Enterprise BI Strategy

    24HOP/SQLRally – Fitting Microsoft Hadoop

    Into Your Enterprise BI Strategy

    Small Bites of Big Data

    Cindy Gross, SQLCAT PM

    The world of #bigdata and in particular #Hadoop is going mainstream. At 24HOP 2012 I talked about how a SQL Server professional fits into this big data world.

    Hadoop generally falls into the NOSQL realm. NOSQL uses basically available, soft-state, eventual consistency (BASE) instead of requiring immediate, 100% consistency via ACID properties. This allows certain types of queries/operations to be much faster since locking and consistency rules don’t have to be followed. This works well when any given program/query looks at the entire dataset and does not try to filter the data or query a subset of it. Basically with Hadoop you take data that may not have a known, consistent structure and impose a structure on it for the duration of your MapReduce program (query). This is all based on the concept of scaling out rather than up, with new commodity (low end enterprise server) hardware being added to an existing Hadoop cluster to meet dynamic demands. With Hadoop you are spreading hardware costs out over a longer time, possibly paying less overall for hardware, and shifting your IT costs to different departments and parts of the lifecycle.

    Hive is a database which sits on top of Hadoop’s HDFS (Hadoop Distributed File System). The data stays in HDFS but Hive stores metadata about the imposed structure and may store intermediate files. HiveQL looks a lot like TSQL and like most SQL languages makes an effort to align with the SQL standard when it makes sense. HiveQL will generate (usually multiple) MapReduce jobs to take an entire large result set and allow easier use of filters, joins, and aggregates. It’s still going to be slower than a relational database for most filters and joins but it allows access to data that may be too expensive (for whatever reason) to put into a relational or multi-dimensional database at this point in the lifecycle.

    You may keep your source data outside HDFS and bring it in only for the duration of a project. You can sqoop data to/from a relational database (there is a sqoop driver for SQL Server and PDW) or pull data from blob stores like Windows Azure and Amazon S3, use secure FTP, query OData sources like the Windows Azure DataMarket. The Hive ODBC driver, often via the Excel Hive Add-in, and sqoop can be used to pull data from Hadoop into other tools or systems.

    So far I’ve been talking as if big data = Hadoop. However, that’s not really true. Big data is data that is too big, expensive, or complex to process using your existing environment (often traditional RDBMSs with expensive SANs and high end hardware). While MapReduce, most commonly via Hadoop, is a very common solution it’s not the only one. There are streaming technologies like StreamInsight and HStreaming, machine learning like Mahout, massively parallel processing databases like PDW, and more. Generally big data at least starts out as unstructured or semi-structure, or perhaps of variable structure. It may flow very quickly with the need to process large amounts of data in a very small time window where decisions have to be made in real time.

    At its core Hadoop has the file system HDFS which sits on top of the Windows or Linux file system and allows data to be mapped over many nodes in a Hadoop cluster. A head node maintains metadata about where each piece of data resides. MapReduce programs “map” the data to the many nodes then reduce the output based on the required logic, resulting in an output file that is the end result of the program/query. Other pieces of the Hadoop ecosystem may include things like Hive, HBase, HCatalog, Lucene, Mahout, Zookeeper and more.

    So when would you use Hadoop? It’s good for exploring data, finding out what you don’t know. If you need to know your unknown unknowns or look for unexpected correlations, Hadoop may be what you need. It allows you to be very flexible and not lock yourself into a rigid schema until after you’ve fully explored your options. Some examples of common use cases are risk modeling, machine learning, cyber security, sentiment analysis, recommendation engines, log analysis, and sensor data. However, don’t think big data is the solution to all your problems. Often it is used to find and define the most useful bits of data or find the most useful algorithms. Then a more traditional design process may be used to create an RDBMS or multiple dimensional system for faster querying during day to day operations. Think of Hadoop as the foundation for a very specialized type of database that meets very specific needs, not as a replacement for relational. It’s important to note that every VLDB is not a fit for big data. “Big” is somewhat of a misnomer, size is only one of the characteristics and is relative to what your environment is prepared to handle.

    Often you ask about the 4 “Vs” when deciding whether to use Hadoop – volume, velocity, variety, variability. Think about big data when you describe a problem with terms like tame the chaos, reduce the complexity, explore, I don’t know what I don’t know, unknown unknowns, unstructured, changing quickly, too much for what my environment can handle now, or unused data.

    • Volume = More data than the current environment can handle with vertical scaling, need to make sure of data that it is currently too expensive to use
    • Velocity = Small decision window compared to data change rate, ask how quickly you need to analyze and how quickly data arrives
    • Variety = Many different formats that are expensive to integrate, probably from many data sources/feeds
    • Variability = Many possible interpretations of the data

    Microsoft is taking the existing Apache Hadoop code and making sure it runs on Windows. We have checked back in HDFS, MapReduce, and Hive code to the Apache open source community. More is on the way. We are adding things like the Excel add-in for the Hive ODBC driver, JavaScript, cloud (http://HadoopOnAzure.com), C# samples, etc. Where Microsoft is often seen as adding value is making Hadoop more available and allowing the reuse of existing skill sets. We offer self service, visualization, easier data movement, elasticity, and familiarity for faster completion of projects. This allows data scientists (people who know the business and statistics first and tech 2nd), data teams, information workers, and anyone seeking insights through BI to do their job better and faster.

    We offer visualization through PowerPivot, Power View, and the Excel Hive ODBC Add-in. For many used to Hadoop on Linux these things are new and exciting, even though they seem familiar to many in the SQL Server world.

    In the end, big data is really all about discovering insights and then taking action. If you don’t act on what you find, why did you spend all that effort mining the data? Sometimes the action is to go back and refine your process or change the way you looked at the data, but the end goal should be to make decisions that impact your business. Big data should add value to the business proposition.

    References

    My full presentation from #24HOP can be downloaded below. There are notes on many of the slides with further details. The recording from 24 Hours of PASS as soon as it is available at http://www.sqlpass.org/LearningCenter/SessionRecordings/24HoursSpring2012.aspx. Download their player then choose the session by title – “Fitting Microsoft Hadoop into your Enterprise BI
    Strategy
    (Cindy Gross)”.

    I updated the attachment to include the updated slides from my May 11, 2012 talk at #SQLRally. There are few changes other than the background.

    I hope you’ve enjoyed this small bite of big data! Look for more blog posts soon on the samples and other activities.

    MSHadoopEntBIStrategy24HOPandSQLRally2012.zip

  • Using the Pi Estimator Sample on HadoopOnAzure CTP – Small Bites of Big Data

    Using the Pi Estimator Sample on HadoopOnAzure CTP

    Small Bites of Big Data

    Cindy Gross, SQLCAT PM

    UPDATED Jun 2013: HadoopOnAzure CTP has been replaced with HDInsight Preview. See: How to Run the HDInsight Samples http://www.windowsazure.com/en-us/manage/services/hdinsight/howto-run-samples/

     Now that you have created your Hadoop on Azure cluster you can run the sample programs to become familiar with the interface. Click on the “Samples” tile under “Manage your account”.

    The gallery of available samples is growing rapidly.

    We’ll start with the very simple Pi Estimator sample. When you click on the Pi tile you’ll see some information about the sample (scroll down to see more). You can download the files and review them in as much detail as you want. You can review the PiEstimator.java file in notepad. The .jar file is a zip file that contains many other files. You can unzip it with many compression utilities then explore the files at your convenience. The description tells us the first parameter indicates how many maps to create (default of 16) and the second indicates how many samples per map (10 million by default). For now, we’ll take advantage of the Azure portal’s simplification of the complexities of Hadoop and click on “Deploy to your cluster”.

    This brings up a screen to create the Hadoop job. You can modify the job name and parameters if you like. The “final command”, in this case “Hadoop jar hadoop-examples-0.20.203.1-SNAPSHOT.jar pi 16 10000000”, can be copied to the command line if you choose to go execute the job at the command line later.

    Click on “Execute job”.  It may run for a minute or two, or perhaps longer if the CTP system is being heavily used. As it progresses entries are added to the “Output (stdout)” and “Debug Output (stderr)” sections. Eventually you will see “Status: Completed Successfully” under “Job Info”. I highlighted the runtime (“Job Finished in 66.123 seconds”) and output (“Estimated value of Pi is 3.14159155000000000000”).

     

    You can see that there were 16 maps created because that’s the first parameter we passed in to the jar command. If I change it to 8 maps and 1000 samples per map the command is now generated as “call hadoop.cmd jar hadoop-examples-0.20.203.1-SNAPSHOT.jar pi 8 1000” and the output has fewer significant digits even though the runtime is nearly the same:

    Job Finished in 57.061 seconds

    Estimated value of Pi is 3.14100000000000000000

    When you go back to the main portal screen you now see the “Pi Example” tile. If the job is currently running you will see “…in progress…” Once it finishes the tile will show “Complete”.

         

    To view job results, click on the “Job History” tile under “Manage your account”.

    If you click on any history row it will take you to the same output you see if you are looking at the job interactively.

    I hope you’ve enjoyed this small bite of big data! Look for more blog posts soon on the samples and other activities.

    Note: the CTP and TAP programs are available for a limited time. Details of the usage and the availability of the CTP may change rapidly.

  • Open Ports for HadoopOnAzure CTP – Small Bites of Big Data

    Open Ports for HadoopOnAzure CTP

    Small Bites of Big Data

    Cindy Gross, SQLCAT PM

    UPDATED Jun 2013: HadoopOnAzure CTP has been replaced by HDInsight Preview. See Troubleshooting ODBC connectivity to HDInsight http://social.msdn.microsoft.com/Forums/en-US/hdinsight/thread/b4ca52ea-f7cf-420c-959d-53e09f801f7d.      

    Once you have created your Hadoop on Azure cluster you will likely be moving data in and out of the system. That means you need to open the ports in Azure. By default the two ports exposed through the Metro interface are both locked. The error when you try to use an ODBC connection to the cluster when the ODBC Server port is closed will include the words “Could not connect client socket”.

    Click on the “Open Ports” tile to open the “Configure Ports” dialog.


    To do Hive or other ODBC queries, open the “ODBC Server” port 10000. If you plan to use FTP open the FTPS port 2226.


    Click on the “Windows Azure” tile at the top to go back to the main portal screen. If you need to open other, less commonly used ports you can use the Remote Desktop icon to connect directly to the VM and make the changes on the server.


    I will cover how to connect via Remote Desktop in a separate post. Once there use whatever firewall or other port software is installed to open the needed ports.

    I hope you’ve enjoyed this small bite of big data! Look for more big data blog posts soon.

    Note: the CTP and TAP programs are available for a limited time. Details of the usage and the availability of the CTP may change rapidly.

  • Creating a Cluster for HadoopOnAzure CTP – Small Bites of Big Data

    Creating a Cluster for HadoopOnAzure CTP

    Small Bites of Big Data

    Cindy Gross, SQLCAT PM

    UPDATED Jun 2013: HadoopOnAzure CTP has been replaced with HDInsight Preview which has a different interface: Getting Started with Windows Azure HDInsight Service http://www.windowsazure.com/en-us/manage/services/hdinsight/get-started-hdinsight/

    Are you ready to dive into this “Big Data” thing you keep hearing about? A good way to get started without having to understand and install all the HDFS, MapReduce, and other pieces yourself is to join the Hadoop CTP. It’s a very popular program so once you sign up you may have a short wait before you are given an account. Any cluster you create is time-bombed to free up unused resources for other eager CTP participants. This post will assume you’ve been granted a CTP account to use on https://www.hadooponazure.com/.

    1)      Sign in to your Hadoop on Azure account. Go to https://www.hadooponazure.com/ and click on the “Sign in” button.

     

    2)      If you don’t currently have a cluster allocated, you will be taken to a screen where you can request one.  Note that the saying at the top of the screen will change randomly each time you navigate to a screen.

    • Choose a unique name for your cluster and it will append .cloudapp.net.  My cluster is called cgross.cloudapp.net.
    • If you are just playing around be respectful of the fact that this is a very popular CTP and choose the “Small” cluster size (currently 4 nodes, 2 TB disk space).
    • Choose a Cluster login and password. I use a different login than my cluster name, cgross1.
    • You can choose to use SQL Azure for your Hive Metastore but I’m going to skip that for the sake of simplicity.
    • Choose “Request cluster”

     

    3)      It will take several minutes for your cluster VMs to be created and allocated.

     4)      After a few minutes you will see a message that your cluster is ready for use. In the upper left you can see how long you have before the cluster expires. When you reach about 6 hours left you can choose the “Extend now” button to keep your cluster longer. You can also choose to “Release cluster” when you are finished to free up those resources for other CTP participants.

       

    Explore the various tiles and what you can do in each. You may want to start with the “Downloads” tile and go through the “How-To and FAQ”. If you are idle for a while you will be logged out of the site, but this as long as the cluster has not expired it will all still be there once you log back in. The “Job History” stays around across cluster builds, it is not cleared out when a cluster is released or expired.

    I hope you’ve enjoyed this small bite of big data! Look for more blog posts soon on the samples and other activities.

    Note: the CTP and TAP programs are available for a limited time. Details of the usage and the availability of the CTP may change rapidly.

  • SQL Server 2012 RC0 available

    Release Candidate 0 (RC0) of SQL Server 2012 is available for download! That means you can start to experience a feature complete version of the next great SQL release. 🙂 We’re still looking at the first half of calendar year 2012 for the RTM release of SQL Server 2012.

    FURTHER.  FORWARD.  FASTER.

    Downloads:

    Greater availability. Deliver the required 9s and data protection with AlwaysOn, delivering added functionality over CTP3 that allows customers to experience multiple, readable secondaries for distributed scale of reporting and backup workloads and support for FileTable and FILESTREAM which brings first-class HA to complex data types.

    Blazing-fast performance. More effectively govern performance in multi-tenancy environments like private cloud. Resource Governor enhancements include support for 64 resource pools, greater CPU usage control, and resource pool affinity for partitioning of physical resources and predictable resource allocation.

    Rapid data exploration. Empower end users with new insights through rapid data exploration and visualization. Discover new insights at the speed of thought with more feature/functionality in Power View, the highly interactive, familiar browser-based data exploration, visualization, and presentation experience for end users.

    Credible, consistent data. In addition to CTP3 functionality delivered for Data Quality Services and Master Data Services, customers can better support heterogeneous data within Data Warehouses through new Change Data Capture (CDC) support for Oracle.

    Optimized productivity. Optimize IT and developer productivity across server and cloud with the new SQL Server Data Tools, a unified development experience for database and business intelligence projects, and cloud-ready capabilities within SQL Server Management Studio for snapshot backups to the Windows Azure Platform. Additionally, SQL Server 2012 offers a new version of Express – LocalDB. SQL Express LocalDB is a lightweight version of Express with all its programmability features, yet runs in user mode with a fast, zero-configuration installation and short list of pre-requisites. For more information and to try it out, go here.

     Licensing:

     SQL Server 2012 Wallpaper/backgrounds:

     More about SQL Server 2012:

     If you don’t already use it, now might be a good time to run the Microsoft Assessment and Planning (MAP) Toolkit. If you load the sample MAP database and go to File.Prepare New Reports and Proposals you can see what typical SQL inventory reports look like. The output includes version, service pack level, edition, whether it’s clustered, # of procs and cores, and system memory. You would need to have a SQL database to store your own inventory results.

     Microsoft Assessment and Planning (MAP) Toolkit http://technet.microsoft.com/en-us/solutionaccelerators/dd537566.aspx

     Data Sheet:

    MAP provides you with actionable information from the start. In a matter of hours, MAP helps provide you with answers to key questions like:

    • Which of my PCs can run Windows 7, Microsoft® Office 2010, and Office 365?
    • Are my web application portfolios and databases ready to migrate to the Windows Azure platform? What is the TCO-ROI?
    • What next steps should I take to plan a private cloud?
    • Which of my servers are capable of migrating to Windows Server 2008 R2, or can be virtualized using Hyper-V?
    • Where can I find Microsoft® SQL Server® 2000, MySQL, Oracle, and Sybase instances for migration to Microsoft® SQL Server® 2008 R2?
    • Is our software usage in compliance with software license agreements?

  • SQL PASS: All the Magic Knobs – Tools

    SQL PASS: All the Magic Knobs – Tools

    In my All the Magic Knobs talk at #SQLPASS 2011 I discussed some easy ways to determine if you’re using some of the performance magic for SQL Server. When you have many consolidated, non-tier 1 databases you don’t have a lot of control over, the best way to tune is to provide a solid, performant infrastructure through low effort, high impact choices. The same steps help in your tier 1 environments as well. The quickest way to see how close you are to that standard is to run one of our automated health checks. They check the SQL instance itself and some of the most important Windows settings that help SQL Server operate optimally.

    SQL Best Practices Analyzer (BPA) is available for SQL Server 2000, 2005, and 2008/2008 R2. It is an add-in to the Microsoft Baseline Configuration Analyzer (MBCA). Both the MBCA and the SQL BPA are free. You can run the BPA locally or remotely and you can find plenty of sample scripts to run it against multiple instances. You choose your schedule for execution and you can either review the output after each execution manually or write your own program to alert you to what you consider the most serious items.

    The System Center Advisor (SCA) is at this point still in pre-release. Licensing details will be available after release, for now you can download a free trial. It works for SQL Server 2008 and newer on Windows 2008 and newer. SCA runs on a schedule and sends alerts when a registered instance is not configured as advised. What it checks can change dynamically as PSS finds new important items.

    Several companies, including Microsoft through our Premier Field Engineering (PFE) team, offer various health checks that include knowledge sharing and additional advice to help you decide if, how, when, and where to implement the recommendations.

    Of course, you have to actually implement the recommendations to get the benefit; the tools listed above don’t do any remediation on their own. While that should go without saying, in my experience known recommendations often go unimplemented until some problem they would have prevented pops up.

    For more SQL Server best practices see some of my other blogs: http://blogs.msdn.com/b/cindygross/archive/tags/best+practices/  

     

  • SQLPASS: Are You Smarter Than an MCM? VLF Demos

    Are You Smarter Than an MCM?

     I had a great time on Wednesday co-presenting with some great SQL peeps! We dressed up, had a lot of fun, and shared a few technical tips along the way. My demo was on VLFs, an often forgotten and/or misunderstood part of the transaction log technology. The demo files are attached below. Thanks to everyone who came to the talk! #SQLPASS

    The Team for DBA-414-M

    Our fabulous “Smarter” team is

    VLFs

    Question: Cindy has ADHD.  In the past she we able to keep up with multiple tasks at once, but lately she has slowed down.  Why? 

    Answer: Excessive VLFs

     A VLF is the boundary within your log file(s) for log operations such as scans (replication, recovery, mirroring, CDC, etc.) and log clearing/truncating.

    • 100s of VLFs might be ok, 1000s is probably bad, 10s of thousands is almost certainly bad. Bad means much slower operations for activities that operate within VLFs boundaries.
    • DBCC LOGINFO is undocumented and unsupported but is the only way to see your VLF distribution. It returns one row per VLF.
    • Often you get too many VLFs from autogrow/small growths.

     VLF Lessons Learned:

    • Avoid autogrow by pre-sizing your data and log files appropriately. This includes alerting when your free space starts to decrease significantly and planning for data growth over the lifecycle of the system. Make sure you check your system as well as your user databases.
    • Set autogrow values to reasonably large amounts to avoid a series of small autogrows in an unexpectedly high volume period, but not so big the growths themselves slow down the system.
    • Alert on autogrows and check (and possibly correct) VLFs afterwards.
    • Check your VLFs periodically and during performance troubleshooting and consider shrinking/regrowing properly (with the proper presizing) to reduce the total number of VLFs when the number gets “too high”.
    • Never grow (manually or via autogrow settings) in increments of exactly 4GB. Values such as 4000MB or 8000MB are good, but there is a bug with multiples of exactly 4GB.
    • Do NOT shrink your data or log files on a regular basis. Shrinking should be extremely rare and in response to a specific issue.
    • And because it can’t be said too often, pre-size your databases to avoid autogrow.
    • Revisit the sizes periodically.

     Thanks everyone for coming to the talk and for the great feedback afterwards! You did fill out your evaluation forms, right? 🙂

    Pictures:

    http://yfrog.com/z/h88chpvj

    http://flic.kr/p/awbV6h

    VLFDemo.zip

  • SQL PASS: All the Magic Knobs

    SQL PASS 2011 DBA-319-C #SQLPASS

    All the Magic Knobs – Low Effort, High Return Tuning

    Key points covered:

    • Power Savings = High Performance
    • Smart Virtualization
    • Enough Hardware
    • Control other apps, filter drivers
    • Optimize for ad hoc workloads = ON
    • Compression = ON
    • Set LPIM + Max Server Memory
    • Pre-size files, avoid shrink and autogrow
    • Fast Tempdb
    • Proper Maintenance

    My presentation from 10/13/11 is attached.

    AllTheMagicKnobs.pptx