Hadoop Likes Big Files

One of the frequently overlooked yet essential best practices for Hadoop is to prefer fewer, bigger files over more, smaller files. How small is too small and how many is too many? How do you stitch together all those small Internet of Things files into files “big enough” for Hadoop to process efficiently? The Problem…

Create HDInsight Cluster in Azure Portal

Creating an HDInsight cluster from the Azure portal is very easy. However, sometimes you want all the choices and best practices explained as well as the “how to”. I have created a series of slides with audio recordings to walk you through the process and choices. They are available as sessions 1-8 of “Create HDInsight…

PowerShell works for Amazon AWS S3 too!

More and more we have to work with data in many different locations. This week I got to work with S3 files that were moving to Azure blob storage. I was surprised to find that Amazon has published AWS cmdlets for PowerShell. It took me a little…

Why WASB Makes Hadoop on Azure So Very Cool

Data. It’s all about the data. We want to make more data-driven decisions. We want to keep more data so we can make better decisions. We want that data stored cheaply, easily accessible, and quickly ingested. Hadoop promises to help with all those things. However, when you deal with Hadoop on-premises you have a…

Azure Maximums and Resource Usage from PowerShell

Have you ever struggled to find out how many VM cores, HDInsight cores, storage accounts, or other Azure resources your subscription is set to allow or how many you actually use? Maybe you want to use this information in your automation scripts to avoid trying to create components for which you don’t…

Use Additional Storage Accounts with HDInsight Hive

When you create an HDInsight Hadoop cluster you pass in one or more storage accounts and their associated keys. This allows you to access the files on all associated storage accounts from the cluster. If you want to use public storage that isn’t passed in at create time, that’s easy – simply supply the storage…

Getting Started with Azure PowerShell Cmdlets–Subscription Management

I’ve started using the Azure PowerShell cmdlets more often to manage virtual machines and HDInsight in Azure. Once you connect to a subscription everything just works. However, the initial steps to get one or more subscriptions configured for use from your machine, or understanding how to change subscription information on your machine, can be…

Sample PowerShell Script: HDInsight Custom Create

This is a working script I use to create various HDInsight clusters. For a really reproducible, automated environment you would want to put this into a .ps1 script that accepts parameters (see here for an example). However, you may find the method below good for learning and experimenting. Replace all the “YOURxyz” sections with your…

Your First HDInsight Cluster–Step by Step

Small Bites of Big Data from AZURECAT
Big Data Tech Training Series #1
Cindy Gross | Murshed Zaman

Sometimes it is just hard to get started. Have you been putting off your first foray into Hadoop? Are you not sure where to begin? Let’s get really basic.

Prerequisites: Azure subscription (free trials available); Install Azure…

PowerShell for Azure cmdlets: Subscription was all Wacky

I was working on some HDInsight scripts in PowerShell and doing lots of experimenting. I’m not sure what exactly I did, but all of a sudden everything stopped working. With lots of interruptions from meetings and chats and lunch, I couldn’t retrace my steps. Everything seemed to fail on the Azure subscription information so I…
