One of the frequently overlooked yet essential best practices for Hadoop is to prefer fewer, bigger files over more, smaller files. How small is too small and how many is too many? How do you stitch together all those small Internet of Things files into files “big enough” for Hadoop to process efficiently? The Problem… Continue reading “Hadoop Likes Big Files”
Tag Archives: Deploy
Azure Data Factory: Hub Not Found
You can use the new Azure portal to create or edit Azure Data Factory components. Once you are done, you may want to automate the creation of future Data Factory components from PowerShell. In that case, you can use the JSON files you edited in the portal GUI as configuration files for the PowerShell cmdlets. For… Continue reading “Azure Data Factory: Hub Not Found”
Getting Started with Azure PowerShell Cmdlets–Subscription Management
I’ve started using the Azure PowerShell cmdlets more often to manage virtual machines and HDInsight in Azure. Once you connect to a subscription, everything just works. However, the initial steps of configuring one or more subscriptions for use from your machine, or understanding how to change subscription information on your machine, can be… Continue reading “Getting Started with Azure PowerShell Cmdlets–Subscription Management”
Sample PowerShell Script: HDInsight Custom Create
This is a working script I use to create various HDInsight clusters. For a really reproducible, automated environment you would want to put this into a .ps1 script that accepts parameters (see here for an example). However, you may find the method below good for learning and experimenting. Replace all the “YOURxyz” sections with your… Continue reading “Sample PowerShell Script: HDInsight Custom Create”
Your First HDInsight Cluster–Step by Step
Small Bites of Big Data from AZURECAT: Big Data Tech Training Series #1. Cindy Gross | Murshed Zaman. Sometimes it is just hard to get started. Have you been putting off your first foray into Hadoop? Are you not sure where to begin? Let’s get really basic. Prerequisites: Azure subscription (free trials available). Install Azure… Continue reading “Your First HDInsight Cluster–Step by Step”