One of the frequently overlooked yet essential best practices for Hadoop is to prefer fewer, bigger files over more, smaller files. How small is too small and how many is too many? How do you stitch together all those small Internet of Things files into files “big enough” for Hadoop to process efficiently? The ProblemContinue reading “Hadoop Likes Big Files”
See my new blog post (written with Denzil Ribeiro) about “How and Why to Enable Instant File Initialization” on our PFE blog. Keep an eye on the PFE blog for more posts from my team in the near future.