Befriending Dragons

Transform Tech with Anti-bullying Cultures



Reclaiming My Voice, Becoming Me – A Befriending Dragons Story

In the mornings I search my closet to find the perfect outfit that shows I’m technical enough, I’m smart enough, I fit in. One day I wear a skirt, and my manager can’t seem to process it; I have gone outside his expectations. He flips to the script of “this is a woman, not a skilled computer expert who knows more about her area of expertise than almost anyone else in the world.” He can’t hold “woman” and “good at technical stuff” in his brain at the same time. His obviously confused comments about how nice I look deflate me all day, and that’s the last time I wear a skirt to work. Another day my manager comments that it’s great that I’ve brought some women candidates to interview for the open position on our team, but that we won’t lower the bar for them. As if the bar isn’t already shaped like a white man, with false proxies that exclude so many qualified people. When he finds out a coworker made disgusting, sexually explicit comments to me I overhear him say to a coworker that the bully can’t be blamed since he thought I had picked up the coworker I was walking with in a nearby bar. I move to a team that rarely interacts with customers, a team where I have no chance of encountering my old teammates. They have much more of a casual clothing vibe. When I wear my existing wardrobe, clothes from boutique stores, I get puzzled comments asking where am I going after work, is it someplace fancy? One of the few other women on the team comes over to tell me she’s glad I continue to dress up because it makes her feel less out of place when she wears similar clothes. What I think is “Why can’t I ever get it, ME, right?” Why can’t I fit in? Why are my dragon scales always too shiny or not shiny enough? Why is my roar always silenced? When did I lose my voice, when did I start spending so much emotional energy to walk a fine line between likable and competent? When did I give in to the bullies?

It was years before I realized my path to belonging & success became very narrow when I reported the worst bully, a sexual harasser, to my manager. The manager talked me out of reporting to HR. He encouraged me to silence myself, to keep quiet, all to keep my chance at a promotion. What I heard is that my voice doesn’t matter, or even worse my voice is destructive to my career. I didn’t realize then that my chance at a promotion was gone the moment I spoke up about the harasser, the bully, even in the privacy of one sympathetic person’s office. It was clear the so-called “brilliant jerk” who harassed me was just too valuable to the team, and I was not valuable enough. Instead of insisting I was as valuable, actually even more valuable because I wasn’t a bully, I took my dragon roar and internalized it as a silent scream. I looked around at the sea of men I worked with and saw what others thought tech looked like – not me. I tried to muffle my inner shrieks and focus on creating success by changing myself, ignoring my own brilliance and aptitude for the job. I just knew if I could make myself even more “one of the guys,” if I could suppress the “bad” aspects of my femininity, I could “win.” I could manage my way out of this by tightly controlling everything – myself, my voice, my manager, my coworkers, how much of myself I shared with my boyfriend. Only I couldn’t. I didn’t. The more I silenced myself, the more I changed my dragon roar into a silent scream, the more I lost myself.

I imagined myself as a solitary dragon, alone in my safe cave. As long as I kept people at a distance I could survive. I never even considered that I deserved to thrive instead. I nursed my internal wounds, mostly by minimizing and ignoring them. I imagined my loneliness as peaceful solitude. I hoarded my energy, my thoughts, my feelings. I told myself I was in control, I was exercising my power. I shape-shifted into a shadow of myself, a caricature. I closed all the gates around me to keep the bad things away, ignoring that I also kept the good things away. Like so many trauma victims, I internalized the bully’s actions as partially my own fault. I thought I could, must, change myself to avoid future bullies.

But that’s not the way the world works. Instead of looking at myself as a scary dragon, I can choose to see myself as a free agent in the world, a friendly dragon who can fly where I want, when I want, how I want. I know there’s the reality of the white patriarchy, a system that builds success bars shaped like a narrow subset of cishet white men. Because of my own privileges as a white-presenting woman with one parent who graduated from college, because of the perseverance and grit and pure luck that let me slide through the edges of the white patriarchy and accumulate some wealth, I have the freedom to put myself in another part of the world, a part where I can thrive. My dragon scales are just fine the way they are, and I choose how much, what kind of, light to shine on them, on myself. I can befriend this bully culture dragon, I can stop internalizing it and stop trying to fix myself. I can make my experience a friendly dragon. That jerk who bullied me, who was found guilty of sexually harassing me when I finally reported him to HR, doesn’t define me. I don’t have to change myself, I don’t have to become invisible and silent to people like him and the people who excuse his behavior. He behaved very badly, he committed verbal violence. The system at work tried to find the balance of action so that neither of us would sue or speak out too much. But they misjudged. I did speak out. I reclaimed my voice. I started to speak out about my experience, at first quietly in small groups. Then from a stage. Then directly to my new team. Then loudly for the world to hear. I left Microsoft without signing their confidentiality agreement, without letting them steal my voice once again in return for a few months’ pay.

I reclaimed my roar and ended my internal screaming. I befriended that dragon. I reclaimed my voice and my feminine side. I belong because I decide what that means for me. I choose to step away from patriarchy, the quest for perfection, whiteness, hierarchy, and conformity. I choose thrival, self-care, and relationship-based work.

I see myself as a beautiful, free, contradictory, powerful, wise, and confident dragon with a loud roar. I am ready to take on the world, to speak truth to power. I create my own path. I journey with women, we reclaim our voices, we move on to new, bigger lives after a bully tries to make us small. We nurture new paths, new cultures, new open gates where we can be ourselves, create success, and generate a sense of belonging in our cultures. We ROAR!



The Dragons of November – Newsletter #2020.11

Happy November to all my fabulous Dragon Friends. Welcome to my introductory Befriending Dragons newsletter!

Nov 3, 2020 Meetup

Money Un-Tabooed Podcast – Financial Impact of Harassment

Progressive Voters Guide

Word cloud: Befriending Dragons nurture culture journey heal voice reclaim strength pivot leader moment grace leadership story feminine leader create



Remotely Biased – A Befriending Dragons Story

Humaaans generated image of a dark skinned woman in a blue skirt and long sleeve top facing left and walking quickly with arms outstretched.
Embrace anti-bias in remote work

Last week @VeniKunche tweeted asking for “remote work” tips for managers. I immediately replied with a whole string of tips that reduce bias. Veni said, hey, blog that. And I thought, sure, that’s easy. And yet here it is, days later, and I hadn’t written much more than a paragraph until I accepted Amy Cuddy’s invite to Quarantine Writing Hour. I can literally feel the anxiety sitting in my chest, aching. Folks, this is what it’s like to work during a crisis, personal or global. It’s not because I’m remote, it’s not because no one is watching over my shoulder with an eye to punish lowered productivity. It’s because we’re stressed, we’re worried about family and friends and the future of the world, we are fidgety, we miss our community, we are overwhelmed. Luckily I’m not feeling sick, but many are and without sufficient testing we don’t know who actually has COVID-19.

Some of us have done remote work for a while, some are completely new to the experience. As the need to maintain “social distance” grows with the spread of COVID-19 there are fountains of advice on the practical aspects of how to work remotely. But what about the social justice and leadership aspects? How do we keep bias and bullying from creeping into every aspect of working remotely? How does this impact various folks differently? How do we take advantage of this social disruption to drive positive changes into our workplace, changes that could linger long after the novel Coronavirus is under control?

The reasons it took me so long to write this story are the same reasons we can’t expect high productivity out of people working from home right now. It’s not the working from home part. It’s the stress of working in an unfamiliar environment, underprepared, while we’re worried about everything. Many folks have unfamiliar, inadequate equipment in a home where they may also be caregivers for other stressed out folks. There may not be enough devices, internet bandwidth, or “included” data for everyone to work and learn at once. We may not have physical or emotional safety.

Kindness

“You can be rich in spirit, kindness, love and all those things that you can’t put a dollar sign on.” — Dolly Parton

Change causes stress. Even when we’re able to use stress to push us forward, it can still negatively impact our lives. So prioritize kindness over niceness and politeness.

Center the folks most marginalized on your team, and do all you can to uplift them even if it means making other folks uncomfortable when you point out bias. Don’t tolerate COVID-19 jokes, insensitive comments that trivialize the danger to the most marginalized, or comments that point blame at Asian people. Practice now how you will reply to anyone making ableist, racist, or sexist comments.

Where’s the bias?

“The defining question is whether the discrimination is creating equity or inequity. If discrimination is creating equity, then it is antiracist. If discrimination is creating inequity, then it is racist.” — Ibram X. Kendi, How to Be an Antiracist

Well, women and people of color are much more likely to be caregiving than white men are, and that takes time and energy. We’re crowded into unfamiliar situations where we have to navigate all sorts of family dynamics that we’re not used to, and typically that will fall mostly to one person, using up their already limited energy. As somebody living alone with my cats, I’m also going through this chaos because I’m fielding calls and messages from friends and family with problems they need help with, things I may or may not be able to help them with. I get really stressed when I can’t help people who need me! I’m constantly bombarded with news snippets and feeling compelled to dig deeper, because my curiosity is always in the forefront of my actions and there’s so much new, vital, literally life or death information ALL THE TIME. That makes us less productive – don’t penalize that right now!

Women, especially BIPOC, are more likely to be cooped up for days on end with an abuser, to have lower savings (hey, pay gap!), to be expected to deal with everyone else’s stress, to rely on a community that is now less available, and all those other inequities we’ve been talking about and doing so little to actually address.

When we’re stressed or short on time we fall back on deeply embedded patterns, and that means we rely more on stereotypes and bias. We have to be very intentional to pay attention to this and compensate for the bias that will ALWAYS creep in.

The Tweets aka the Advice

I’m going to make this ultra-simple on myself, I’m going to paste below my replies to Veni’s tweet. I welcome comments and questions.

This is for university professors but it could be adapted to workplaces. Be flexible, lower expectations (folks are scared, sick, overwhelmed, facing change), put family and health 1st, things will get messed up – expect it and don’t punish it, be kind. https://anygoodthing.com/2020/03/12/please-do-a-bad-job-of-putting-your-courses-online/

Some of your employees are going to spend a whole lot of time in enforced close proximity to their abuser. Some are the abuser, perhaps triggered by stress and frustration. Be kind.

Not everyone has enough bandwidth, may face a datacap. They may not have great, fast devices at home, may have to share one pc. They may have many folks in the house streaming classes, meetings, large files. Keep your emails and optional files simple & small.

Folks react differently to isolation. Offer but don’t force virtual coffees, open “water cooler” zoom calls where people can come & go, gracious space questions for folks to reflect on how they are creating success in chaos with a focus on finding the ways they are doing great.

Put on your anti-bias hat. Don’t over-reward the folks who over deliver during this time. They will be disproportionately white men because that’s how our white patriarchy is set up. Statistically men have more flexible schedules & fewer child/elder care duties.

All sorts of biases will be exaggerated as everyone is under pressure, managers have to be extra careful to be great allies. Ppl who aren’t white may not always code switch at home the way they do at work. You may see more of their authentic self – reward this, don’t punish it.

Remember at review/reward/promo time that this virus has disrupted the year. Highlight & reward folks who build strong relationships, strong containers, strong stakeholder outreach. A lot of “soft skills” that ppl who aren’t white men have to develop to survive can be showcased.

Change is everywhere right now, fill the cracks with anti-bias. This is hard work, but may actually be easier since disruption is already in full swing. Rebuild with anti-bullying and anti-bias.

Managers, now is the time to bring in folks like Veni or me or any of the myriad of anti-bias, pro-belonging, pro-DEI folks to take hold of this disruption in work life and come out the other side stronger. #BefriendingDragons

And some tweets from other threads

Summary

Be kind. Center the most marginalized over the most powerful. Be anti-bullying, anti-harassment, anti-racist, & anti-sexist.

Going forward, allow more folks to work from home regularly without penalty. This disproportionately helps folks with disabilities and those who are caregivers. It builds trust and refocuses everyone on the work. It’s good business, good for your employees, and good for the environment.

When I was a boy and I would see scary things in the news, my mother would say to me, "Look for the helpers. You will always find people who are helping." To this day, especially in times of "disaster," I remember my mother's words and I am always comforted by realizing that there are still so many helpers - so many caring people in this world. - Fred Rogers
Look for the helpers – Fred Rogers

Pledge to really work hard to address the bias head-on in your next round of reviews and/or rewards. Don’t reward productivity in and of itself. Reward those who help others through this, who build and nurture relationships, who reduce other people’s stress and tension. Those people are the true leaders.

Want receipts on these bias factors? Search on terms like:

Check out my Befriending Dragons reading list if you want to dig deeper.

Be kind, lean into checking your biases, and reflect on how to thrive during this stressful time.



Befriending Dragons Happy Hour

I have so many thoughts and ideas about where my passion will lead me next. I haven’t yet settled on any one thing for a new career, so I went back to the basics. Listen. Listen to my community. I envision my community as marginalized people in tech. So I have started a meetup group where we can get together and talk. Where we can listen to each other. Where we help each other. Join me and let’s go on this journey to our futures together.

#DatesWithDragons in the snow

A gathering place for people forging new paths after harassment at work.

This is a safe space – no hate speech, bullying, harassment, or discrimination is tolerated. We value input from a variety of identities and will center the views, needs, and decisions of those who are not cishet white men.

I’m a 50 year old white woman leaving the tech world. As I talk about the harassment, bullying, and discrimination I’ve faced over the years other women open up about their own experiences. So many of us have no place to talk to others with the same experiences. Let’s share our stories, our growth, our pain and joy. This is a place to talk about surviving and thriving, about careers, family, friends, life, work, play, and about disrupting the white patriarchy to nurture a new way of doing things.

#Words4Justice

Befriending Dragons – Life After Workplace Harassment

Bellevue, WA
3 Members


Next Meetup

Befriending Dragons Happy Hour

Sunday, Feb 10, 2019, 3:00 PM
1 Attending




Befriending Dragons | #Words4Justice

Today is my last official day at Microsoft.

I no longer feel safe, comfortable, or valued working in tech. Going forward I’ll be working to actively disrupt tech culture and systems to reduce harassment and discrimination. Keep an eye on #Words4Justice. 😊

Be kind. Be brave. Go beyond ally to accomplice to actively disrupt bullying and discrimination.

cindygross@outlook.com 
@cindygross | @SQLCindy #Words4Justice
https://befriendingdragons.com/

My experiences

Shared Experiences Meetup



Windows Hyper-V Dragon

After all these years soaring through the data world, from SQL Server 1.11 all the way through today’s modern Big Data technologies, I am making a flight adjustment. My next adventure will be in the land of the Windows Hypervisor: Hyper-V. Last week I started working with my new team and I am already learning to corral and wield a whole new world of acronyms, technologies, and scenarios. As a software engineer on the quality team I’ll help define and implement test scenarios that lead to better customer experiences across multiple products.

I won’t be leaving data behind! This new role has a lot of data aspects and of course the hypervisor underlies many of the world’s data systems! It’s been great working with the #SQLFamily over the years and I look forward to continuing to work with you all!



Moving Beyond Unconscious Bias – Good People Matter!

Presented at SQL Saturday Oregon on October 24, 2015

by Julie Koesmarno and Cindy Gross

Good People

We’re good people. As good people we don’t want to think we do things that have negative consequences for others. But sometimes our subconscious can fool us. What we intend isn’t always what happens. We think we’re making a totally rational decision based on our conscious values – but subtle, unconscious bias creeps in. Yes, even for good people. For 20+ years folks at Harvard have been using something called the Implicit Association Test (IAT) to help us identify our biases.

Take this IAT on gender and career – the results may surprise you: https://implicit.harvard.edu/implicit/user/agg/blindspot/tablet.htm

Watch Alan Alda take the test, it will give you a feel for how it works: https://www.youtube.com/watch?v=2RSVz6VEybk


Patterns and Categories

The human brain works with patterns and categories. It’s how we make it through the day. We are bombarded with hundreds of thousands of data points every day – we can’t possibly think through each one every time. We unconsciously assign data points, including our perception of people, into buckets. Those buckets have values and characteristics assigned to them that may or may not reflect the individual person we put in that bucket.

This automatic assignment is called intuitive thinking or system 1 thinking. It’s easy and takes little effort. It serves us well and lets us take on many tasks every day. However, it also sometimes leads us down the path of thinking we’ve chosen the “best” person when we’ve really hired someone who meets some set of assumptions.

Sometimes we use slow thinking, or system 2 thinking. It’s rarely a conscious decision, something just makes us take some extra time and we usually don’t even realize it. That’s when we stop to question what we’re doing – maybe we adjust which categories we put someone in or we adjust the category or the values and judgments associated with it. We’re good people but system 2 thinking is tiring and we just can’t do it all the time.


Diversity Matters

Why does diversity matter at work? Personally, when we’re on a diverse team we tend to have higher personal and job satisfaction. Diverse teams are interesting and we often learn more. People who don’t feel like they’re the “only one” of something (gender, sexual orientation, race, introvert/extravert, etc.) relax, contribute more, and are more productive. And study after study shows that more diverse teams lead to better products and a better bottom line.

Companies with women on their boards have higher ROIs, more diverse companies tend to perform above average, and let’s face it – we don’t have enough STEM graduates to fill needed jobs if we don’t encourage a more diverse group of people to enter the field.


Mind Tricks

But we’re good people and we don’t make these snap judgments. We are rational and we always know why we made a decision. Or do we?

image

Optical illusions fool us all the time. Even knowing those lines are all the same length, did you have to measure them just to be sure? The same thing happens in our interactions with people. What’s the first thing that comes to mind for single parent, introvert, doctor, CEO, or programmer? That first thing hints at your categories – the categories built up by a lifetime of media saturation filled with type-cast actors.


Back to the science of bias. Let’s think about resumes. In one study, resumes were handed out to academics who were asked to rate the job candidates for competency, hireability, mentoring, and suggested salary. Some resumes were for John and some for Jennifer. Professors of all genders rated Jennifer 25% less competent and less likely to be hired. They rated John worth about $4000 more. When asked why they gave those ratings, their justifications sounded rational, but 4 industry publications were awesome for John while 4 were just not enough for Jennifer. They are good people but they (we!) are at the mercy of their subconscious and years of societal conditioning.

Moving On

We’re good people so what do we do?

Take the IATs – there are many, take at least a couple and understand your unexpected biases. Talk about this with others so we all become comfortable talking about our subtle biases. Work to consciously update your mental categories – seek out images and reminders of people who are different and successful. Now that you know your own categories a bit better, be more mindful about switching to system 2 thinking. Reach out to one person and mentor them. Spend time with someone who makes you uncomfortable. Pay attention to the “firsts” (the first autistic character on Sesame Street, the first black President, the first whatever) and see if that helps you update your mental categories.

Increase the pipeline. Participate in groups that help kids learn to code. Recruit beyond your normal network, post jobs on diversity sites, and consider non-traditional backgrounds. Join diverse groups that don’t match your own diversity.

Be careful with words. Is someone bossy or exhibiting leadership? Is someone aggressive or a go-getter? Are they emotional or passionate? You may be surprised how you assign different words for the same behavior in unexpected ways.

When you post a job, only list something as “required” if it truly is. Women for example tend to only apply if they meet almost all the requirements, men tend to apply if they meet a few. Do you really require Java experience or do you need a good coder who is willing to learn new things? Don’t ask for a specific type of leader, look for someone who can lead in any of many productive ways. Explicitly state that you value a diverse team. And beware of subtle stereotypes – words like best, rock star, action-oriented define a particular picture but may not represent what you’re really looking for.

When reviewing resumes, have HR take off names, cities, and years. Before you pick up a resume decide on your priorities – does experience or willingness to learn matter more for example? Look for people who fill gaps rather than trying to replicate people you already have. And remember, system 2 thinking is tiring so do this when you’re alert and can take the time to think about what you’re doing.

For the interviews, have a diverse group participate. Simply looking at pictures of or talking about diverse people before starting interviews increases the chance you hire with diversity in mind. Don’t confuse either confidence or “geek cred” with competence. Keep an open mind about different ways of approaching problems – it’s the result that matters.

Many flowers make a beautiful bouquet – @IsisAnchalee

Let’s Do It!

What is your personal pledge today?


Full slide deck is available at http://smallbitesofbigdata.com/archive/2015/10/26/moving-beyond-unconscious-bias-good-people-matter.aspx



Big Data for the SQL Eye

SQL Server is a great technology – I’ve been using it since 1993 when the user interface consisted of a query window with the options to save and execute and not much else. With every release there’s something new and exciting and there’s always something to learn about even the most familiar of features. However, not everyone uses SQL Server for every storage and compute opportunity – sad but true.

So what is a SQL geek to do in the face of all the new options out there – many under the umbrella of Big Data (distributed processing)? Why just jump right on in and learn it! No one can know all the pieces because it’s a big, fluid, messy collection of “things”. But don’t worry about that, start with one thing and build from there. Even if you never plan to implement a production Big Data system you need to learn about it – because if you don’t have some hands-on experience with it then someone who does have that experience will be influencing the decision makers without you. For a SQL Pro I suggest Hive as that easy entry point. At some point maybe Spark SQL will jump into that gap, but for now Hive is the easiest entry point for most SQL pros.

For more, I refer you to the talk I gave at the Pacific Northwest SQL Server User Group meeting on October 14, 2015. Excerpts are below, the file is attached.

Look, it’s SQL!

SELECT score, fun
FROM toDo
WHERE type = 'they pay me for this?';

Here’s how that code looks from Visual Studio along with the links to how you find the output and logs:

image

And yet it’s more!

CREATE EXTERNAL TABLE IF NOT EXISTS toDo
(fun STRING,
score INT COMMENT 'rank the greatness',
type STRING)
COMMENT 'two tables walk into a bar….'
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/data/demo/';


A mix of old and new

-- read some data
SELECT 'you cannot make me ', score, fun, type
FROM toDo
WHERE score <= 0
ORDER BY score;

SELECT 'when can we ', score, fun, type
FROM toDo
WHERE score > 0
DISTRIBUTE BY score SORT BY score;


That’s Hive folks!

Hive

on Hadoop
on HDInsight
on Azure
Big Data in the cloud!

Hadoop Shines When….
(refer to http://blogs.msdn.com/b/cindygross/archive/2015/02/25/master-choosing-the-right-project-for-hadoop.aspx)

Data exploration, analytics and reporting, new data-driven actionable insights
Rapid iterating
Unknown unknowns
Flexible scaling
Data driven actions for early competitive advantage or first to market
Low number of direct, concurrent users
Low cost data archival

Hadoop Anti-Patterns….

Replace system whose pain points don’t align with Hadoop’s strengths
OLTP needs adequately met by an existing system
Known data with a static schema
Many end users
Interactive response time requirements (becoming less true)
Your first Hadoop project + mission critical system


Azure has so much more

Go straight to the business code
Scale storage and compute separately
Open Source
Linux
Managed and unmanaged services
Hybrid
On-demand and 24×7 options
SQL Server

It’s a Polyglot

Stream your data into a lake
Pick the best compute for each task

And it’s Fun!

I hope you enjoyed this small bite of big data!


BigDataForTheSQLEye.zip



The Big Data Dragon flies on to Microsoft AzureCAT

“Always in motion is the future” – Yoda

On June 1 I will be moving into a new role on AzureCAT. I tried the small business consulting world with Neal Analytics and it just wasn’t a good fit for me and my passions. So here I go, on to new challenges at Microsoft! I’ll be making the world a better place with the help of Big Data.

And while I’m making changes, I’ll also be moving from Boise, ID to the Redmond, WA area. It’s new adventures all around for me. I’ll miss Boise – my friends, my political battles, the greenbelt and hiking trails, sitting on the patios downtown. And I’m also excited about all the new opportunities I’ll have in my new, blue state.

Bring it on world, I’m ready!

cindygross@outlook.com | @SQLCindy | http://www.linkedin.com/in/cindygross | http://smallbitesofbigdata.com

Cross-published on:

https://befriendingdragons.com/2015/05/07/the-big-data-dragon-flies-on-to-microsoft-azurecat
http://smallbitesofbigdata.com/archive/2015/05/08/the-big-data-dragon-flies-on-to-microsoft-azurecat.aspx



Hadoop Likes Big Files

One of the frequently overlooked yet essential best practices for Hadoop is to prefer fewer, bigger files over more, smaller files. How small is too small and how many is too many? How do you stitch together all those small Internet of Things files into files “big enough” for Hadoop to process efficiently?

The Problem

One performance best practice for Hadoop is to have fewer large files as opposed to large numbers of small files. A related best practice is to not partition “too much”. Part of the reason for not over-partitioning is that it generally leads to larger numbers of smaller files.

A file is too small if it is smaller than the HDFS block size (chunk size); realistically, anything less than several times the chunk size counts as small. A very, very rough rule of thumb is that files should be at least 1GB each, with no more than around 10,000 files per table. These numbers, especially the maximum total number of files per table, vary depending on many factors. However, they give you a reference point. The 1GB figure is based on multiples of the chunk size, while the file count is honestly a bit of a guess based on a typical small cluster.

Why Is It Important?

One reason for this recommendation is that Hadoop’s name node service keeps track of all the files and where the internal chunks of the individual files are. The more files it has to track the more memory it needs on the head node and the longer it takes to build a job execution plan. The number and size of files also affects how memory is used on each node.

Let’s say your chunk size is 256MB. That’s the maximum size of each piece of the file that Hadoop will store per node. So if you have 10 nodes and a single 1GB file it would be split into 4 chunks of 256MB each and stored on 4 of those nodes (I’m ignoring the replication factor for this discussion). If you have 1000 files that are 1MB each (still a total data size of ~1GB) then every one of those files is a separate chunk and 1000 chunks are spread across those 10 nodes. NOTE: In Azure and WASB this happens somewhat differently behind the scenes – the data isn’t physically chunked up when initially stored but rather chunked up at the time a job runs.

With the single 1GB file the name node has 5 things to keep track of – the logical file plus the 4 physical chunks and their associated physical locations. With 1000 smaller files the name node has to track 1000 logical files plus 1000 physical chunks and their physical locations. That uses more memory and results in more work when the head node service uses the file location information to build out the plan for how it will split out any Hadoop job into tasks across the many nodes. When we’re talking about systems that often have TBs or PBs of data the difference between small and large files can add up quickly.
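To make that arithmetic concrete, here is a minimal Python sketch of the counting. The function name namenode_objects is made up for this illustration and is not a Hadoop API. It assumes a 256MB chunk size, ignores replication just like the discussion above, and assumes each file counts as one tracked entry in addition to its chunks.

```python
import math

def namenode_objects(file_sizes_mb, chunk_mb=256):
    """Rough count of what the name node tracks (hypothetical helper).

    Assumption: one entry per logical file plus one entry per physical
    chunk, with replication ignored.
    """
    files = len(file_sizes_mb)
    # Every file occupies at least one chunk, even when it is tiny.
    chunks = sum(max(1, math.ceil(size / chunk_mb)) for size in file_sizes_mb)
    return files, chunks, files + chunks

# One 1GB file versus 1000 files of 1MB each (about the same total data).
print(namenode_objects([1024]))      # (1, 4, 5)
print(namenode_objects([1] * 1000))  # (1000, 1000, 2000)
```

Same data volume, but under these assumptions the many-small-files layout gives the name node several hundred times as many things to remember.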

The other problem comes at the time that the data is read by a Hadoop job. When the job runs on each node it loads the files the task tracker identified for it to work with into memory on that local node (in WASB the chunking is done at this point). When there are more files to be read for the same amount of data it results in more work and slower execution time for each task within each job. Sometimes you will see hard errors when operating system limits are hit related to the number of open files. There is also more internal work involved in reading the larger number of files and combining the data.

Stitching

There are several options for stitching files together.

  • Combine the files as they land using the code that moves the files. This is the most performant and efficient method in most cases.
  • INSERT into new Hive tables (directories) which creates larger files under the covers. The output file size can be controlled with settings like hive.merge.smallfiles.avgsize and hive.merge.size.per.task.
  • Use a combiner in Pig to load the many small files into bigger splits.
  • Use the HDFS FileSystem Concat API http://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/FileSystem.html#concat.
  • Write custom stitching code and make it a JAR.
  • Enable the Hadoop Archive (HAR). This is not very efficient for this scenario but I am including it for completeness.

There are several writeups out there that address the details of each of these methods so I won’t repeat them.

The key here is to work with fewer, larger files as much as possible in Hadoop. The exact steps to get there will vary depending on your specific scenario.
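As one illustration of the "combine the files as they land" idea, here is a minimal Python sketch that only plans which small files to stitch together. It does not touch HDFS. The function name plan_batches, the 1GB target, and the sensor file names are all assumptions for this example; the actual concatenation would be done by whichever method above fits your scenario.

```python
def plan_batches(file_sizes, target_bytes=1024**3):
    """Group files, in arrival order, into batches of at least target_bytes.

    file_sizes: list of (name, size_in_bytes) pairs (hypothetical input).
    Returns a list of batches, each a list of file names to stitch together.
    """
    batches, current, current_size = [], [], 0
    for name, size in file_sizes:
        current.append(name)
        current_size += size
        if current_size >= target_bytes:
            batches.append(current)
            current, current_size = [], 0
    if current:
        # Leftovers that never reached the target still form a final batch.
        batches.append(current)
    return batches

MB = 1024**2
# 2500 made-up Internet of Things files of 1MB each.
files = [(f"sensor-{i:04d}.csv", 1 * MB) for i in range(2500)]
batches = plan_batches(files)
print(len(batches), [len(b) for b in batches])  # 3 [1024, 1024, 452]
```

In this sketch 2500 tiny files become 3 output files. The last batch is still under the target, so in practice you might hold it back until more files land.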

I hope you enjoyed this small bite of big data!

Cindy Gross – Neal Analytics: Big Data and Cloud Technical Fellow
@SQLCindy | @NealAnalytics | CindyG@NealAnalytics.com | http://smallbitesofbigdata.com
