What to do when you get back into the building - making sense of qualitative data
“Never theorise before you have data. Invariably you end up twisting facts to suit theories instead of theories to suit facts.” - Sherlock Holmes
I’ve amassed a lot of data from Kanbanery users over the past six months or so: survey data (mostly from open-ended questions), support emails, requests and questions sent directly to me, and interview transcripts.
The books I’ve read that target product managers, startup leaders, and “growth hackers” focus on collecting data, and all of them have been disappointing on what to do with it afterward. Getting out of the building is great, but what do you do with all those notes and transcripts?
Most lean startup books advise something like “look for patterns” which seems sensible, but how, with a hundred pages or more of text? Without a plan, it seems like an impossible task at best or like a recipe for confirmation bias at worst. Data from dozens of people with different points of view and different contexts means that some of them will say things you like, and some things will be hard to hear. It’s like looking for support for a political position in the Bible or the Koran; there’s so much in there that anyone can pick and choose some verse that agrees with them.
Luckily, I studied anthropology at university (my undergraduate degree is a B.Sc. in anthropology from the University of North Texas, and my MBA from the University of Illinois is a self-designed program specializing in “corporate anthropology” done in coordination with the anthropology department), so I’ve been exposed to tools for evaluating qualitative data that are largely unknown in lean startup circles.
At the highest level, when faced with a pile of qualitative data, social science researchers do one of two things. They apply a relevant theory to the study of the data, and the theory drives the method of interpreting the data, or, if they have no theory and are just trying to understand the apparently incomprehensible, they use a technique called grounded theory. Grounded theory is a method of applying an inductive approach to analyzing data to arrive at a theory that explains it. It stems from work done by Glaser and Strauss in the 1960s, back when anthropologists were still pretending to be scientists before suffering the massive crisis of confidence known as post-modernism that left us all wallowing in the agony of imposter syndrome. Anyway, I digress.
Because it was my first stab at the data and I didn’t know what to expect, I approached it using grounded theory. I’ll describe the process with a little bit about the ideas underlying the method (without getting too academic).
The goal of applying grounded theory to qualitative data analysis is not to arrive at the correct conclusion, but to extract useful insights from the data. For hardcore big data geeks, that may seem a bit wishy-washy, but in the end, data needs context, and people make for messy context. They are not perfectly rational, and to understand why they do what they do means getting a bit messy with them. One nice thing about this is that you can dip your toe into the same data twice, and get different results. That doesn’t invalidate your first findings; it adds to them. In fact, revisiting a growing data set is crucial to arriving at deep understanding.
So, to the meat of it. Here’s how I approached the interview and survey data using grounded theory.
Step one: Read Everything
It can be tedious, but a read-through of all the data, ideally in one sitting, provides context for the next step, coding.
Step two: Open Coding
In this step, I read through the documents, adding notes as I went. I tried to keep my “notes” to just one or two words. In the first pass, I came up with codes like “wishes”, “complaints”, “context”, and “complements”. Consider as you’re coding whether your codes can be refined. For example, ask what else is relevant. For the code “complaints” it might be relevant to note something of the context (New users or experienced users? What part of the application are they complaining about?). I kept a code list on a separate page in which I explained each code as I used it. For example, “complements” is the code I used whenever a respondent was talking about other tools that they used in addition to Kanbanery. I might not remember that when I come back to the data in a month, so it’s described in the code list. There is no limit to how many codes you can generate. For one project, I generated only about twenty codes from a hundred pages of data. For another, I generated almost 60 codes from just twelve pages of data.
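If you keep your notes in a script rather than on paper, the code list and the coded excerpts might look something like the sketch below. This is only one way to do it, not the tool I used: the `Segment` class, the `apply_code` helper, and the example excerpts are all invented for illustration. The one idea it borrows from the process above is that a new code cannot be used until it has a definition in the code list.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Segment:
    """One excerpt of source text plus the short codes attached to it."""
    source: str  # hypothetical label such as "survey" or "interview"
    text: str
    codes: list = field(default_factory=list)


# The code list: each code gets a plain-language definition on first use.
codebook = {}


def apply_code(segment, code, definition=""):
    """Attach a code to a segment, recording its definition the first time."""
    if code not in codebook:
        if not definition:
            raise ValueError(f"New code {code!r} needs a definition")
        codebook[code] = definition
    if code not in segment.codes:
        segment.codes.append(code)


# Invented excerpts, not real survey or interview responses.
segments = [
    Segment("survey", "I wish I could archive old boards."),
    Segment("interview", "Our team is five people, all remote."),
    Segment("support email", "I could not find where to add a column."),
]

apply_code(segments[0], "wishes", "Respondent asks for something the product lacks")
apply_code(segments[1], "context", "Who the respondent is or how they work")
apply_code(segments[2], "complaints", "Respondent reports a frustration or problem")
apply_code(segments[2], "context")  # already defined, so no definition needed

# How often each code has been used so far.
code_counts = Counter(code for seg in segments for code in seg.codes)
print(code_counts)
for code, definition in codebook.items():
    print(f"{code}: {definition}")
```

The counts are just a convenience for spotting codes you lean on heavily; the definitions are the part you will thank yourself for a month later.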
Step two-and-a-half (optional): Line By Line Coding
If you felt stuck in step two, there are ways to unstick yourself. Perhaps you kept seeing the same topic and only that topic, and your notes are full of just one or two codes with large portions of the text uncoded. Line by line coding is a way to break free of your mental barriers and see the data in a different way. Because sentences rarely fit neatly on one line, a printed line will have parts of two or even three sentences in it. Line by line coding is the process of coming up with one code that somehow describes what you see in a line of text, separated from the context of the paragraph and connected sentences. It’s a bit painful to do, but it always breaks me out of limiting thinking habits when I’m having an uncreative coding session.
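If your data lives in a text file rather than on a printed page, you can fake the printed lines. The sketch below uses Python's standard `textwrap.wrap` to cut a passage at a fixed width, ignoring sentence boundaries, and leaves one empty slot per line for a code. The excerpt and the width of 40 characters are arbitrary choices made for illustration.

```python
import textwrap

# Invented excerpt for illustration only.
transcript = (
    "I like the board view. Adding a column took me a while to find. "
    "Once I found it, things went fine. My team mostly uses it on Mondays."
)

# Wrap to a fixed width, ignoring sentence boundaries, as a printed page would.
lines = textwrap.wrap(transcript, width=40)

# One slot per physical line; fill in exactly one code for each by hand.
line_codes = [{"line": line, "code": None} for line in lines]

for number, entry in enumerate(line_codes, start=1):
    print(f"{number:>2}: {entry['line']}")
```

Changing the width reshuffles which sentence fragments land together, which is a cheap way to get a fresh look at the same text.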
Step three: Clustering
Now review your codes. Can they be grouped into themes? Perhaps two codes are basically the same thing and can be combined. If you’re going through this process with new data (it is an iterative process), then perhaps your perspective on the groups has changed with the new data. There’s a great video on YouTube illustrating how new data can alter our perspective on the existing data.
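As a rough sketch of what clustering amounts to mechanically: merge the codes you have decided are duplicates, then assign each remaining code to a theme. The codes, counts, and theme names below are all hypothetical. Deciding what belongs together is the real work, and no script does that for you; this only keeps the bookkeeping straight.

```python
from collections import defaultdict

# Hypothetical codes with how often each was used during open coding.
code_counts = {
    "wishes": 14,
    "feature requests": 9,
    "complaints": 21,
    "confusion": 11,
    "context": 30,
}

# Codes judged to be basically the same thing are merged into one.
merge_into = {"feature requests": "wishes"}

# Each remaining code is assigned to a theme (theme names are invented).
themes = {
    "wishes": "desired changes",
    "complaints": "friction",
    "confusion": "friction",
    "context": "background",
}

# Fold duplicate codes together, summing their counts.
merged = defaultdict(int)
for code, count in code_counts.items():
    merged[merge_into.get(code, code)] += count

# Roll the merged codes up into themes.
theme_totals = defaultdict(int)
theme_members = defaultdict(list)
for code, count in merged.items():
    theme = themes[code]
    theme_totals[theme] += count
    theme_members[theme].append(code)

for theme, members in theme_members.items():
    print(f"{theme} ({theme_totals[theme]}): {', '.join(members)}")
```

When new data arrives, editing `merge_into` and `themes` and rerunning is a quick way to see how a change of perspective reshapes the groups.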
Step four: Axial Coding
Now is the step in which you start making sense of the data. Axial coding is about looking for patterns in the codes themselves. Is there a relationship between sets of codes? For example, is one set of codes the result of another? Or are two sets in direct opposition to each other? Or maybe both. Perhaps one set of data is the cause of two very different reactions.
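One simple way to find candidate relationships is to count how often two codes land on the same excerpt. The sketch below does that with `itertools.combinations` and `collections.Counter`. The coded excerpts are invented, and co-occurrence is my stand-in here rather than a formal part of grounded theory: a high count only tells you where to look. Whether one code causes, results from, or opposes another is something you still have to judge by rereading the excerpts.

```python
from collections import Counter
from itertools import combinations

# Each set holds the codes attached to one excerpt (invented data).
coded_segments = [
    {"complaints", "new user", "learning"},
    {"complaints", "learning"},
    {"wishes", "experienced user"},
    {"complaints", "new user", "learning"},
    {"wishes", "new user"},
]

# Count every unordered pair of codes that appears on the same excerpt.
pair_counts = Counter()
for codes in coded_segments:
    for pair in combinations(sorted(codes), 2):
        pair_counts[pair] += 1

# The most frequent pairs are candidates for a closer look.
for pair, count in pair_counts.most_common(3):
    print(pair, count)
```

Sorting the codes before pairing them keeps ("complaints", "learning") and ("learning", "complaints") from being counted as two different pairs.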
Step five: Selective Coding
Finally, review the code sets that you’ve created and look at the relationships between them. Which code sets seem most relevant? This is the selective component of selective coding. You’re trying to extract a story from the codes. For example, there’s a clear relationship between the “complaints” data set for Kanbanery and the “learning” set. The learning curve of Kanbanery isn’t as comfortable as it could be. Most complaints from new users are not about missing features, but about finding them and learning to use them. That tells a story about the new user experience that is, needless to say, very interesting to me as a product manager, and it drove my decision to implement a week-long email onboarding campaign.
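Before acting on a story, it helps to pull every excerpt that should support it and read them together. The sketch below checks a story of the form "most complaints from new users are about learning" against a handful of invented excerpts. The `support_for` helper, the code names, and the data are all hypothetical, and the resulting number is a prompt for rereading the excerpts, not a statistic.

```python
# Invented excerpts with hypothetical codes, for illustration only.
segments = [
    {"text": "Where do I add a column?",
     "codes": {"complaints", "new user", "learning"}},
    {"text": "Took me days to notice the filters.",
     "codes": {"complaints", "new user", "learning"}},
    {"text": "I could not work out how to invite my team.",
     "codes": {"complaints", "new user", "learning"}},
    {"text": "There is no calendar view.",
     "codes": {"complaints", "new user", "missing feature"}},
    {"text": "Exports are slow on big boards.",
     "codes": {"complaints", "experienced user"}},
    {"text": "I would love swimlanes.",
     "codes": {"wishes", "experienced user"}},
]


def support_for(segments, required, candidate):
    """Among segments carrying all `required` codes, find those that also
    carry `candidate`. Returns the matching segments and the pool size."""
    pool = [s for s in segments if required <= s["codes"]]
    hits = [s for s in pool if candidate in s["codes"]]
    return hits, len(pool)


hits, total = support_for(segments, {"complaints", "new user"}, "learning")
print(f"{len(hits)} of {total} new-user complaints are also coded 'learning'")
for segment in hits:
    print(" -", segment["text"])
```

If the excerpts that come back do not read like the story you had in mind, that is a sign to go back a step rather than to adjust the codes until they fit.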
Repeat as needed. As you’re working through these steps, you may find opportunities to go back and do a bit more open coding, or while doing selective coding you may discover a relationship that you missed the first time you were doing axial coding. It’s fine to hop around here. When you have a compelling story that is supported by the data and feels like it might compel you to some action inspired by a new understanding of your users’ feelings or experiences, you can pat yourself on the back. You’re done for now.
This is hardly the only way to make sense of qualitative data, but I appreciate how methodical it is. When I have no better idea, and I have to make sense of a lot of data, this is my go-to strategy.
If I’ve been too vague in some area, or just dead wrong about something, please let me know in the comments. If you try this approach, I’d love to hear how it worked for you.