Manually Clustering 275 Images For Qualitative Analysis


Grounded Theory is a technique for developing a theory about empirical data you have collected. This contrasts with starting from hypotheses that you then either confirm or refute. However, if you're collecting data in an entirely new field, you might not know which hypotheses are important, and the hypotheses you have in advance might be irrelevant.

Grounded Theory is essentially about developing a theory grounded in the data. Open coding is one of the processes used in grounded theory. However, every description I've found of the actual process of doing grounded theory is extremely confusing, abstract, and written in terms so carefully defined that they lose their meaning.

Ironically, for a process that is supposed to help us ground theory in actual, real, practical data, there are few examples out there of anyone actually carrying out the steps of grounded theory. Final results are published all the time, and one can sometimes find pictures on the internet of a wall in a research lab covered with well-ordered groups of post-its, but they don't accurately express the tentative messiness of the early stages of grounded theory. Sensei Koji Yatani showed me how to do open coding with some interview data I collected for an (unpublished) study. I told my Empirical Methods teacher Steve Easterbrook that I felt my open coding technique was "messy, ad-hoc, and probably pretty informal". He laughed and told me that everyone thinks that way.

Given all that, here's a time-lapse video of me doing some open coding on ~275 images taken from a study I just ran. I'm not interested in positing a formal "theory" at this stage, just in getting a sense of the behaviour the data covers, so this process is more accurately called Thematic Analysis.
I sat in on this study, so I had some sense of the data before starting the analysis rather than going in blind. The video is sped up 60 times from real time. If there are any other videos of someone doing grounded theory out there, I would love to see them.

Thanks to Katherine Sellen and Carrie Demmans Epp for wording clarifications.

Narrative of my thought process:

From the start, as I looked at the images (with a rough idea of what they'd contain), they seemed to sit on a natural one-dimensional spectrum. Around 0:05, I settled on three reasonable levels for dividing that spectrum into discrete groups.

0:08 Some anomalous images did not seem to fit well into this spectrum, so I left them off to the upper right to deal with later.

Because it was mechanically easier (i.e. I didn't have to reach as far), whenever I took images from the "source" pile (at the top), I grabbed a handful and pre-sorted them into the three discretization levels before moving them onto the bigger piles. This also conveniently ensured that my discretization made sense for each sample of the source images. You can see the pre-sorted piles clearly at 0:10, but this happened several times.

0:18 I started sub-clustering the "anomalies" pile. I did this first because it was the smallest, and because it had the least in common with the rest of the data. I didn't want to sort any of the data on the big spectrum first, since my preconceptions would be strongest before I had spent time really digging into the data.

0:24 I had developed clusters I was happy with and started labeling them with post-its. Putting the notions I discovered in these ad-hoc piles into words turned out to be more difficult than I expected.

0:27 Started clustering the big pile at the low end of the spectrum. This went pretty smoothly.

0:41 Started clustering the middle pile of the spectrum. I didn't realize this until later, but the low end of the spectrum was quite different from the rest of it, so some of the preconceptions I carried over made clustering this middle pile difficult. Around 0:48 I put it aside to work on the high end of the spectrum.

0:59 Clustering of the high end of the spectrum went well. I did find nice, discrete groups, but for a while they were spread out over a larger 2D space.

1:10 Finished labeling the clusters from the high end of the spectrum. Then, because it would look cool and separate the higher-level groups nicely, I used blue painter's tape to mark the boundaries between them.

1:35 Finished clustering the middle-level groups. This was much easier the second time, and the result was similar to the high-level groups. Then I went around checking the validity of each group.

1:41 Took a picture of each group, from high to low-level, for documentation.
