Codebook Development for Team-Based Qualitative Analysis
MacQueen, Kathleen M., McLellan, Eleanor, Kay, Kelly, & Milstein, Bobby (1998)
Cultural Anthropology Methods, 10(2): 31-36
One of the key elements in qualitative data analysis is the systematic coding of text (Strauss and Corbin 1990; Miles and Huberman 1994:56). Codes are the building blocks for theory or model building and the foundation on which the analyst’s arguments rest. Implicitly or explicitly, they embody the assumptions underlying the analysis. Given the interdisciplinary nature of research at the Centers for Disease Control and Prevention (CDC), we have sought to develop explicit guidelines for all aspects of qualitative data analysis, including codebook development.
On the one hand, we must often explain basic methods such as this in clear terms to a wide range of scientists who have little or no experience with qualitative research and who may express a deep skepticism of the validity of our results. On the other hand, our codebook development strategy must be responsive to the teamwork approach that typifies the projects we undertake at CDC, where coding is generally done by two or more persons who may be located at widely dispersed sites. We generally use multiple coders so that we can assess the reliability and validity of the coded data through intercoder agreement measures (e.g., Carey et al. 1996) and, for some projects, because it is the only reasonable way to handle the sheer volume of data generated. The standardized structure and dynamic process used in our codebook development strategy reflect these concerns.
This paper describes (1) how a structured codebook provides a stable frame for the dynamic analysis of textual data; (2) how specific codebook features can improve intercoder agreement among multiple researchers; and (3) the value of team-based codebook development and coding.
Origins of the Codebook Format
Our codebook format evolved over the course of several years and a variety of projects. The conceptual origins took shape in 1993 during work on the CDC-funded Prevention of HIV in Women and Infants Project (WIDP) (Cotton et al. 1998), which generated approximately 600 transcribed semistructured interviews. One research question pursued was whether women’s narratives about their own heterosexual behavior could help us understand general processes of change in condom use behavior (Milstein et al. 1998). The researchers decided to use the processes of change (POC) constructs from the Transtheoretical Model (Prochaska 1984; DiClemente and Prochaska 1985) as a framework for the text analysis. However, the validity of the POC constructs for condom-use behavior was unknown, and a credible and rigorous text coding strategy was needed to establish their applicability and relevance for this context. To do this, the analysts had to synthesize all that was known about each POC construct, define what it was, what it was not, and, most importantly, learn how to recognize one in natural language. Several years earlier, O’Connell (1989) had confronted a similar problem while examining POCs in transcripts of psychotherapy sessions. Recognizing that "coding processes of change often requires that the coder infer from the statement and its context what the intention of the speaker was," O’Connell (1989:106) developed a coding manual that included a section for each code titled "Differentiating (blank) from Other Processes." Milstein and colleagues used O’Connell’s "differentiation" section in a modified format in their analysis of condom behavior change narratives. They conceptualized the "differentiation" component as "exclusion criteria," which complemented the standard code definitions (which then became known as "inclusion criteria").
To facilitate on-line coding with the software program Tally (Bowyer 1991; Trotter 1993), components were added for the code mnemonic and a brief definition, as well as illustrative examples. Thus, the final version of the analysis codebook contained five parts: the code mnemonic, a brief definition, a full definition of inclusion criteria, a full definition of exclusion criteria to explain how the code differed from others, and example passages that illustrated how the code concept might appear in natural language. During the code application phase, information in each of these sections was supplemented and clarified (often with citations and detailed descriptions of earlier work), but the basic structure of the codebook guidelines remained stable.
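The five-part structure described above can be pictured as a simple record. The following is a minimal sketch in Python, not a reconstruction of the Tally format or of the project's actual codebook: the class name, field names, the `missing_parts` helper, and the placeholder entry are all assumptions introduced for illustration.

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class CodebookEntry:
    """One code, with the five parts named in the text (field names are hypothetical)."""
    mnemonic: str             # short label applied to text segments
    brief_definition: str     # short reminder of what the code means
    inclusion_criteria: str   # full definition: when the code applies
    exclusion_criteria: str   # how the code differs from others: when it does not apply
    examples: List[str] = field(default_factory=list)  # illustrative passages

    def missing_parts(self) -> List[str]:
        """Return the names of parts that are still empty, in definition order."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


# Placeholder entry (not taken from the project codebook): the structure stays
# fixed while the content of each part is filled in and refined during coding.
draft = CodebookEntry(
    mnemonic="EXAMPLE",
    brief_definition="Placeholder brief definition",
    inclusion_criteria="Placeholder description of when the code applies",
    exclusion_criteria="",
)

print(draft.missing_parts())
```

A check of this kind is one way a team might confirm that every code has all five parts before coding begins, while leaving the wording of each part free to change.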