Validity and Reliability - Rough lecture notes

RELIABILITY

RELIABILITY refers to the consistency and accuracy of data. Field researchers must be reasonably certain that what has been observed actually occurred and was interpreted correctly. For example, in one situation, an assistant warden, provoked by a question about the ability of female correctional officers, unleashed a demeaning and sexist tirade about the worthlessness of females in prison. The literal speech content was shocking, sufficiently so that I was momentarily too disconcerted to take notes. However, several other colleagues were present, and our follow-up questions confirmed the accuracy of our perceptions. We were later able to reconstruct the comments.
Sometimes, however, reliable observations are more difficult to obtain. Gang members may often be identified either by signifying clothing (colors, style of dress) or by hand gestures. But care must be taken to assure that significations are intended and not inadvertently expressed by non-gang members. In such cases, one can rely on key informants to obtain data, cross-check accounts between sources, compare accounts with any documentary evidence available, and, when possible, experience questionable events first hand.
There are a number of ways to assure accuracy of data. First, longevity within a single research setting provides inmates and researcher with a shared history that can be referenced and drawn upon. Short-term prison research of less than a year allows little more than what Kirk and Miller (1986: 64-65) have called "copping a look" and "copping a taste." In a short-term project, by the time the researcher enters the field, becomes familiar with the norms and customs, establishes a reliable informant network, learns how to dig beneath surface meanings into the more subtle nuances that guide behavior, and develops strategies for obtaining private information, the project may be over.
Extended immersion in a culture helps assure that revisionist interpretations do not creep in, especially when early field notes provide a documentary refresher of any previous events or narratives that may have shifted over time. If, for example, an informant inaccurately describes in 1989 an event that occurred in 1980, and to which a researcher may have been privy, that 1989 data can be flagged for subsequent comparison. Diachronic manipulation of data also strengthens validity by alerting the analyst that what is "true" in 1990 may not have been the case in 1980, which militates against excessive generalizations across time.
Second, several types of cross-checking strategies are readily available. The first, internal verification of observation, entails attention to possible contradictions in informants' narratives or interpretations that do not correspond to one another, to personal recollections, or to other data sources. Internal verification allows for assessing the validity and accuracy of competing versions of similar accounts, or for assessing the reliability of informants. A second form of cross-checking, external to the setting, requires accessing resources such as prison documents, court records, or other pools of information, by which a given narrative can be verified. The need for external checks arises especially when a prisoner makes a claim about, for example, prison status, illicit activities, or prison conduct. In prison settings, checking accuracy with others must be done with caution, lest it raise the appearance of researcher indiscretion, snooping, or generally being "uncool." I have found that, more often than not, discrepancies in prisoners' accounts are simply a matter of momentary hyperbole or, more commonly, of memory lapse. This is easily rectified when others with more accurate recall refresh each other's memory to allow reconstruction.
Third, other studies of prisons are invaluable in alerting a researcher to possible anomalies, discrepancies, or inconsistencies. For example, if it appears that the current culture of prison does not support the standard descriptions of an "inmate code," a review of that literature alerts the researcher to possible directions of inquiry to pursue, which bodies of data to re-examine to assure that no errors of fact or interpretation occurred, and what types of data might have been overlooked that could correct the discrepancy.
Data, even if accurate, may not be valid. Kirk and Miller distinguish between three kinds of reliability:
1) QUIXOTIC RELIABILITY: A single method of observation gives an unvarying outcome (e.g., the "good guard/dead guard" example or "How are you?" "I'm fine!"). The trick is TO PROBE DEEPER and look for inconsistencies!
2) DIACHRONIC RELIABILITY: Stability of an observation over time. Example: I ask the class "How are you?" every Tues, Weds, Thurs, & Fri morning. You say, "GREAT! Never better!" I do this for two months, so I assume you're all doing well. But I don't ask on Monday. If I did, you might say, "Crappy--hungover from the weekend." So, in this variant of D-R, I get the same response every time I ask, and others who ask you might get the same response, so it APPEARS reliable over time, which it is. The problem is that we can't assume that everybody feels great, even though the responses are stable over time, because we left out a crucial time period.
Another variant: You're asked every day by researchers how you're doing. We do this Aug-Sept-Oct-Nov. You all report that you're doing great. Should we assume "diachronic reliability"? No, not necessarily. Might be, but can't assume it. What if I asked students in December, the week before and the week of exams? Would I get the same answer, or would I hear, "Man, life sucks! Five papers, six exams, I haven't studied, and I might flunk!"
The point is, the quality of your data must constantly be tested, and you CANNOT ASSUME ANYTHING! Qualitative research is primarily INFERENTIAL, and your inferences must constantly be examined and "tested."
3) SYNCHRONIC RELIABILITY: This is the SIMILARITY of your data (observations) within the same time period. Kirk and Miller use the example of Galileo dropping different-sized objects from the Leaning Tower of Pisa. They all hit the ground at the same time. Kirk and Miller note an irony: S-R can be most useful when you find differences that don't fit the other data or your other inferences.
When you talk to people or observe things, you should pay special attention to these types of reliability. This is another feature that sets a science apart from journalism, subjective interpretations, or common sense. It's a way of MAKING SUBJECTIVE DATA OBJECTIVE!

VALIDITY

VALIDITY refers to "the element of fit between an observation and the basis on which it is made" (Kirk and Miller, 1986: 80). Cook and Campbell (1979: 38-39) distinguish between INTERNAL AND EXTERNAL VALIDITY. Internal validity refers to the internal logic of our research and the degree to which accurate statements can be made about our measures of association. In qualitative research, internal validity refers to the logical power of our arguments that link data to concepts. The inferences made between an observation and the categories used to code it must not only be consistently applied, but must also adhere to some logical set of rules that guide the application. In observing "masked intimacy," for example, the researcher aims to show how a given behavior conceals weakness while displaying kindness. When such an act is observed, its validity can be checked by further inquiry, by seeking similar acts in similar situations, or by eliciting clarification of the act's meaning from the actor.
EXTERNAL or CONSTRUCT VALIDITY, by contrast, refers to the degree to which generalizable statements may be drawn from our findings and applied to other populations or used to establish higher-order statements about the conditions in which our observations occur or upon which they are contingent. Some might call this the power of theory or model building. In qualitative research, this refers to the degree to which we can develop "law-like" statements, even though we are not concerned with "causation" or measures of association.
Consider, for example, the claim that inmates are judged to be institutionally well-adjusted because they receive few disciplinary tickets. From accurate data, a researcher may then proceed to build a theoretical model of discipline and "prisonization" (Barak-Glantz, 1983). However, it may be that prisoners who receive few tickets are passive, remain in their cells out of fear, and do not take part in "normal" prison activities, whether licit or illicit, that put one at higher risk of discipline. Or it is possible that those who receive fewer tickets have "learned the ropes" and are not well behaved, but simply "slick." Conversely, those who receive a higher proportion of tickets may be better integrated into the prisoner culture and, ironically, even easier to control, because their disciplinary record reflects healthy (by prisoner standards) adjustment. If any of these alternative meanings holds, then the validity of our theoretical claims is weakened. The observation, though accurate, may therefore be invalid, because it imputes meanings to disciplinary infractions that may be erroneous and thus subverts the validity of the claims.

Page maintained by: Jim Thomas - jthomas@math.niu.edu