Validity and Reliability - Rough lecture notes


                          RELIABILITY

RELIABILITY refers to the consistency and accuracy of data.  Field
researchers must be reasonably certain that what has been observed
actually occurred and was interpreted correctly.  For example, in
one situation, an assistant warden, provoked with a question about the
ability of female correctional officers, unleashed a demeaning and
sexist tirade against the worthlessness of females in prison. The
literal speech content was shocking, sufficiently so that I was
momentarily too disconcerted to take notes. However, several other
colleagues were present, and our follow-up questions confirmed the
accuracy of our perceptions.  We later were able to reconstruct the
comments.  Sometimes, however, reliable observations are more
difficult.  Gang members often may be identified either by signifying
clothing (colors, style of dress) or by hand gestures, but care must
be taken to assure that significations are intended and not
inadvertently expressed by non-gang members.  In such cases, one can
rely on key informants to obtain data, cross-check accounts between
sources, compare accounts with any available documentary evidence,
and, when possible, experience questionable events first hand.

There are a number of ways to assure accuracy of data.  First,
longevity within a single research setting provides inmates and
researcher with a shared history that can be referenced and drawn
upon.  Short term prison research of less than a year allows little
more than what Kirk and Miller (1986: 64-65) have called "copping a
look" and "copping a taste." In a short term project, by the time the
researcher enters the field, becomes familiarized with the norms and
customs, establishes a reliable informant network, learns how to dig
beneath surface meanings into the more subtle nuances that guide
behavior, and develops strategies for obtaining private information,
the project may be over.  Extended immersion in a culture helps assure
that revisionist interpretations do not creep in, especially when
early field notes provide a documentary refresher of any previous
events or narratives that may have shifted over time.  If, for
example, an informant inaccurately describes in 1989 an event that
occurred in 1980, and to which a researcher may have been privy, that
1989 data can be flagged for subsequent comparison.  Diachronic
manipulation of data also strengthens validity by alerting the analyst
that what is "true" in 1990 may not have been the case in 1980, which
militates against excessive generalizations across time.

Second, several types of cross-checking strategies are readily
available.  The first, internal verification of observation, entails
attention to possible contradictions in informants' narratives or
interpretations that do not correspond to one another, to personal
recollections, or to other data sources.  Internal verification allows
for assessing the validity and accuracy of competing versions of
similar accounts, or for assessing the reliability of informants.  A
second form of cross-checking, external to the setting, requires
accessing resources such as prison documents, court records, or other
pools of information, by which a given narrative can be verified.  The
need for external checks arises especially when a prisoner makes a
claim about, for example, prison status, illicit activities, or prison
conduct.  In prison settings, checking accuracy with others must be
done with caution, lest it raise the appearance of researcher
indiscretion, snooping, or generally being "uncool." I have found that
more often than not, discrepancies in prisoners' accounts are simply a
matter of momentary hyperbole or, more commonly, of memory lapse.
This is easily rectified when others with more accurate recall refresh
each other's memories to allow reconstruction.
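
The cross-checking described above can be thought of as simple
bookkeeping.  The Python sketch below is only an illustration under
assumed conventions: the record fields, the source labels, and the
cross_check helper are hypothetical, and the records are made up.  It
groups the versions of each claim by source and flags the claims whose
versions disagree, so that they can be revisited discreetly.

```python
from collections import defaultdict

def cross_check(accounts):
    """Group accounts by claim and flag claims whose versions disagree.

    `accounts` is a list of (claim_id, source, version) tuples; all three
    fields are hypothetical labels chosen by the researcher.
    """
    versions = defaultdict(dict)
    for claim_id, source, version in accounts:
        versions[claim_id][source] = version

    flagged = {}
    for claim_id, by_source in versions.items():
        # Flag a claim when its sources do not all give the same version.
        if len(set(by_source.values())) > 1:
            flagged[claim_id] = by_source
    return flagged

# Made-up illustrative records, not real field data.
accounts = [
    ("transfer_date", "informant_A", "1980"),
    ("transfer_date", "informant_B", "1980"),
    ("ticket_count", "informant_A", "none"),
    ("ticket_count", "prison_record", "three"),
]
flagged = cross_check(accounts)
print(sorted(flagged))
```

A flag here marks only a discrepancy to follow up, not a judgment about
which source is right.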

Third, other studies of prisons are invaluable in alerting a
researcher to possible anomalies, discrepancies, or inconsistencies.  For
example, if it appears that the current culture of prison does not
support the standard descriptions of an "inmate code," a review of
that literature alerts the researcher to possible directions of
inquiry to pursue, which bodies of data to re-examine to assure no
errors of fact or interpretation occurred, and what types of data might
have been overlooked that could correct the discrepancy.  Data, even
if accurate, may not be valid.

Kirk and Miller distinguish among three kinds of reliability:

   1) QUIXOTIC RELIABILITY: A single method of observation
gives an unvarying outcome (e.g., the "good guard/dead guard" example
or "How are you?" "I'm fine!"). Such consistency can be misleading,
because the answer may be nothing more than a stock response. The
trick is TO PROBE DEEPER and look for inconsistencies!

   2) DIACHRONIC RELIABILITY: Stability of an observation over time.
Example: I ask the class "How are you?" every Tues, Weds, Thurs, & Fri
morning.  You say, "GREAT! Never better!" I do this for two months, so
I assume you're all doing well. But I don't ask on Monday. If I
did, you might say, "Crappy--hungover from the weekend." So, in
this variant of D-R, I get the same response every time I ask,
and others who ask you might get the same response, so it APPEARS
reliable over time, which it is. The problem is that we can't
assume that everybody feels great, even though the responses are
stable over time, because we left out a crucial time period.

Another variant: You're asked every day by researchers how you're
doing. We do this Aug-Sept-Oct-Nov. You all report that you're
doing great. Should we assume "diachronic reliability"? No,
not necessarily. Might be, but can't assume it. What if I asked
students in December, the week before and the week of exams? Would
I get the same answer, or would I hear, "Man, life sucks! Five
papers, six exams, I haven't studied, and I might flunk!"

The point is, the quality of your data must constantly be tested,
and you CANNOT ASSUME ANYTHING! Qualitative research is primarily
INFERENTIAL, and your inferences must constantly be examined and
"tested."

   3) SYNCHRONIC RELIABILITY: This is the SIMILARITY of your
data (observations) within the same time period. K&M use the
example of Galileo dropping different sized objects from the
Leaning Tower of Pisa. They all hit the ground at the same time.
K&M note the irony that S-R can be most useful when
you find differences that don't fit the other data or your
other inferences.

When you talk to people or observe things, you should pay special
attention to these types of reliability. This is another feature
that sets a science apart from journalism, subjective interpretations,
or common sense. It's a way of MAKING SUBJECTIVE DATA OBJECTIVE!
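
The diachronic examples above (never asking on Monday, never asking
during exam weeks) come down to two separate questions: are the
responses stable, and did we observe in every relevant period?  The
Python sketch below is a rough illustration only; the function names
and the made-up classroom data are assumptions, not anything taken from
Kirk and Miller.

```python
def uncovered_periods(all_periods, observations):
    """Return the periods in which no observation was recorded."""
    observed = {period for period, _response in observations}
    return [p for p in all_periods if p not in observed]

def is_stable(observations):
    """True when every recorded response is identical."""
    return len({response for _period, response in observations}) <= 1

# Made-up data: the class is asked Tuesday through Friday for eight weeks.
weekdays = ["Mon", "Tue", "Wed", "Thu", "Fri"]
observations = [(day, "great") for day in ["Tue", "Wed", "Thu", "Fri"]] * 8

stable = is_stable(observations)
gaps = uncovered_periods(weekdays, observations)
print(stable, gaps)
```

Stable responses together with a non-empty list of gaps is the warning
sign: the data look reliable over time only because a crucial period
was never sampled.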

                               VALIDITY

VALIDITY refers to "the element of fit between an observation and the
basis on which it is made" (Kirk and Miller, 1986: 80).  Cook and
Campbell (1979: 38-39) distinguish between INTERNAL AND EXTERNAL
VALIDITY.

Internal validity refers to the internal logic of our research and the
degree to which accurate statements can be made about our measures of
association.  In qualitative research, internal validity refers to the
logical power of our arguments that link data to concepts.  The
inferences made between an observation and the categories used to code
it must not only be consistently applied, but adhere to some logical
set of rules that guide the application.  In observing "masked
intimacy," for example, the researcher aims to show how a given
behavior conceals weakness while displaying kindness.  When such an
act is observed, its validity can be checked by further inquiry, by
seeking similar acts in similar situations, or by eliciting
clarification of the act's meaning from the actor.

EXTERNAL or CONSTRUCT VALIDITY, by contrast, refers to the degree to
which generalizable statements may be drawn from our findings and
applied to other populations or used to establish higher-order
statements about the conditions in which our observations occur or
upon which they are contingent. Some might call this the power of
theory or model building.  In qualitative research, this refers to the
degree to which we can develop "law-like" statements, even though we
are not concerned with "causation" or measures of association.
Consider, for example, the claim that inmates are judged to be
institutionally well-adjusted because they receive few disciplinary
tickets. From accurate data, a researcher may then proceed to build a
theoretical model of discipline and "prisonization" (Barak-Glantz,
1983).

However, it may be that prisoners who receive few tickets are passive,
remain in their cells out of fear, and do not take part in "normal"
prison activities, whether licit or illicit, that put one at higher
risk of discipline. Or, it is possible that those who receive fewer
tickets have "learned the ropes" and are not well behaved, but simply
"slick." Conversely, those who receive a higher proportion of tickets
may be better integrated into the prisoner culture and, ironically,
even easier to control, because their disciplinary record reflects
healthy (by prisoner standards) adjustment. If any of these
alternative meanings hold, then the validity of our theoretical
claims is weakened.  Therefore, the observation itself may be
invalid, because it imputes meanings to disciplinary infractions that
may be erroneous and thus subvert the validity of the claims.
Page maintained by: Jim Thomas - jthomas@math.niu.edu