I've never heard of "translation" validity before, but I needed a good name to summarize what both face and content validity are getting at, and that one seemed sensible. All of the other labels are commonly known, but the way I've organized them is different from what I've seen elsewhere. Let's see if we can make some sense out of this list. First, as mentioned above, I would like to use the term construct validity as the overarching category.
Construct validity is the approximate truth of the conclusion that your operationalization accurately reflects its construct. All of the other terms address this general issue in different ways. Second, I make a distinction between two broad types: translation validity and criterion-related validity. In translation validity, you focus on whether the operationalization is a good reflection of the construct. This approach is definitional in nature -- it assumes you have a good, detailed definition of the construct and that you can check the operationalization against it.
In criterion-related validity, you examine whether the operationalization behaves the way it should given your theory of the construct.
This is a more relational approach to construct validity. If all this seems a bit dense, hang in there until you've gone through the discussion below -- then come back and re-read this paragraph.
Let's go through the specific validity types, beginning with translation validity. I just made this one up today! See how easy it is to be a methodologist? I needed a term that described what both face and content validity are getting at. In essence, both of those validity types are attempting to assess the degree to which you accurately translated your construct into the operationalization, and hence the choice of name. Let's look at the two types of translation validity. In face validity, you look at the operationalization and see whether "on its face" it seems like a good translation of the construct.
This is probably the weakest way to try to demonstrate construct validity. For instance, you might look at a measure of math ability, read through the questions, and decide that, yes, it seems like a good measure of math ability. Or, you might observe a teenage pregnancy prevention program and conclude, "Yep, this is indeed a teenage pregnancy prevention program." Note that just because face validity is weak evidence doesn't mean that it is wrong. We need to rely on our subjective judgment throughout the research process.
It's just that this form of judgment won't be very convincing to others. We can improve the quality of face validity assessment considerably by making it more systematic. For instance, if you are trying to assess the face validity of a math ability measure, it would be more convincing if you sent the test to a carefully selected sample of experts on math ability testing and they all reported back with the judgment that your measure appears to be a good measure of math ability.
In content validity, you essentially check the operationalization against the relevant content domain for the construct. This approach assumes that you have a good, detailed description of the content domain, something that's not always true. For instance, we might lay out all of the criteria that should be met in a program that claims to be a "teenage pregnancy prevention program." Then, armed with these criteria, we could use them as a kind of checklist when examining our program.
Only programs that meet the criteria can legitimately be defined as "teenage pregnancy prevention programs." Lines of construct validity evidence also include relationships between the test and measures of other constructs. As currently understood, construct validity is not distinct from the support for the substantive theory of the construct that the test is designed to measure. As such, experiments designed to reveal aspects of the causal role of the construct also contribute to construct validity evidence.
Content validity evidence involves the degree to which the content of the test matches a content domain associated with the construct. For example, does an IQ questionnaire have items covering all areas of intelligence discussed in the scientific literature? Similarly, a test of the ability to add two numbers should include a range of combinations of digits.
A test with only one-digit numbers, or only even numbers, would not have good coverage of the content domain. Content-related evidence typically involves a subject matter expert (SME) evaluating test items against the test specifications.
Before the final administration of a questionnaire, the researcher should check the validity of the items against each of the constructs or variables and modify the measurement instrument accordingly on the basis of the SME's opinion. Items are chosen so that they comply with the test specification, which is drawn up through a thorough examination of the subject domain. The experts will be able to review the items and comment on whether the items cover a representative sample of the behaviour domain.
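The coverage idea behind a content-validity review can be sketched programmatically. The following is a minimal illustration only (not part of the original discussion): the domain specification, the categorization rule, and the test items are all invented for the addition-test example above, where a test with only one-digit numbers fails to cover the domain.

```python
# Sketch of a content-coverage check for a hypothetical addition test.
# The domain specification and items below are invented for illustration.

# Content domain: every category of digit combination we want covered.
domain = {"one_digit + one_digit", "one_digit + two_digit", "two_digit + two_digit"}

def categorize(a, b):
    """Classify an addition item by the digit lengths of its operands."""
    kind = lambda n: "one_digit" if n < 10 else "two_digit"
    return f"{kind(min(a, b))} + {kind(max(a, b))}"

# A candidate test with poor coverage: only one-digit items.
items = [(2, 3), (4, 5), (7, 8)]

covered = {categorize(a, b) for a, b in items}
missing = domain - covered

print("covered:", sorted(covered))
print("missing:", sorted(missing))  # uncovered categories signal weak content validity
```

An SME review does the same thing informally: compare the items actually on the test against an explicit map of the domain and flag what's missing.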
Face validity is an estimate of whether a test appears to measure a certain criterion; it does not guarantee that the test actually measures phenomena in that domain. Measures may have high validity, but when the test does not appear to be measuring what it actually measures, it has low face validity. Indeed, when a test is subject to faking (malingering), low face validity might make the test more valid. Considering that one may get more honest answers with lower face validity, it is sometimes important to make it appear as though there is low face validity whilst administering the measures.
Face validity is very closely related to content validity. While content validity depends on a theoretical basis for judging whether a test assesses all domains of a certain criterion (e.g., knowing what different kinds of arithmetic skills mathematical skill includes), face validity relates only to whether a test appears to be a good measure or not.
This judgment is made on the "face" of the test; thus it can also be made by an amateur. Face validity is a starting point, but a test should never be assumed to be valid for any given purpose on that basis alone, as the "experts" have been wrong before -- the Malleus Maleficarum (Hammer of Witches) had no support for its conclusions other than the self-imagined competence of two "experts" in "witchcraft detection," yet it was used as a "test" to condemn and burn at the stake tens of thousands of men and women as "witches."
Criterion validity evidence involves the correlation between the test and a criterion variable or variables taken as representative of the construct.
In other words, it compares the test with other measures or outcomes (the criteria) already held to be valid. For example, employee selection tests are often validated against measures of job performance (the criterion), and IQ tests are often validated against measures of academic performance (the criterion).
If the test data and criterion data are collected at the same time, this is referred to as concurrent validity evidence. If the test data are collected first in order to predict criterion data collected at a later point in time, then this is referred to as predictive validity evidence.
Concurrent validity refers to the degree to which the operationalization correlates with other measures of the same construct that are measured at the same time. When the measure is compared to another measure of the same type, they will be related or correlated. Returning to the selection test example, this would mean that the tests are administered to current employees and then correlated with their scores on performance reviews. Predictive validity refers to the degree to which the operationalization can predict or correlate with other measures of the same construct that are measured at some time in the future.
Again, with the selection test example, this would mean that the tests are administered to applicants, all applicants are hired, their performance is reviewed at a later time, and then their scores on the two measures are correlated. Put another way, the measurement predicts a relationship between what is measured and something else, forecasting whether or not the other thing will happen in the future. This type of validity also matters from a public standpoint: will the selection procedure look acceptable to the public or not?
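The criterion-validation step described above comes down to computing a correlation between the two sets of scores. Here is a small sketch of that computation; the Pearson formula is standard, but the selection-test scores and performance ratings are made-up numbers for illustration only.

```python
from math import sqrt

def pearson(xs, ys):
    """Plain Pearson correlation coefficient, no external libraries needed."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical data: selection-test scores and later performance ratings
# for the same seven people (predictive validity design, data invented).
test_scores = [55, 62, 70, 71, 80, 84, 91]
performance = [2.1, 2.5, 3.0, 2.8, 3.4, 3.6, 4.0]

r = pearson(test_scores, performance)
print(f"criterion validity coefficient r = {r:.2f}")
```

Whether the resulting coefficient counts as concurrent or predictive evidence depends only on when the criterion data were collected, not on the arithmetic.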
The validity of the design of experimental research studies is a fundamental part of the scientific method , and a concern of research ethics. Without a valid design, valid scientific conclusions cannot be drawn.
Statistical conclusion validity involves ensuring the use of adequate sampling procedures, appropriate statistical tests, and reliable measurement procedures. Internal validity is an inductive estimate of the degree to which conclusions about causal relationships (e.g., cause and effect) can be made, given the measures used, the research setting, and the whole research design.
Good experimental techniques, in which the effect of an independent variable on a dependent variable is studied under highly controlled conditions, usually allow for higher degrees of internal validity than, for example, single-case designs. Eight kinds of confounding variable can interfere with internal validity if they are not controlled for. External validity concerns the extent to which the internally valid results of a study can be held to be true for other cases, for example for different people, places, or times.
In other words, it is about whether findings can be validly generalized. If the same research study were conducted in those other cases, would it get the same results? A major factor in this is whether the study sample is representative of the wider population of interest. Several other factors can also jeopardize external validity. Ecological validity is the extent to which research results can be applied to real-life situations outside of research settings.
To be ecologically valid, the methods, materials, and setting of a study must approximate the real-life situation that is under investigation. Ecological validity is partly related to the issue of experiment versus observation. Typically in science, there are two domains of research: experimental and observational. The purpose of experimental designs is to test causality, so that you can infer A causes B or B causes A.
When an experiment is not possible, you can still do research, but it is not causal; it is correlational. You can only conclude that A occurs together with B. Both techniques have their strengths and weaknesses. At first glance, internal and external validity seem to contradict each other -- to get an experimental design you have to control for all interfering variables.
That is why you often conduct your experiment in a laboratory setting. While gaining internal validity (excluding interfering variables by keeping them constant), you lose ecological or external validity because you establish an artificial laboratory setting. On the other hand, with observational research you cannot control for interfering variables (low internal validity), but you can measure in the natural, ecological environment, at the place where behavior normally occurs.
However, in doing so, you sacrifice internal validity. The apparent contradiction between internal validity and external validity is, however, only superficial. The question of whether results from a particular study generalize to other people, places, or times arises only when one follows an inductivist research strategy. If the goal of a study is to deductively test a theory, one is only concerned with factors that might undermine the rigor of the study, i.e., threats to internal validity.
In psychiatry there is a particular issue with assessing the validity of the diagnostic categories themselves. Robins and Guze proposed what were to become influential formal criteria for establishing the validity of psychiatric diagnoses. They listed five such criteria. Kendler later drew further distinctions among kinds of validating evidence.
External validity is about generalization: to what extent can an effect found in research be generalized to other populations, settings, treatment variables, and measurement variables? External validity is usually split into two distinct types, population validity and ecological validity; both are essential elements in judging the strength of an experimental design.
Face validity is the most basic type of validity, and it is associated with the highest level of subjectivity because it is not based on any scientific approach. In other words, a test may be declared valid by a researcher simply because it seems valid, without an in-depth scientific justification. Notice, too, that a tool can have high content validity and low construct validity. A survey on empathy might ask questions that are all relevant to empathy and therefore have high content validity. But if it is measuring something other than empathy (like guilt-motivated behavior), its construct validity is low.
Internal validity is affected by flaws within the study itself, such as not controlling some of the major variables (a design problem) or problems with the research instrument (a data collection problem).