
Unit 3: Construct validity
Daniel Muñoz Acevedo, 25 May 2020, 19:17 hrs.
So far, we have seen that it is important for an assessment process, a test or any measurement instrument to be valid, that is to say, to measure or evaluate what it is intended to measure. Now the question is: how do we know that an instrument or test is measuring what it is intended to measure?
Short answer: in several related but different ways, each of which provides evidence of test validity.
The sources of validity in a test are normally referred to as types of validity. All of them are sources of information that allow us to judge whether a test is testing what it is supposed to be testing. However, one of the main sources of validity is not a type of validity but a different characteristic altogether. We will come to it at the end of the post, by way of a cliff-hanger.
So, to validity now. To know that a test is measuring what it is supposed to measure, we first need to understand what is being measured. In the first post in this blog, we observed the example of weather. So, to know that a particular procedure and/or instrument is actually measuring the weather, we first need to know what the weather is. Sounds simple, but it is not.
In the case of weather, a simple review of the concept yields a very, very complex set of characteristics and properties. Weather seems to be a compound of different phenomena like moisture, temperature, wind, etc. So any instrument or procedure we use to evaluate or measure weather should be evaluating or measuring those phenomena. To make things even more interesting, when we start looking at the measurement of things like wind, humidity or air pressure, we will find that those are also complex phenomena constituted by many characteristics and properties. And so on and so forth.
The problem of measuring or evaluating phenomena is, therefore, to first have an idea of what it is that we want to measure. Sometimes that idea can be defined simply: in daily life, the weather can be defined as temperature and/or the probability of rain. For other purposes, of course, the definition is a lot more complex: for meteorologists, the weather can be a compound of multiple interrelated measurements. Remember the example of the mercury thermometer we discussed in our last chat? Well, same thing there. We need to have some comprehension of the phenomenon of temperature in order to be able to devise an instrument that allows us to measure it.
The thermometer is a very good example of how our understanding of the phenomenon to be measured is essential to develop a good measurement instrument for that phenomenon. For example, Galileo's thermometer, basically a tube containing water, was affected not only by temperature but also by air pressure, which is a phenomenon different from temperature. Therefore, we need to know what temperature is in order to see whether the instrument we are using is actually measuring that phenomenon alone or is also affected by other phenomena.
So, the definition of the phenomenon to be evaluated is at the core of an assessment process. In the field of assessment, that definition is called a construct or theoretical construct. For a meteorologist, the constructs behind her instruments are ideas such as temperature, wind speed, humidity, weather, etc. The construct to be measured by a thermometer is that of temperature.
It is important to notice in this explanation that instruments and tests measure constructs rather than actual phenomena. This is because phenomena can be defined by the use of different theoretical constructs. This is more obvious when trying to measure non-physical phenomena such as motivation, aptitude or (guess what) language ability or performance. In cases such as those, the instruments that can be used to test their existence are completely dependent on how we define the phenomenon.
In the case of language ability, knowledge or performance, tests can look like a series of written drills to complete with correct grammatical forms, or they can look like a conversation about real-life issues with a classmate. Broadly speaking, the decision in this case depends mostly on whether we are defining the construct of language proficiency from a structuralist or a functionalist perspective, respectively. In the former case, we conceive of language as a set of rules of formation, normally grammatical or phonological. In the latter, we consider language as a tool to communicate. So, different constructs, different tests.
The main source of validity of a language test, therefore, lies in the capacity of that test to reflect the theoretical construct we are using to observe, explain and characterise the phenomenon of language. Unsurprisingly, this source of validity is generally called construct validity.
In order to check whether a test is valid, therefore, we first need to know which construct it intends to evaluate, and then judge whether the instrument or procedure is actually capable of allowing the observation and evaluation of that construct.
We will have plenty of conversations about validity, as it has become the main perspective in the field of language assessment. For the moment, we can at least say that evaluating with precision what it is intended to evaluate is not the only thing that makes a test valid. That is only the beginning of the problems. A test is also valid when it is used in the appropriate decision-making process, when the stakeholders who use the test "believe" that it measures what it is intended to measure, and when the consequences of the decisions based on the test are what was intended. All of these are sources of validity different from construct validity.
Finally, a main source of validity is one characteristic of every test or assessment process: its reliability. We will learn in this seminar that it is very, very difficult to affirm that a test is valid if we do not show that its results are reliable or consistent. The relationship between validity and reliability, we will see, lies behind all the problems we normally find in the design and application of English L2 tests.
Related sources
Like all theoretical constructs, weather can be presented in different ways. Compare these two explanations, one for educated adults like us, and the other made for educated kids (also like us, in a sense).
Weather according to Wikipedia:
en.wikipedia.org/wiki/Weather
Weather according to National Geographic for kids:
www.nationalgeographic.org/ ... ncyclopedia/weather/
You can also watch this video on the history of the thermometer: Fahrenheit to Celsius: History of the thermometer. Pay attention to the problems of the early attempts that are directly related to construct validity (in this case, everything that got in the way of measuring temperature and only temperature).