Language tests are a concern for many students because they are tied to social and economic benefits all over the world. More and more people are required to demonstrate that they can use second and foreign languages effectively in order to advance in their careers.
Because of this, language test preparation has become a big business, with many students spending a lot of time and money trying to figure out how to “beat” language tests.
Yet many students and teachers do not have a clear idea of how standardized language tests like the TOEFL and the TOEIC are made. This knowledge is important because people and organizations make significant decisions based on these test scores.
People should know what the scores mean, and what they don’t mean, before making such decisions. Understanding how tests are made can help with this.
So, in this blog post, I’m going to introduce the three most important parts of test design: (1) practicality, (2) reliability, and (3) validity.
To begin with, practicality is the easiest to explain. Basically, practicality involves all of the logistical concerns related to creating and giving a test.
How many people are needed to administer the test? Can the test be scored with a computer, or do we need raters? How many rooms will we need? How many pencils? How many copies of the test? How many hours and people will it take to design, print, and give the test?
These questions are probably very familiar to teachers who have been involved in giving a big test at their school.
The next part, reliability, is a bit more complicated. Reliability involves analyzing test scores statistically to determine whether the scores are consistent over a period of time. This involves looking at such things as average scores for tests given at different times or to different groups, or measuring the number of students who scored well on some questions and not on others.
If the measures show consistency at different times and among different people taking the test, then the test is reliable. However, if, for example, the average score for one group of people taking the test in June is very different from the average score for people taking the same test in August, then the test is not reliable.
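To make the statistical side of reliability a little more concrete, here is a minimal Python sketch of the two kinds of checks described above. It is only an illustration, not how the TOEFL or the TOEIC is actually analyzed. The score lists and answer table are made-up numbers, the function names are my own, and I have chosen a standardized mean difference and Cronbach's alpha as two common examples of such measures. Real testing programs use larger samples and more sophisticated methods.

```python
from statistics import mean, pstdev, pvariance


def standardized_mean_difference(scores_a, scores_b):
    """Difference between two group averages, in pooled standard-deviation units."""
    pooled_sd = ((pstdev(scores_a) ** 2 + pstdev(scores_b) ** 2) / 2) ** 0.5
    return (mean(scores_a) - mean(scores_b)) / pooled_sd


def cronbach_alpha(item_scores):
    """Internal consistency of a test.

    item_scores has one row per student and one column per question
    (1 = correct, 0 = incorrect).
    """
    k = len(item_scores[0])  # number of questions
    item_vars = [pvariance([row[i] for row in item_scores]) for i in range(k)]
    total_var = pvariance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical total scores from two administrations of the same test.
june_scores = [520, 540, 560, 580, 600]
august_scores = [525, 545, 555, 585, 595]

# Hypothetical answers from five students on a four-question test.
answers = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

difference = standardized_mean_difference(june_scores, august_scores)
alpha = cronbach_alpha(answers)

print(f"June vs. August difference (in SD units): {difference:.2f}")
print(f"Cronbach's alpha: {alpha:.2f}")
```

A difference close to zero suggests the two groups scored about the same on average, and an alpha closer to 1 suggests the questions are measuring something in a consistent way. Where exactly to draw the line between "consistent" and "not consistent" is a judgment call that depends on the test and how its scores will be used.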
Reliability also includes how a test is administered. So if one group of students taking the TOEFL in February in Japan got a thirty-minute break halfway through the test, but another group of students taking the test at the same time in Korea did not get any break, then this is also an issue with reliability.
Lastly, validity is perhaps the most important consideration for test designers. Basically, validity is an argument justifying why a test score measures what it claims to measure. Have you ever wondered what a test score actually means? One of the main criticisms of standardized testing is that the test scores only measure a particular type of knowledge. But what is this type of knowledge?
Historically, most standardized tests in the United States have published versions of their validity arguments that explained in great detail why and how the tests measure what they claim to measure. For example, the TOEFL has a 350-page book called “Building a Validity Argument for the Test of English as a Foreign Language” which explains why and how the TOEFL measures English language ability for academic performance.
There are many different types of information that can be included in a validity argument. What’s important to understand, however, is that the purpose of a validity argument is to justify why, for example, a TOEFL score should be taken seriously as a measure of English proficiency. In other words, a validity argument explains both what a test score means and, more importantly, what it doesn’t mean.
In best practice, a test score should be supported by a validity argument, which explains how the test score should be interpreted and what decisions can reasonably be made based on that score.
As mentioned above, validity arguments for standardized tests like the TOEFL and the TOEIC used to be publicly available and easy to access. However, in recent years, it has become more and more difficult to find this information. This is a problem because without this information, a standardized test score is essentially meaningless.