Keaton Peters reports via the Texas Tribune: This week, students taking the STAAR exam will be part of a new way of evaluating Texas schools: their written answers on the state standardized test will be automatically scored by computer. The Texas Education Agency is deploying an “automated scoring engine” for open-ended questions on the State of Texas Assessments of Academic Readiness in reading, writing, science, and social studies. The technology uses natural language processing, similar to artificial intelligence chatbots such as GPT-4, in place of the human scorers the state agency previously hired through third-party contractors. The change could save approximately $15 million to $20 million annually.
The change comes after the STAAR test, which measures students' understanding of the state-mandated core curriculum, was redesigned in 2023. Tests now include fewer multiple-choice questions and more open-ended questions called constructed-response items; the redesigned test contains six to seven times more of them than before. “We wanted to keep as many open-ended responses as possible, but grading them is incredibly time-consuming,” said Jose Rios, director of student assessment for the Texas Education Agency. Rios said TEA hired about 6,000 temporary scorers in 2023 but will need fewer than 2,000 this year.
To develop the scoring system, TEA collected 3,000 responses that went through two rounds of human scoring. From this field sample, the automated scoring engine learns the characteristics of responses and is programmed to assign the same scores a human would give. This spring, after students complete the test, the computer will first score every constructed response; a quarter of the answers will then be rescored by a human. If the computer-assigned score is “unreliable,” the response is automatically reassigned to a human. The same thing happens when the computer encounters a type of response its programming doesn't recognize, such as one with a lot of slang or words from a language other than English. “In addition to 'unreliable' scores and answers that don't fit the computer's programming, random samples of answers are also automatically passed to humans to check the computer's behavior,” Peters points out. Although the technology is similar to ChatGPT, TEA officials have resisted suggestions that the scoring engine is artificial intelligence. They note that it does not “learn” from responses and always follows the original process set by the state.
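The routing rules described above (the engine scores everything; humans rescore unreliable scores, unrecognized responses, and a random audit sample) can be summarized in a short sketch. TEA has not published its engine's internals, so every name, threshold, and method below is an illustrative assumption, not the agency's actual implementation.

```python
"""A minimal sketch of the human-rescore routing described in the article.

All identifiers and thresholds here are hypothetical; only the three
routing rules themselves come from the reporting.
"""
import random
from dataclasses import dataclass

CONFIDENCE_CUTOFF = 0.80  # assumed: below this, a score counts as "unreliable"
AUDIT_RATE = 0.25         # article: roughly a quarter of answers get human rescoring


@dataclass
class EngineResult:
    score: int         # rubric points the engine assigned
    confidence: float  # engine's confidence in that score, 0.0 to 1.0
    recognized: bool   # False for responses outside the engine's programming,
                       # e.g. heavy slang or non-English words


def needs_human_rescore(result: EngineResult) -> bool:
    """Return True when the article's rules route a response to a human."""
    if result.confidence < CONFIDENCE_CUTOFF:  # "unreliable" computer score
        return True
    if not result.recognized:                  # response the programming doesn't recognize
        return True
    return random.random() < AUDIT_RATE        # random sample to check the computer's work


# Example: a confident, recognized score stays automated unless randomly sampled.
print(needs_human_rescore(EngineResult(score=2, confidence=0.95, recognized=True)))
```

Note that the deterministic, fixed-rules character of this routing is consistent with TEA's insistence that the engine is not artificial intelligence: nothing in the process updates itself based on the responses it scores.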