Computer-grading of essays gaining a foothold

By Del Stover

5/17/05 -- State education officials are beginning to turn to computers to grade essays on state exams, a high-tech development that’s raising some eyebrows among educators.

At the forefront of this movement is West Virginia, where an “automated essay-scoring program” is helping grade the writing samples of 44,000 middle and high school students who took the state’s writing assessment this spring.

Also on the cutting edge of this technology is Indiana, where a pilot program has relied on computers to grade essays on a state-mandated English exam for 11th graders. This year, Michigan hopes to launch its own pilot program.

To date, no one is relying on computer-generated scores to determine student grades or a school’s adequate yearly progress. But it’s clear state education officials are looking at the potential of this technology to limit the need for costly human scorers -- and reduce the time needed to grade tests and get them back in the hands of classroom teachers.

The technology is gaining acceptance after several commercial products began showing consistent levels of sophistication, technology experts say. Once limited to spotting simple spelling errors, these highly advanced programs now can identify most grammar mistakes and analyze sentence structure and essay organization.

Vantage Learning, a leading developer of essay-scoring software, boasts that scores developed by its IntelliMetric system match those of human essay scorers more than 96 percent of the time.

That’s good enough for ACT Inc., which just signed an agreement with Vantage Learning to evaluate essays on the ACT’s Graduate Management Admission Test, used by colleges to determine the academic skill level of incoming students.

“In terms of their capability to evaluate writing, at this point I think the programs are hitting their stride,” says John Kalohn, director of placement programs at ACT Inc. “Do they have some limitations? Yeah, they probably do. But they tend to work with essay responses very well, and that’s why we’re using them.”

The sophistication of these programs was recently demonstrated to School Board News by Educational Testing Service’s Linda Reitzel, a program manager for ETS’s Criterion online writing evaluation system. Designed as an instructional/diagnostic tool for teachers, Criterion revealed an impressive ability to evaluate writing samples in detail -- and offer advice on how to improve the essays. And it completed the work in seconds.

In such a limited role, under the supervision of teachers, these computer programs are an invaluable classroom aid and unlikely to generate controversy. Indeed, these products are gaining a strong foothold in schools across the nation. Already, for example, Criterion is used in more than 2,000 schools.

Exploring the potential use of such software in the classroom is, in fact, a major goal of the Michigan pilot program, says Jamey Fitzpatrick, interim president of Michigan Virtual University, which is overseeing the project.

“As these programs become more and more stable, and more and more mature, they are going to be able to provide an added resource to the classroom teacher,” he says. “Let’s face it. It’s a very labor-intensive task to sit down and read essays.”

What worries some educators is that, given the labor-intensive and costly nature of reading essays, state officials will be tempted to begin using these programs to grade high-stakes tests. And state officials admit that possibility certainly will be on the table in the years ahead.

Nancy Patterson, chair of the reading language arts program at Grand Valley State University, opposes any move in that direction. Essay-scoring programs remain limited, she says, and they deliver their promised level of performance only when scoring a rigidly structured writing sample, such as the typical five-paragraph essay seen on many tests.

No computer can yet evaluate subtle or creative styles of writing or judge the quality of an essay’s intellectual content, she says. A recent experiment, she notes, revealed that one computer program gave a high score to an essay that consisted of a paragraph taken from a Stephen King novel and repeated over and over again.

Relying on computers to score high-stakes tests, Patterson says, will only give teachers further incentive to teach a conservative and formulaic writing style that the software is likely to score highly. That would be “extraordinarily misguided,” she says. “The problem is that what gets graded gets measured, and ultimately that’s not good for students.”

The limitations of these programs are not lost on state education officials. In West Virginia, the accuracy of computer scoring has been evaluated repeatedly during the state’s pilot project, and a special committee will review a sample selection of this year’s essay scores to validate the process.

In Indiana, the computer software will assign a confidence rating to each essay, and if the program indicates it had problems with a writing sample, it will be referred automatically to a human scorer, says Wes Bruce, the state’s assistant superintendent for assessment, research, and information technology. A sampling of essay results also will be reviewed as a quality-control measure.
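The routing Bruce describes can be pictured with a minimal sketch. Everything specific here is an assumption made for illustration, not a detail of Indiana's system: the 0-to-1 confidence scale, the 0.8 cutoff, the 5 percent audit rate, and the names `ScoredEssay` and `route_essays` are all hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class ScoredEssay:
    essay_id: str
    machine_score: int
    confidence: float  # hypothetical scale: 0.0 (no confidence) to 1.0 (full)


def route_essays(essays, min_confidence=0.8, audit_rate=0.05, seed=0):
    """Split machine-scored essays into three groups.

    to_human: confidence below the cutoff, so a human scorer takes over.
    accepted: the machine score stands.
    audit:    a random sample of accepted essays pulled for quality control.
    The cutoff and audit rate are illustrative values, not real settings.
    """
    to_human = [e for e in essays if e.confidence < min_confidence]
    accepted = [e for e in essays if e.confidence >= min_confidence]

    # Review at least one accepted essay whenever any were accepted.
    audit_size = 0
    if accepted:
        audit_size = min(len(accepted), max(1, round(len(accepted) * audit_rate)))

    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    audit = rng.sample(accepted, audit_size)
    return to_human, accepted, audit


essays = [
    ScoredEssay("a", 4, 0.95),
    ScoredEssay("b", 3, 0.40),  # low confidence: goes to a human scorer
    ScoredEssay("c", 5, 0.85),
]
to_human, accepted, audit = route_essays(essays)
```

In this sketch the quality-control sample is drawn only from essays whose machine score was accepted; a real program might instead sample from all results.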

For now, whether essay-scoring software is ready for use with high-stakes testing remains an open question. But the answer might be coming with the next generation of scoring software in the development pipeline. ETS, for example, is working on a new program that company officials say will have the capability to judge an essay’s content as well as grammar.

That’s going to be a tempting product for state officials, who say they’re eager to find a way to shorten the turnaround time necessary to get critical test results back to teachers.

Indeed, that’s one reason that West Virginia embraced computerized scoring for its statewide writing assessments, says Brenda West, assistant director of the Office of Student Assessment Services.

“In the past, we brought in teachers in the summer to do the scoring, so it was July before the assessment results were completed,” she says. Using computers, “we hope to have the score reports to students and schools in May, so teachers will have an opportunity to look at the scores and at least do a little work with students before school is out.”

Reproduced with permission from School Board News. Copyright © 2005, National School Boards Association. Opinions expressed in this newspaper do not necessarily reflect positions of NSBA. This article may be printed out and photocopied for individual or educational use, provided this copyright notice appears on each copy. This article may not be otherwise transmitted or reproduced in print or electronic form without the consent of the Publisher. For more information, call (703) 838-6789.


 
 