Kentucky School News and Commentary: Robo-graders like ETS’s E-Rater aren’t good enough yet.

Sunday, May 13, 2012

Robo-graders like ETS’s E-Rater aren’t good enough yet.

Standardized tests will finally ask good essay questions.

But robot grading threatens that progress.

This from Dana Goldstein in Slate:

In 2002, Indiana rolled out computer scoring of its 11th-grade state writing exam. At the time, ETS, the company that developed Indiana’s software, said automatic writing assessment could help cut the state’s testing budget in half. But by 2007, Indiana had abandoned the practice.

Why? Though ETS’s E-Rater proved adept at scoring so-called “naked” essays based only on personal opinion, it couldn’t reliably handle questions that required students to demonstrate knowledge from the curriculum. State testing officials tried making lists of keywords the software could scan for: in history, for example, “Queen Isabella,” “Columbus,” and “1492.” But the program didn’t understand the relationship between those items, and so would have given full credit to a sentence like, “Queen Isabella sailed 1,492 ships to Columbus, Ohio.” Cost and time savings never materialized, because most tests also had to be looked at by human graders.
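To make the keyword problem concrete, here is a minimal sketch of a scorer that only checks whether listed terms appear. It is not ETS's or Indiana's actual code: the `KEYWORDS` list, the `keyword_score` function, and the choice to strip commas so that "1,492" matches "1492" are all assumptions made for illustration.

```python
import re

# Hypothetical keyword list, taken from the history example quoted above.
KEYWORDS = ["queen isabella", "columbus", "1492"]


def normalize(text: str) -> str:
    """Lowercase, drop commas (assumption: so '1,492' reads as '1492'),
    and replace any other punctuation with spaces."""
    text = text.lower().replace(",", "")
    return re.sub(r"[^a-z0-9 ]+", " ", text)


def keyword_score(essay: str, keywords=KEYWORDS) -> float:
    """Return the fraction of keywords found anywhere in the essay.

    The scorer never checks how the keywords relate to one another,
    which is exactly the weakness described in the article.
    """
    normalized = normalize(essay)
    hits = sum(1 for kw in keywords if kw in normalized)
    return hits / len(keywords)


accurate = "In 1492, Queen Isabella funded the voyage of Columbus."
nonsense = "Queen Isabella sailed 1,492 ships to Columbus, Ohio."

# Both sentences contain every keyword, so both receive the same score.
print(keyword_score(accurate), keyword_score(nonsense))
```

Under these assumptions the accurate sentence and the nonsense sentence are indistinguishable to the scorer, which is why human graders still had to read the tests.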

Indiana’s experience is worth keeping in mind, since although the technology has not advanced dramatically over the past decade, we’re now in the midst of a new whirlwind of enthusiasm about electronic writing assessment. Last month, after a study from Mark Shermis of the University of Akron announced that computer programs and people award student-writing samples similar grades, an NPR headline teased, “Can a Computer Program Grade Essays As Well As a Human? Maybe Even Better, Study Says.” Education technology entrepreneur Tom Vander Ark, who co-directed the Shermis study, hailed the results as proof that robo-grading is “fast, accurate, and cost-effective.”

He is right about “fast”: E-Rater can reportedly grade 16,000 essays in 20 seconds. But “accurate” and “cost-effective” are debatable, especially if we want students to write not only about what they think and feel, but also about what they know. Testing companies acknowledge it is easy to game the current generation of robo-graders: Such software rewards longer word counts, unusual vocabulary, transition words such as “however” and “therefore,” and grammatical sentences—whether or not the facts contained within the sentences are correct. To address these problems, the Hewlett Foundation, which also paid for the Shermis study, is offering a $100,000 prize to the team of computer programmers that can make the biggest strides in improving the technology.
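The gaming problem can be illustrated the same way. The toy scorer below rewards only surface features of the kind the article lists: length, long (supposedly "unusual") words, and transition words. It is a sketch under stated assumptions, not how E-Rater works: the `TRANSITIONS` set, the nine-letter threshold, the caps, and the equal weighting are all invented for the example, and it makes no attempt to check grammar or facts.

```python
import re

# Hypothetical list of transition words; the article names only
# "however" and "therefore".
TRANSITIONS = {"however", "therefore", "moreover", "consequently"}


def surface_score(essay: str) -> float:
    """Score an essay on surface features only (all weights are assumptions)."""
    words = re.findall(r"[a-z']+", essay.lower())
    if not words:
        return 0.0
    # Longer essays earn more, capped at 300 words.
    length_points = min(len(words), 300) / 300
    # Share of words with nine or more letters, a crude stand-in for
    # "unusual vocabulary".
    long_word_points = sum(len(w) >= 9 for w in words) / len(words)
    # Up to five transition words earn credit.
    transition_points = min(sum(w in TRANSITIONS for w in words), 5) / 5
    return length_points + long_word_points + transition_points


accurate = "Columbus sailed in 1492 with support from Queen Isabella."
padded = (
    "However, Queen Isabella consequently commanded an extraordinary armada; "
    "therefore, Columbus, Ohio, was magnificently discovered."
)

# The padded, factually wrong sentence outscores the short, accurate one.
print(surface_score(accurate) < surface_score(padded))
```

Nothing in the score depends on whether the claims are true, so a student who pads an answer with connectives and long words can beat one who simply gets the history right.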

The recent push for automated essay scoring comes just as we’re on the verge of making standardized essay tests much more sophisticated in ways robo-graders will have difficulty dealing with. One of the major goals of the new Common Core curriculum standards, which 45 states have agreed to adopt, is to supplant the soft-focus “personal essay” writing that currently predominates in American classrooms with more evidence-driven, subject-specific writing. The creators of the Common Core hope machines can soon score these essays cheaply and quickly, saving states money in a time of harsh education budget cuts. But since robo-graders can’t broadly distinguish fact from fiction, adopting such software prematurely could be antithetical to testing students in more challenging essay-writing genres...

1 comment:

Anonymous said...: If we are using computers to evaluate human writing, then why are we even learning how to write? Why not just have the computer do it for you? Just speak to the computer and it can translate your words into a well-written sentence.

Example

Uneducated, illiterate non-writer:
"It was really bitch'en but it sucked too."

Computer translation:
"It was the best of times, it was the worst of times."

I just don't understand why we expect human teachers to instruct students but have computer programs evaluate whether students learned and instruction occurred. Maybe I am a member of a dying profession?!?!? Soon to be replaced by some automated, computerized machine.

May 14, 2012 at 9:20 PM

KSN&C

KSN&C is intended to be a place for well-reasoned civil discourse...not to suggest that we don’t appreciate the witty retort or pithy observation. Have at it. But we do not invite the anonymous flaming too often found in social media these days. This is a destination for folks to state your name and speak your piece.

It is important to note that, while the Moderator serves as Faculty Regent for Eastern Kentucky University, all comments offered by the Moderator on KSN&C are his own opinions and do not necessarily represent the views of the Board of Regents, the university administration, faculty, or any members of the university community.

On KSN&C, all authors are responsible for their own comments. See full disclaimer at the bottom of the page.

I moved, in 1985, from suburban northern Kentucky to what was then the state’s flagship district - Fayette County. I have had a unique set of experiences to accompany my journey through KERA’s implementation. I have seen children grow to graduate and lead successful lives. I have seen them go to jail and I have seen them die. I have been amazed by brilliant teachers, dismayed by impassive bureaucrats, disappointed by politicians and uplifted by some of Kentucky’s finest school children. When I am not complaining about it, I will attest that public school administration is critically important work.
