
Monday, September 29, 2008

What the Bluegrass Institute Doesn't Seem to Know about the ACT

There is no way to sugar coat this. Somebody doesn't know what he's talking about - and it's not Oldham.

Well, Ben Oldham and I got called out by Richard Innes of the Bluegrass Institute the other day for "ignoring" the ACT benchmark scores - apparently the holy grail of assessment in his mind.

Of course, this is all part of a larger conversation about Senate Bill 1 and the on-going Task Force on Assessment at KDE.

I wasn't planning on getting into all of this, this fall. It's tedious, inside-baseball kind of stuff. But the fundamentals are still the same. First, it's just a test. Second, every test has been designed to do a specific job. If test designers had wanted a test to do something else, they would have begun with that goal in sight. Third, have I mentioned it's just a test?

OK, let's talk about the ACT.

Here's the problem with the American College Test: Nothing, really.

It is a well-designed test intended to help admissions officers at competitive colleges determine which students are most likely to be successful at the university level. Ben Oldham recently went further, saying the "American College Test (ACT) is a highly regarded test developed by a cadre of some of the best measurement professionals in the world and is used by a number of colleges..."

But the ACT is only ONE factor that colleges use to make such determinations.

Why?

I mean, if the ACT can predict success in life, as Innes un-credibly argues (below), why don't colleges simply rely on it and quit wasting time compiling grade point averages and other data they say they need to make the best choices for their school?

The answer lies in the fact that test data are only reliable up to a point. It's just a test score and it shouldn't be turned into anything more.

As Richard C. Atkinson and Saul Geiser recently pointed out,

the problem with general-reasoning tests like the SAT [and ACT] is their premise: that something as complex as intellectual promise can be captured in a single test and reflected in a single score. It is tempting for admissions officers--and parents, legislators, policymakers and the media--to read more into SAT [and ACT] scores than the numbers can bear. Although measurement experts know that tests are only intended as approximations, the fact that scores frequently come with fancy charts and tables can create an exaggerated sense of precision.

And such exaggerations persist.

Newspapers and bloggers rank scores that ought not be ranked - because people like rankings. Some "think tanks" act as though test scores equal "truth" and look for any opportunity to twist data into a pre-existing narrative that Kentucky schools are going to hell in a handcart - this, despite trend data to the contrary, about which they are in full denial.

Georgetown College Distinguished Service Professor Ben Oldham correctly warned that "since the ACT is administered to all Kentucky juniors, there is a tendency to over-interpret the results as a measure of the success of Kentucky schools." His excellent article clarifies what the test is, and what it isn't.

The problem of over-interpretation has been somewhat exacerbated by the inclusion of benchmark scores in the ACT. But benchmarking does not change the construction of the test or the norming procedures. It does not turn the ACT into a criterion-referenced exam, as Innes tries to suggest - unless all one means by "criterion" is that the ACT derived a cut score. Under that definition, a People Magazine Celebrity Quiz could be considered criterion-referenced. Score an 18 and you're a Hollywood Insider!

The ACT's "criteria" simply do not measure how well Kentucky students are accessing the curriculum. The test is much more sensitive to the socio-economic factors that characterize most of the college-going population.

Using a "convenience sample" of schools (those willing to participate), the ACT looks at student success in particular college courses and then looks at the ACT scores obtained by "successful" students. But regardless of what data such a design produces, "there is no guarantee that it is representative of all colleges in the U.S." Further, the ACT "weighted the sample so that it would be representative of a wider variety of schools in terms of their selectivity."

That is to say, they tried to statistically adjust the data produced by the sample to account for the more highly selective schools as well as the less selective. This process of weighting data to produce a score that the original sample did not produce should be viewed with suspicion. It would be like ... oh, let's say, like ... using a concordance table to give students a score on a test they didn't take.

If KDE had done anything like this, Innes' buddies at BGI would be crying "fraud."
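
For readers who want to see what that kind of sample weighting does mechanically, here is a minimal sketch in Python. All of the numbers and group labels are invented for illustration - they do not come from ACT's Technical Report - but they show how re-weighting a convenience sample by an assumed national mix of institutions produces a figure the raw sample never produced.

```python
import numpy as np

# Hypothetical illustration only: success rates (share of students earning a B or better)
# observed in a convenience sample of participating colleges, grouped by selectivity.
# None of these figures come from ACT; they exist only to show the arithmetic of weighting.
observed_success = np.array([0.85, 0.70, 0.55])  # highly selective, selective, open admission
sample_share     = np.array([0.60, 0.30, 0.10])  # how the convenience sample happens to break down
national_share   = np.array([0.15, 0.45, 0.40])  # assumed national mix of institutions

# Unweighted estimate: what the raw convenience sample actually says.
unweighted = np.dot(observed_success, sample_share)    # ~0.78

# Weighted estimate: each group re-weighted to the assumed national mix.
weighted = np.dot(observed_success, national_share)    # ~0.66

print(f"Unweighted success rate: {unweighted:.3f}")
print(f"Weighted success rate:   {weighted:.3f}")
```

The weighted figure may well be a better guess about colleges in general, but it is a modeled number, not an observed one - which is the point of the concordance-table comparison above.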

If we are going to test, and if our tests are going to be used to determine placement in programs within schools, and eventually in college, then we need to understand what the ACT means when it says "college-ready." And we don't. The most important flaw of the ACT benchmarks is conceptual: What is "readiness" for higher education?

As one delves deeper into the statistics, other problems arise. Skip Kifer, who serves on the Design and Analysis Committee for NAEP, told KSN&C,

The benchmark stuff is statistically indefensible. Hierarchical Linear Modeling (HLM) was invented because people kept confusing at what level to model things and how questions were different at different levels. The fundamental statistical flaw in the benchmark ... is that it ignores institutions. Students are part of institutions and should be modeled that way.

But the ACT models at the "student" level when it should be modeling at the "students nested within institutions" level.

It is possible that ACT took a kind of average of those "correct" models, but that cannot be determined from their Technical Report.
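
To make Kifer's objection concrete, here is a minimal sketch in Python using simulated data (the column names, effect sizes, and the random-intercept specification are illustrative assumptions, not ACT's actual methodology). It contrasts a student-level regression with a multilevel, HLM-style model in which students are nested within institutions.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated, hypothetical data - nothing here reflects real ACT or college results.
rng = np.random.default_rng(0)
college = rng.integers(0, 20, size=2000)                  # 2,000 students across 20 institutions
college_effect = rng.normal(0, 0.4, size=20)[college]     # institution-level differences in grading/selectivity
act = rng.normal(21, 4, size=2000).clip(1, 36)            # ACT composite scores
gpa = 2.0 + 0.04 * act + college_effect + rng.normal(0, 0.5, size=2000)
df = pd.DataFrame({"act": act, "gpa": gpa, "college": college})

# Student-level model: every student pooled together, institutions ignored -
# roughly the flaw Kifer describes.
flat = smf.ols("gpa ~ act", data=df).fit()

# Multilevel model: students nested within institutions, with a random intercept
# for each college, so the ACT-GPA relationship is estimated within institutions.
nested = smf.mixedlm("gpa ~ act", data=df, groups=df["college"]).fit()

print("Slope ignoring institutions:           ", round(flat.params["act"], 3))
print("Slope with students nested in colleges:", round(nested.params["act"], 3))
```

When institutions differ systematically in both the students they enroll and the grades they give, the two slopes can diverge; a benchmark derived from the pooled, student-level relationship bakes those institutional differences into a single cut score.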

Perhaps Innes could help us understand: How is it that the ACT's benchmarks could have been empirically defined and yet managed to get the same relationship for the University of Kentucky and Lindsey Wilson College?

Unfortunately, the ACT folks did not respond to an inquiry from KSN&C.

But none of this will likely stop the exaggeration of the ACT's abilities.

In response to a KSN&C posting of Ben Oldham's article, Innes made the following claim:

Oldham pushes out-of-date thinking that the ACT is only a norm-referenced test. The ACT did start out more or less that way, years ago, but the addition of the benchmark scores, which are empirically developed from actual college student performance to indicate a good probability of college success, provides a criterion-referenced element today, as well.

"Criterion-referenced element?!" A cut score? The ACT is a timed test too - but that doesn't make it a stopwatch.

So, Oldham is old fashioned and out-of-date? Au contraire. It is Innes who is over-reaching.

Innes argues,

the ACT says that many employers for ... better paying jobs now want exactly the same skills that are needed to succeed in college. So, the Benchmark scores are more like measures of what is needed for a decent adult life. Thus, it isn’t out of line to say that the Benchmarks can fairly be considered a real measure of proficiency. And, that opens the door to compare the percentages of students reaching EXPLORE and PLAN benchmarks to the percentages that CATS says are Proficient or more.

Bull.

One could derive as much "proficiency" evaluating Daddy's IRS form 1040 and then comparing percentages of students reaching EXPLORE and PLAN benchmarks to the likelihood of owning a BMW or affording cosmetic surgery.

I'm afraid what we have here is something other than a nationally recognized assessment expert who is out-of-date.

We have a pundit who thinks the ACT benchmarks constitute a criterion-referenced assessment of the performance of Kentucky students and their prospects for a decent adult life!? This, absent any connection between the ACT and Kentucky's curriculum beyond pure happenstance. There is no relationship between a student's ACT score and any specified body of subject matter - and such a relationship is precisely what defines a criterion-referenced test.

There is no way to sugar coat this. Somebody doesn't know what he's talking about - and it's not Oldham.

The best spin I can put on this is that Innes got snookered by ACT's marketing department, which seems to do a fine job, but has been known to overstate the abilities of ACT's EPAS system.

But none of this makes the ACT a bad test. It just means that assessment experts have to take care to understand the nature of the exams and not to rely on them to do too much.

And it is commendable that Kentucky is working toward building an actual relationship between Kentucky's curriculum and that of the ACT through the development of content tests. That work will get Innes closer to where he wants to be. He should wait for the actual work to be done before making claims.

Just as Atkinson, Geiser, Oldham, Kifer, Sexton and virtually everybody else says, the results should not be over-interpreted to suggest relationships that just aren't there. And trying to argue causal chains that are completely unproven is certainly not best practice.

But more to the point, Kentucky recently committed to use the ACT's EPAS system, including EXPLORE and PLAN, as yet another measure - a norm-referenced measure - of student performance. As long as Kentucky is cognizant of the test's limitations, it ought to strengthen the connections between Kentucky high schools and universities and help gauge student readiness for college. It was because of the large numbers of college freshmen in need of developmental courses that CPE pushed for the ACT/EPAS system to begin with.

Kifer wonders why Kentucky's Advanced Placement (AP) Tests receive so little attention. After all, unlike the ACT, the AP tests are a direct measure of a high school student's ability to do college work; AP courses are particularly well-defined; the tests exist across the curriculum; good AP teachers abound; course goals and exams are open to scrutiny.

When high schoolers pass an AP test, they not only know what it means, but the school of their choice gives them college credit for their effort.

Aware of CPE's commitment to the ACT as one measure of student readiness, KSN&C contacted newly named Senior Vice President of the Lumina Foundation Jim Applegate, who until recently served as CPE's VP for Academic Affairs.

Here's what Jim had to say:

Richard,

The article recently referenced in your publication from the admissions officer group addresses the use of ACT for college admissions. The organizations sponsoring assessments such as ACT, SAT, and others have made clear that no single standardized test should be used to make such decisions. Postsecondary institutions, to implement best practice, should use a multi-dimension assessment to make admissions decisions. A test score may play a role in these decisions, but not the only role.

Kentucky uses the ACT/EPAS system (the Explore, Plan, and ACT tied to ACT's College Readiness Standards) to help determine college readiness, place students in the right high school courses to prepare them for college, and place them in the right courses once they go to college. Kentucky's revised college readiness standards are about placement, not admission. For the foreseeable future, the postsecondary system will, as it has always done, accept large numbers of students with ACT scores below readiness standards, but will provide developmental educational services to these students to get them ready for college-level work. The large number of underprepared students coming into Kentucky's postsecondary system led the Council a couple of years ago to initiate an effort to improve developmental education in order to make sure these students receive the help they need to succeed in college.

A growing number of states are adopting the ACT or the entire EPAS system to more effectively address the challenge of getting more high school graduates ready for college or the skilled workplace (e.g., Colorado, Illinois, and Michigan). These states also want to better understand the performance of their students in a national and international context. Globalization no longer allows any state's educational system to remain isolated from these contexts.

The use of ACT/EPAS is, of course, only one necessary strategy to improve the college/workplace readiness of Kentucky's traditional and adult learners. Kentucky is working to implement statewide placement tests in mathematics, reading, and English that will be administered to high school students who fall below statewide college readiness benchmarks tied to ACT scores (few states have gotten this far in clarifying standards to this level). These placement tests will provide more finely grained information about what students need to know to be ready for college-level work. We are also working to more strongly integrate college readiness goals into our teacher preparation and professional development programs to ensure teachers know how to use the assessments beginning in middle school to bring students to readiness standards.

The postsecondary system is hopeful the implementation of ACT/EPAS will promote partnerships between postsecondary and high/middle schools to improve student achievement. Some of that has already begun since the first administration of the EPAS college readiness system. For the first time in my time in Kentucky (I grew up here and returned to work here in 1977) we now know where every 8th grader is on the road to college readiness thanks to the administration of the Explore. If in five years the number of students needing developmental education is not significantly less than it is today then shame on all of us.

Jim Applegate

All of this reminds me of the old Crest Toothpaste disclaimer I read daily while brushing my teeth over the decades.

Crest has been shown to be an effective decay preventive dentifrice that can be of significant value when used as directed in a conscientiously applied program of oral hygiene and regular professional care.

Let's see if I can paraphrase:

The ACT/EPAS system has been shown to be an effective norm-referenced assessment that can be of significant value when used as directed in a conscientiously applied assessment program based on clear curriculum goals, direct assessments of specific curriculum attainment and effective instruction from a caring professional.

Thursday, August 14, 2008

Must Read: Beyond The SAT


Best argument yet for a national curriculum.

A very well-reasoned article on testing from Richard C. Atkinson and Saul Geiser in Forbes:
Richard C. Atkinson is president emeritus of the University of California. His February 2001 address to the American Council on Education on standardized testing and the SAT brought national attention to the topic and led to a revision of the test by the College Board.

Saul Geiser is former director of admissions research at the University of California’s Office of the President and currently a research associate at the Center for Studies in Higher Education at the UC Berkeley campus.

It used to be that an acceptance letter from a good college was simply a pleasant prelude to the game of life. No more. In 21st century America, getting into the best universities has become a ferociously competitive, high-stakes game. This year the University of California received 340,000 applications for 40,000 places. There are many more qualified students than selective schools can accommodate, and the hunt is on for the best students at public and private institutions alike.

But who are the best students? American colleges and universities have long answered this question by looking at applicants' high-school grades in academic subjects and their scores on standardized college-entrance tests.

These tests come in two varieties: achievement and general reasoning. Achievement tests measure what students have learned in high-school courses, such as history, math and foreign languages. General-reasoning tests seek to assess students' academic potential by measuring their skills in solving reading and math problems largely, by design, independent of high-school curricula. Since 1926, the dominant general-reasoning test in the U.S. has been the SAT, sponsored by the College Board.

The SAT has a long pedigree in American higher education. Yet the problem with general-reasoning tests like the SAT is their premise: that something as complex as intellectual promise can be captured in a single test and reflected in a single score. It is tempting for admissions officers--and parents, legislators, policymakers and the media--to read more into SAT scores than the numbers can bear. Although measurement experts know that tests are only intended as approximations, the fact that scores frequently come with fancy charts and tables can create an exaggerated sense of precision.

For quite some time, an over-reliance on these scores has skewed the outcome of the admissions game. The more competitive admissions become, the more small differences in SAT scores affect a student's chances. As a result, deserving students, including low-income and minority applicants, are crowded out of the game. These concerns led the University of California to consider eliminating the SAT entirely as a requirement for admission in 2001.

The College Board responded with a revised SAT, introduced in March 2005. The new SAT is a dramatic improvement over the old. The mathematics section is more demanding, but also more fair; while the old SAT featured questions that were known for their trickery but required only a beginning knowledge of algebra, the new math section is more straightforward and covers higher-level math.

Instead of deconstructing esoteric verbal analogies, students must now perform a task they will actually face in college: writing an essay under a deadline. These changes have already galvanized high schools and students to put more effort and attention into writing and college-preparatory math. The new SAT, in other words, has gone a long way toward becoming an achievement test.

But has it gone far enough? The College Board's own recent assessment concludes that the new SAT is not substantially better than the version it replaced in its ability to predict student performance in the first year of college. Although the essay adds significant value to the new SAT, it appears the critical-reading section does not. The new SAT is almost an hour longer than the old SAT. And its content is still not as closely tied to college-preparatory curricula as a true achievement test should be.

The new SAT is looking more like a promising first draft than a final product. Any plans for revision should consider a series of University of California studies that have unsettled some entrenched assumptions about testing students' readiness for college.

The studies, conducted over the past decade, suggest that achievement tests are better than general-reasoning tests in predicting how well students are likely to perform in college, that they are fairer to low-income and minority students, and that they reinforce teaching and learning in a way the SAT--even the new SAT--does not. Achievement tests help students understand where they are strong academically and where they need to improve--and that they can improve if they invest the time and work.

The most intriguing aspect of this research, however, is not what it says about tests but what it says about that old-fashioned admissions criterion, high-school grades. The studies concluded that a student's performance over four years of high school remains the fairest and most meaningful measure of his or her accomplishments and the most reliable indicator of future success in college. We need standardized tests to correct for grade inflation and give students useful feedback. But we must be very careful about the tests we choose, and the California findings give us persuasive reasons to move toward achievement tests.

Like the new SAT, standardized testing is itself a work in progress. We present two possible routes for the future.

The first option is to revise the new SAT to keep the writing and mathematics sections but significantly reduce the critical-reading component. Along with this newer SAT, require students to take two achievement tests of their own choosing: candidates are the SAT Subject Tests and Advanced Placement (AP) exams, both offered by the College Board.

This strategy yields a shorter SAT while preserving its current strength in assessing two indispensable skills for academic success--writing and mathematics. It also tells students that they must be prepared to demonstrate not only an ability to write clearly and think quantitatively, but also mastery of two subject areas.

The second is not to require a single, comprehensive test at all, whether the new SAT or its long-standing rival the ACT. Instead, have students take a combination of achievement tests in various academic subjects, again using the SAT Subject Tests or AP exams, with a choice of at least some of them.

This strategy recognizes a fundamental problem with any effort to develop a national achievement test: the absence of a standardized high-school curriculum in the U.S. American College Testing, sponsor of the ACT, has sought valiantly to overcome this difficulty through national curriculum surveys, but the ACT does not measure student achievement to the same depth as do discipline-specific tests like the SAT Subject Tests or AP exams. It may be that no single examination, however well designed, will be satisfactory in a country that lacks a national curriculum and has a long tradition of local control.

In the unrelentingly competitive world that college admissions has become, we owe students the chance to be judged on criteria as fair and rigorous as we can make them. The current ferment of research on standardized testing, including several major studies now underway, suggests that we may be on the verge of opening a productive new chapter in the long national conversation on what academic merit is and how it should be measured. One thing is clear: There is still a lot more to say.