Monday, September 29, 2008

What the Bluegrass Institute Doesn't Seem to Know about the ACT

There is no way to sugar coat this.
Somebody doesn't know what he's talking about
- and it's not Oldham.

Well, Ben Oldham and I got called out by Richard Innes of the Bluegrass Institute the other day for "ignoring" the ACT benchmark scores - apparently the holy grail of assessment in his mind.

Of course, this is all part of a larger conversation about Senate Bill 1 and the on-going Task Force on Assessment at KDE.

I wasn't planning on getting into all of this this fall. It's tedious, inside-baseball kind of stuff. But the fundamentals are still the same. First, it's just a test. Second, every test has been designed to do a specific job. If test designers wanted a test to do something else, they would begin with that fact in sight. Third, have I mentioned it's just a test?

OK, let's talk about the ACT.

Here's the problem with the American College Test: Nothing, really.

It is a well-designed test intended to help admissions officers at competitive colleges determine which students are most likely to be successful at the university level. Ben Oldham recently went further saying, the "American College Test (ACT) is a highly regarded test developed by a cadre of some of the best measurement professionals in the world and is used by a number of colleges..."

But the ACT is only ONE factor that colleges use to make such determinations.

Why?

I mean, if the ACT can predict success in life, as Innes un-credibly argues (below), why don't colleges simply rely on it and quit wasting time compiling grade point averages and other data they say they need to make the best choices for their school?

The answer lies in the fact that test data are only reliable up to a point. It's just a test score and it shouldn't be turned into anything more.

As Richard C. Atkinson and Saul Geiser recently pointed out,

the problem with general-reasoning tests like the SAT [and ACT] is their premise: that something as complex as intellectual promise can be captured in a single test and reflected in a single score. It is tempting for admissions officers--and parents, legislators, policymakers and the media--to read more into SAT [and ACT] scores than the numbers can bear. Although measurement experts know that tests are only intended as approximations, the fact that scores frequently come with fancy charts and tables can create an exaggerated sense of precision.

And such exaggerations persist.

Newspapers and bloggers rank scores that ought not be ranked - because people like rankings. Some "think tanks" act as though test scores equal "truth" and look for any opportunity to twist data into a pre-existing narrative that Kentucky schools are going to hell in a handcart - this, despite trend data to the contrary, about which they are in full denial.

Georgetown College Distinguished Service Professor Ben Oldham correctly warned that, "since the ACT is administered to all Kentucky juniors, there is a tendency to over-interpret the results as a measure of the success of Kentucky schools." His excellent article clarifies what the test is, and what it isn't.

The problem of over-interpretation has been somewhat exacerbated by the inclusion of benchmark scores in the ACT. But benchmarking does not change the construction of the test or the norming procedures. It does not turn the ACT into a criterion-referenced exam as Innes tries to suggest - unless all one means by "criterion" is that the ACT derived a cut score. Under that definition, a People Magazine Celebrity Quiz could be considered criterion-referenced. Score 18 and you're a Hollywood Insider!

The ACT's "criteria" simply do not measure how well Kentucky students are accessing the curriculum. The test is much more sensitive to socio-economic factors attributable to most of the college-going population.

Using a "convenience sample" of schools (those willing to participate), the ACT looks at student success in particular college courses, and then looks at the ACT scores obtained by "successful" students. But regardless of what data such a design produces, "there is no guarantee that it is representative of all colleges in the U.S." Further, the ACT "weighted the sample so that it would be representative of a wider variety of schools in terms of their selectivity."

That is to say, they tried to statistically adjust the data produced by the sample to account for more highly selective schools as well as the less selective. This process of weighting data to produce a score that the original sample did not produce should be viewed suspiciously. It would be like... oh, let's say, like... using a concordance table to give students a score on a test they didn't take.
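To make the concern concrete, here is a minimal sketch of how post-stratification weighting works. The numbers are invented for illustration - this is not ACT's actual data or weighting procedure - but it shows how reweighting a convenience sample toward an assumed national mix of schools produces an estimate the raw sample never actually yielded:

```python
# Illustrative sketch only -- invented numbers, not ACT's actual data or method.
# A convenience sample that over-represents selective schools gets reweighted
# toward an assumed national mix of school selectivity.

# selectivity tier -> (share of sample, assumed share of all colleges,
#                      mean ACT score of "successful" students in that tier)
sample = {
    "highly_selective": (0.50, 0.15, 26.0),
    "selective":        (0.35, 0.35, 22.0),
    "open_admission":   (0.15, 0.50, 19.0),
}

# Unweighted estimate: driven by whichever schools happened to participate
unweighted = sum(s_share * score for s_share, _, score in sample.values())

# Weighted estimate: each tier reweighted to the assumed national mix
weighted = sum(pop_share * score for _, pop_share, score in sample.values())

print(f"unweighted benchmark estimate: {unweighted:.1f}")
print(f"weighted benchmark estimate:   {weighted:.1f}")
```

The two estimates differ because the weighted figure depends entirely on how well the assumed national mix reflects reality - which is precisely the assumption the convenience sample cannot verify.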

If KDE had done anything like this, Innes' buddies at BGI would be crying "fraud."

If we are going to test, and if our tests are going to be used to determine placement in programs within schools, and eventually in college, then we need to understand what the ACT means when it says "college-ready." And we don't. The most important flaw of the ACT benchmarks is conceptual: What is "readiness" for higher education?

As one delves deeper into the statistics, other problems arise. Skip Kifer, who serves on the Design and Analysis Committee for NAEP, told KSN&C,

The benchmark stuff is statistically indefensible. Hierarchical Linear Modeling (HLM) was invented because people kept confusing at what level to model things and how questions were different at different levels. The fundamental statistical flaw in the benchmark ... is that it ignores institutions. Students are part of institutions and should be modeled that way.

But the ACT models at the "student" level when it should be modeling at the "students nested within institutions" level.

It is possible that the ACT took a kind of average of those "correct" models, but that cannot be determined from their Technical Report.
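Kifer's nesting point can be illustrated with a toy example. This is not HLM itself - just a simple success-rate comparison with invented numbers - but it shows how pooling all students into one analysis can obscure the very different score-to-success relationships that exist within individual institutions:

```python
# Illustrative sketch only -- toy data, not ACT's actual model. Shows why a
# single pooled (student-level) cut score can mask differences between
# institutions, the flaw Kifer describes.

# (institution, ACT score, succeeded in the course?)
students = [
    ("Selective U", 28, True), ("Selective U", 26, True),
    ("Selective U", 24, False), ("Selective U", 30, True),
    ("Open College", 20, True), ("Open College", 18, False),
    ("Open College", 22, True), ("Open College", 16, False),
]

def success_rate_at_or_above(data, cut):
    """Success rate among students scoring at or above the cut score."""
    hits = [ok for _, score, ok in data if score >= cut]
    return sum(hits) / len(hits) if hits else 0.0

# Pooled analysis: one cut score applied to every student, ignoring institutions
pooled = success_rate_at_or_above(students, 21)

# Nested analysis: compute the rate within each institution separately
by_school = {}
for school, score, ok in students:
    by_school.setdefault(school, []).append((school, score, ok))
nested = {s: success_rate_at_or_above(rows, 21) for s, rows in by_school.items()}

print("pooled success rate at 21+:", pooled)
print("within-institution rates:  ", nested)
```

The pooled rate is a blend that belongs to neither school: the same cut score implies different probabilities of success at different institutions, which is why modeling students nested within institutions matters.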

Perhaps Innes could help us understand: How is it that the ACT's benchmarks could have been empirically defined and yet managed to get the same relationship for the University of Kentucky and Lindsey Wilson College?

Unfortunately, the ACT folks did not respond to an inquiry from KSN&C.

But none of this will likely stop the exaggeration of the ACT's abilities.

In response to a KSN&C posting of Ben Oldham's article, Innes made the following claim:

Oldham pushes out-of-date thinking that the ACT is only a norm-referenced test. The ACT did start out more or less that way, years ago, but the addition of the benchmark scores, which are empirically developed from actual college student performance to indicate a good probability of college success, provides a criterion-referenced element today, as well.

"Criterion-referenced element?!" A cut score? The ACT is a timed test too - but that doesn't make it a stopwatch.

So, Oldham is old fashioned and out-of-date? Au contraire. It is Innes who is over-reaching.

Innes argues,

the ACT says that many employers for ... better paying jobs now want exactly the same skills that are needed to succeed in college. So, the Benchmark scores are more like measures of what is needed for a decent adult life. Thus, it isn’t out of line to say that the Benchmarks can fairly be considered a real measure of proficiency. And, that opens the door to compare the percentages of students reaching EXPLORE and PLAN benchmarks to the percentages that CATS says are Proficient or more.

Bull.

One could derive as much "proficiency" evaluating Daddy's IRS form 1040 and then comparing percentages of students reaching EXPLORE and PLAN benchmarks to the likelihood of owning a BMW or affording cosmetic surgery.

I'm afraid what we have here is something other than a nationally recognized assessment expert who is out-of-date.

We have a pundit who thinks the ACT benchmarks constitute a criterion-referenced assessment of the performance of Kentucky students and their prospects for a decent adult life!? This, absent any connection between the ACT and Kentucky's curriculum beyond pure happenstance. There is no relationship between a student's ACT score and any specified subject matter - which is typically the definition of a criterion-referenced test.

There is no way to sugar coat this. Somebody doesn't know what he's talking about - and it's not Oldham.

The best spin I can put on this is that Innes got snookered by ACT's marketing department, which seems to do a fine job, but has been known to overstate the abilities of ACT's EPAS system.

But none of this makes the ACT a bad test. It just means that assessment experts have to take care to understand the nature of the exams and not to rely on them to do too much.

And it is commendable that Kentucky is working toward building an actual relationship between Kentucky's curriculum and that of the ACT through the development of content tests. That work will get Innes closer to where he wants to be. He should wait for the actual work to be done before making claims.

Just as Atkinson, Geiser, Oldham, Kifer, Sexton and virtually everybody else says, the results should not be over-interpreted to suggest relationships that just aren't there. And trying to argue causal chains that are completely unproven is certainly not best practice.

But more to the point, Kentucky recently committed to use the ACT's EPAS system including EXPLORE and PLAN as yet another measure - a norm-referenced measure - of student performance. As long as Kentucky is cognizant of the test's limitations, we ought to strengthen the connections between Kentucky high schools and universities and gauge student readiness for college. It was because of the large numbers of college freshmen in need of developmental courses that CPE pushed for the ACT/EPAS system to begin with.

Kifer wonders why Kentucky's Advanced Placement (AP) Tests receive so little attention. After all, unlike the ACT, the AP tests are a direct measure of a high school student's ability to do college work; AP courses are particularly well-defined; the tests exist across the curriculum; good AP teachers abound; course goals and exams are open to scrutiny.

When a high schooler passes an AP test, he or she not only knows what it means, but the school of his or her choice gives college credit for the effort.

Aware of CPE's commitment to the ACT as one measure of student readiness, KSN&C contacted newly named Senior Vice President of the Lumina Foundation Jim Applegate, who until recently served as CPE's VP for Academic Affairs.

Here's what Jim had to say:

Richard,

The article recently referenced in your publication from the admissions officer group addresses the use of ACT for college admissions. The organizations sponsoring assessments such as ACT, SAT, and others have made clear that no single standardized test should be used to make such decisions. Postsecondary institutions, to implement best practice, should use a multi-dimension assessment to make admissions decisions. A test score may play a role in these decisions, but not the only role.

Kentucky uses the ACT/EPAS system (the Explore, Plan, and ACT tied to ACT’s College Readiness Standards) to help determine college readiness, place students in the right high school courses to prepare them for college, and place them in the right courses once they go to college. Kentucky’s revised college readiness standards are about placement, not admission. For the foreseeable future, the postsecondary system will, as it has always done, accept large numbers of students with ACT scores below readiness standards, but will provide developmental educational services to these students to get them ready for college-level work. The large number of underprepared students coming into Kentucky’s postsecondary system led the Council a couple of years ago to initiate an effort to improve developmental education in order to make sure these students receive the help they need to succeed in college.

A growing number of states are adopting the ACT or the entire EPAS system to more effectively address the challenge of getting more high school graduates ready for college or the skilled workplace (e.g., Colorado, Illinois, and Michigan). These states also want to better understand the performance of their students in a national and international context. Globalization no longer allows any state’s educational system to remain isolated from these contexts.

The use of ACT/EPAS is, of course, only one necessary strategy to improve the college/workplace readiness of Kentucky’s traditional and adult learners. Kentucky is working to implement statewide placement tests in mathematics, reading, and English that will be administered to high school students who fall below statewide college readiness benchmarks tied to ACT scores (few states have gotten this far in clarifying standards to this level). These placement tests will provide more finely grained information about what students need to know to be ready for college-level work. We are also working to more strongly integrate college readiness goals into our teacher preparation and professional development programs to ensure teachers know how to use the assessments beginning in middle school to bring students to readiness standards.

The postsecondary system is hopeful the implementation of ACT/EPAS will promote partnerships between postsecondary and high/middle schools to improve student achievement. Some of that has already begun since the first administration of the EPAS college readiness system. For the first time in my time in Kentucky (I grew up here and returned to work here in 1977) we now know where every 8th grader is on the road to college readiness thanks to the administration of the Explore. If in five years the number of students needing developmental education is not significantly less than it is today, then shame on all of us.

Jim Applegate

All of this reminds me of the old Crest Toothpaste disclaimer I read daily while brushing my teeth over the decades.

Crest has been shown to be an effective decay preventive dentifrice that can be of significant value when used as directed in a conscientiously applied program of oral hygiene and regular professional care.

Let's see if I can paraphrase:

The ACT/EPAS system has been shown to be an effective norm-referenced assessment that can be of significant value when used as directed in a conscientiously applied assessment program based on clear curriculum goals, direct assessments of specific curriculum attainment, and effective instruction from a caring professional.

2 comments:

Richard Innes said...

This extensive KSN & C blog may set a new record for length. I don’t have the time to address all the issues – and I doubt most readers would want to read that, anyway. I’ll just touch on a few points since KSN & C says the Bluegrass Institute doesn’t seem to know anything.

KSN & C certainly wants to put the ACT under a microscope. That’s acceptable, but in fairness let’s apply the same level of examination to CATS, the issue I think lies hidden inside the reference to the “…larger conversation about Senate Bill 1….”

Even KSN & C admits the ACT, “…is a well-designed test intended to help admissions officers….” Jim Applegate’s (former Council on Postsecondary Education Vice President for Academic Affairs) letter quoted at the end of the KSN & C post says that is true for Kentucky’s schools.

However, there are plenty of questions about whether CATS can make similar claims. KSN & C itself has commented in the past about the dubious concordance tables used in CATS for the past two years. KSN & C even projected the obvious inflation in the CATS scoring that occurred in 2007. I don’t see such charges of score manipulation being leveled by anyone against the ACT.

Furthermore, there is continuing disagreement about what CATS is really supposed to do.

Is the primary CATS function to assess student learning, or is the purpose to act as a lever to drive better instruction into schools? Can it do both? Can it really do either?

And, are the CATS standards really aligned with what students truly need? If they are, then why wouldn’t they agree with the ACT standards “…beyond pure happenstance…?” If CATS isn’t well-aligned to the ACT, then what is it aligned to, and how valid are the standards that CATS is aligned to?

With more than half of each graduating class in Kentucky now going on to higher education, these aren’t trivial questions.

Even if ACT only has a limited purpose, at least that purpose is well agreed upon and is highly relevant to a majority of Kentucky’s students. The purpose of CATS remains highly unsettled more than 15 years after the inception of reform assessments in Kentucky.

Continuing the “let’s be fair” discussion, the CATS is also “…just a test....” However, the consequences for schools and instruction, and thus ultimately for students, are very dramatic. Do we put far too much weight on this single assessment? Is it really testing what is relevant to the student’s long-term needs? In fact, is it nearly as relevant to student needs as the ACT?

KSN & C goes to considerable length to criticize the ACT Benchmark Scores, almost seeming annoyed with them. The bottom line here is a question about whether or not norm-referenced tests are evolving into something new – a hybrid with characteristics of both norm-reference and criterion-reference testing. Testing experts like Dr. George Cunningham, a recently retired professor from U of L, make the point that tests like the ACT are indeed evolving. KSN & C may not want to accept that, but it doesn’t make the change any less real.

KSN & C alleges that I say the ACT Benchmark Scores work exactly the same for any college, be it UK or Lindsey Wilson College. I never said that. It is obvious that the ACT Benchmarks deal with an average across all schools. The ACT’s own publications, including one KSN & C references in the main blog (http://www.act.org/research/policymakers/pdf/benchmarks.pdf), make that clear, saying, “The Benchmarks represent a criterion for success for a typical student at a typical college.” I don’t understand KSN & C’s argument on this.

On a different topic, in a past blog (http://theprincipal.blogspot.com/2007/11/great-yardstick-debate.html), KSN & C defends and praises a study created by the Kentucky Long Term Policy Research Center (KLTPRC). Even in the current blog we discuss here, KSN & C favorably links to an earlier KSN & C discussion of a David Adkisson Op-Ed citing the KLTPRC report (http://theprincipal.blogspot.com/2007/11/kentuckys-schools-new-reality.html).

That KLTPRC Policy Notes # 23 report ranks Kentucky in a number of different test data sets, including the ACT (a ranking the ACT folks actively discourage, by the way).

Now, however, the current KSN & C blog item claims newspapers and bloggers “…ought not to rank…” these test data sets.

I have written about the inappropriate nature of many state rankings on tests, including an extensive discussion of the problems with the rankings in the KLTPRC report. Is KSN & C now changing its past position on test rankings from the KLTPRC to agree with me? If so, good. That is progress.

On that note, readers who want to learn more can contact me through the Bluegrass Institute for Public Policy Solutions.

The Principal said...

“KSN & C says the Bluegrass Institute doesn’t seem to know anything.” No. Please don’t overstate my claims either. But, I do suggest BGI doesn’t seem to know some things.

“KSN & C certainly wants to put the ACT under a microscope.” Only because BGI has made some exaggerated claims, and that’s today’s topic.

As for CATS, “KSN & C even projected the obvious inflation in the CATS scoring that occurred in 2007.” Close, but no cigar. My suspicions about the KCCT becoming easier had nothing to do with creative scoring mechanisms. Rather, I started hearing from my students (who were all practicing teachers at the time), who told me they thought the test was easier the minute they started administering it. Now, whether one monkeys with the test or monkeys with the scoring procedures, you can end up in the same place. But the remedies are different, and it’s important to fix what’s broken.

“Is the primary CATS function to assess student learning, or is the purpose to act as a lever to drive better instruction into schools? Can it do both? Can it really do either?” The CATS was created to provide the systemic accountability demanded by the business community in 1990, which is part of the problem. Best is a system that begins with the students; then the curriculum; formative assessments; adjusted instruction; summative assessment; and finally an annual accountability assessment which, to be fair to students and teachers alike, must be directly related to those classroom goals described in the curriculum.

“And, are the CATS standards really aligned with what students truly need?” According to the Kentucky Supreme Court - yes.


“Even if ACT only has a limited purpose, at least that purpose is well agreed upon and is highly relevant to a majority of Kentucky’s students.” The ACT has some relevance as a discriminator. Beyond that, its construction makes it unresponsive as a measurement of schooling. To do that, one would need a test that measures the specific curriculum students were taught. Still, the ACT provides useful information.

“The purpose of CATS remains highly unsettled more than 15 years after the inception of reform assessments in Kentucky.” Not really. The purpose is spelled out in law, as I recall. The problem is that its purpose is not what a substantial number of Kentuckians want. Many parents want to know how their children compare to other kids, and they don’t care so much about the accountability measures that are so much of the CATS.

“KSN & C goes to considerable length to criticize the ACT Benchmark Scores, almost seeming annoyed with them.” Annoyed at the exaggerated value some folks have tried to place on them, perhaps; or possibly, that I got drawn into this discussion to begin with. : )

“Testing experts like Dr. George Cunningham… make the point that tests like the ACT are indeed evolving. KSN & C may not want to accept that, but it doesn’t make the change any less real.” I was surprised by Cunningham’s testimony on Senate Bill 1 and can only say that I believe he also over-stated the ACT’s case.

“KSN & C alleges that I say the ACT Benchmark Scores work exactly the same for any college, be it UK or Lindsey Wilson College.” No. I merely asked a rhetorical question intended to show that the relationships were not properly nested, nor were they empirically derived.

“The Benchmarks represent a criterion for success for a typical student at a typical college.” “Criterion for success?” Sounds great. Now tell me again: what is that criterion? Maybe American schools can start teaching it. But wait – if that happened – I mean, if the ACT could identify those critical criteria, and let American schools know what they were, and say, in a few years, more and more schools taught them, and students got more and more of those questions correct on the ACT – what would the ACT do? Why, they’d drop those test questions because they would no longer discriminate! So tell me again. What is the criterion?

“KSN & C defends and praises a study created by the Kentucky Long Term Policy Research Center (KLTPRC).” Almost. I have certainly cited it - repeatedly. I do that because, despite its imperfections, it comes to roughly the same conclusion as two or three other studies that have shown Kentucky’s progress since the early 90s. Kentucky ranking somewhere around 34th in student achievement is good news. These reports collectively serve to confirm what I believe to be true from my experience.

“Is KSN & C now changing its past position on test rankings from the KLTPRC to agree with me? If so, good. That is progress.” If I have a past position on rankings, point it out to me and I’ll go read it. But I don’t remember writing on it - or defending KLTPRC’s rankings specifically. Having said that, I’m sure you are correct that it is inappropriate to rank ACT scores.