Well, Ben Oldham and I got called out by Richard Innes of the Bluegrass Institute the other day for "ignoring" the ACT benchmark scores - apparently the holy grail of assessment in his mind.
Of course, this is all part of a larger conversation about Senate Bill 1 and the ongoing Task Force on Assessment at KDE.
I wasn't planning on getting into all of this this fall. It's tedious, inside baseball kind of stuff. But the fundamentals are still the same. First, it's just a test. Second, every test has been designed to do a specific job. If test designers wanted a test to do something else, they would begin with that fact in sight. Third, have I mentioned it's just a test?
OK, let's talk about the ACT.
Here's the problem with the American College Test: Nothing, really.
It is a well-designed test intended to help admissions officers at competitive colleges determine which students are most likely to be successful at the university level. Ben Oldham recently went further, saying the "American College Test (ACT) is a highly regarded test developed by a cadre of some of the best measurement professionals in the world and is used by a number of colleges..."
But the ACT is only ONE factor that colleges use to make such determinations.
I mean, if the ACT can predict success in life, as Innes un-credibly argues (below), why don't colleges simply rely on it and quit wasting time compiling grade point averages and other data they say they need to make the best choices for their school?
The answer lies in the fact that test data are only reliable up to a point. It's just a test score and it shouldn't be turned into anything more.
As Richard C. Atkinson and Saul Geiser recently pointed out,
the problem with general-reasoning tests like the SAT [and ACT] is their premise: that something as complex as intellectual promise can be captured in a single test and reflected in a single score. It is tempting for admissions officers--and parents, legislators, policymakers and the media--to read more into SAT [and ACT] scores than the numbers can bear. Although measurement experts know that tests are only intended as approximations, the fact that scores frequently come with fancy charts and tables can create an exaggerated sense of precision.
And such exaggerations persist. Newspapers and bloggers rank scores that ought not be ranked - because people like rankings. Some "think tanks" act as though test scores equal "truth" and look for any opportunity to twist data into a pre-existing narrative that Kentucky schools are going to hell in a handcart - this, despite trend data to the contrary, about which they are in full denial.
Georgetown College Distinguished Service Professor Ben Oldham correctly warned that, "since the ACT is administered to all Kentucky juniors, there is a tendency to over-interpret the results as a measure of the success of Kentucky schools." His excellent article clarifies what the test is, and what it isn't.
The problem of over-interpretation has been somewhat exacerbated by the inclusion of benchmark scores in the ACT. But benchmarking does not change the construction of the test nor the norming procedures. It does not turn the ACT into a criterion-referenced exam as Innes tries to suggest - unless all one means by "criterion" is that the ACT derived a cut score. Under that definition a People Magazine Celebrity Quiz could be considered criterion-referenced. Score 18 and you're a Hollywood Insider!
The ACT's "criteria" simply do not measure how well Kentucky students are accessing the curriculum. It is much more sensitive to socio-economic factors attributable to most of the college-going population.
Using a "convenience sample" of schools (those willing to participate), the ACT looks at student success in particular college courses, and then looks at the ACT scores obtained by "successful" students. But regardless of what data such a design produces, "there is no guarantee that it is representative of all colleges in the U.S." Further, the ACT "weighted the sample so that it would be representative of a wider variety of schools in terms of their selectivity."
That is to say, they tried to statistically adjust the data produced by the sample to account for more highly selective schools as well as the less selective. This process of weighting data to produce a score that the original sample did not produce should be viewed suspiciously. It would be like...oh, let's say like....using a concordance table to give students a score on a test they didn't take.
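To see why the weighting step deserves a second look, here is a minimal sketch of how reweighting a convenience sample changes the answer. Everything in it is hypothetical: the tier names, the sample counts, the assumed population shares and the success rates are made up for illustration and do not come from the ACT's Technical Report. It is a plain post-stratification calculation, not a reconstruction of whatever procedure the ACT actually used.

```python
# Hypothetical numbers only; none of these figures come from the ACT.
# Each tier records how many sampled students it contributed, the share of
# all colleges we ASSUME it represents, and the success rate observed in it
# for students at some fixed test score.
SAMPLE = {
    "highly_selective": {"n_sample": 60, "pop_share": 0.10, "success_rate": 0.40},
    "moderately_selective": {"n_sample": 30, "pop_share": 0.50, "success_rate": 0.50},
    "open_admission": {"n_sample": 10, "pop_share": 0.40, "success_rate": 0.65},
}


def unweighted_rate(tiers):
    """Success rate exactly as the convenience sample produced it."""
    total = sum(t["n_sample"] for t in tiers.values())
    successes = sum(t["n_sample"] * t["success_rate"] for t in tiers.values())
    return successes / total


def weighted_rate(tiers):
    """Success rate after reweighting each tier to its assumed population share."""
    return sum(t["pop_share"] * t["success_rate"] for t in tiers.values())


raw = unweighted_rate(SAMPLE)
adjusted = weighted_rate(SAMPLE)
print(f"as sampled: {raw:.3f}  after weighting: {adjusted:.3f}")
```

The adjusted figure is only as good as the assumed population shares. Change those shares and the "result" moves with them, even though no new student has been observed.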
If KDE had done anything like this, Innes' buddies at BGI would be crying "fraud."
If we are going to test, and if our tests are going to be used to determine placement in programs within schools, and eventually in college, then we need to understand what the ACT means when it says "college-ready." And we don't. The most important flaw of the ACT benchmarks is conceptual: What is "readiness" for higher education?
As one delves deeper into the statistics, other problems arise. Skip Kifer, who serves on the Design and Analysis Committee for NAEP, told KSN&C,
The benchmark stuff is statistically indefensible. Hierarchical Linear Modeling (HLM) was invented because people kept confusing at what level to model things and how questions were different at different levels. The fundamental statistical flaw in the benchmark ... is that it ignores institutions. Students are part of institutions and should be modeled that way.
But the ACT models at the "student" level when it should be modeling at the "students nested within institutions" level.
It is possible that the ACT took a kind of average of those "correct" models, but that cannot be determined from their Technical Report.
Perhaps Innes could help us understand: How is it that the ACT's benchmarks could have been empirically defined and yet managed to get the same relationship for the University of Kentucky and Lindsey Wilson College?
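Kifer's point about levels can be made concrete with a toy calculation. This is a sketch under loud assumptions: the two institutions are hypothetical, the logistic curves and their midpoints are invented, and the 50 percent success threshold is assumed for illustration. It is not a hierarchical model fit and not the ACT's procedure. It only contrasts a single cut score computed from pooled students with cut scores computed inside each institution.

```python
import math

# Hypothetical institutions. "midpoint" is the test score at which a student
# there has an even chance of succeeding in a course; both values are invented.
INSTITUTIONS = {
    "selective_university": {"midpoint": 22.5, "students_per_score": 100},
    "small_college": {"midpoint": 18.5, "students_per_score": 100},
}
SCORES = range(12, 31)


def success_probability(score, midpoint, slope=0.5):
    """Assumed logistic relationship between test score and course success."""
    return 1.0 / (1.0 + math.exp(-slope * (score - midpoint)))


def cut_score(rate_at, threshold=0.5):
    """Lowest score whose success rate reaches the threshold, or None."""
    for score in SCORES:
        if rate_at(score) >= threshold:
            return score
    return None


def pooled_rate(score):
    """Success rate at a score when students are pooled, ignoring institutions."""
    total = sum(i["students_per_score"] for i in INSTITUTIONS.values())
    successes = sum(
        i["students_per_score"] * success_probability(score, i["midpoint"])
        for i in INSTITUTIONS.values()
    )
    return successes / total


pooled_cut = cut_score(pooled_rate)
per_institution_cuts = {
    name: cut_score(lambda s, m=info["midpoint"]: success_probability(s, m))
    for name, info in INSTITUTIONS.items()
}
print("pooled:", pooled_cut, "per institution:", per_institution_cuts)
```

With these made-up inputs the pooled cut score lands between the two institutional ones and matches neither. A student-level benchmark can look perfectly empirical and still describe no actual campus.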
Unfortunately, the ACT folks did not respond to an inquiry from KSN&C.
But none of this will likely stop the exaggeration of the ACT's abilities.
In response to a KSN&C posting of Ben Oldham's article, Innes made the following claim:
Oldham pushes out-of-date thinking that the ACT is only a norm-referenced test. The ACT did start out more or less that way, years ago, but the addition of the benchmark scores, which are empirically developed from actual college student performance to indicate a good probability of college success, provides a criterion-referenced element today, as well.
"Criterion-referenced element?!" A cut score? The ACT is a timed test too - but that doesn't make it a stopwatch.
So, Oldham is old fashioned and out-of-date? Au contraire. It is Innes who is over-reaching.
the ACT says that many employers for ... better paying jobs now want exactly the same skills that are needed to succeed in college. So, the Benchmark scores are more like measures of what is needed for a decent adult life. Thus, it isn’t out of line to say that the Benchmarks can fairly be considered a real measure of proficiency. And, that opens the door to compare the percentages of students reaching EXPLORE and PLAN benchmarks to the percentages that CATS says are Proficient or more.
One could derive as much "proficiency" evaluating Daddy's IRS form 1040 and then comparing percentages of students reaching EXPLORE and PLAN benchmarks to the likelihood of owning a BMW or affording cosmetic surgery.
I'm afraid what we have here is something other than a nationally recognized assessment expert who is out-of-date.
We have a pundit who thinks the ACT benchmarks constitute a criterion-referenced assessment of the performance of Kentucky students and their prospects for a decent adult life!? This, absent any connection between the ACT and Kentucky's curriculum beyond pure happenstance. There is no relationship between a student's ACT score and any specified subject matter - which is typically the definition of a criterion-referenced test.
There is no way to sugar coat this. Somebody doesn't know what he's talking about - and it's not Oldham.
The best spin I can put on this is that Innes got snookered by ACT's marketing department, which seems to do a fine job, but has been known to overstate the abilities of ACT's EPAS system.
But none of this makes the ACT a bad test. It just means that assessment experts have to take care to understand the nature of the exams and not to rely on them to do too much.
And it is commendable that Kentucky is working toward building an actual relationship between Kentucky's curriculum and that of the ACT through the development of content tests. That work will get Innes closer to where he wants to be. He should wait for the actual work to be done before making claims.
Just as Atkinson, Geiser, Oldham, Kifer, Sexton and virtually everybody else says, the results should not be over-interpreted to suggest relationships that just aren't there. And trying to argue causal chains that are completely unproven is certainly not best practice.
But more to the point, Kentucky recently committed to use the ACT's EPAS system, including EXPLORE and PLAN, as yet another measure - a norm-referenced measure - of student performance. As long as Kentucky is cognizant of the test's limitations, we ought to strengthen the connections between Kentucky high schools and universities and gauge student readiness for college. It was because of the large numbers of college freshmen in need of developmental courses that CPE pushed for the ACT/EPAS system to begin with.
Kifer wonders why Kentucky's Advanced Placement (AP) Tests receive so little attention. After all, unlike the ACT, the AP tests are a direct measure of a high school student's ability to do college work; AP courses are particularly well-defined; the tests exist across the curriculum; good AP teachers abound; course goals and exams are open to scrutiny.
When a high schooler passes an AP test he or she not only knows what it means, but the school of their choice gives them college credit for their effort.
Aware of CPE's commitment to the ACT as one measure of student readiness, KSN&C contacted newly named Senior Vice President of the Lumina Foundation Jim Applegate, who until recently served as CPE's VP for Academic Affairs.
Here's what Jim had to say:
The article recently referenced in your publication from the admissions officer group addresses the use of ACT for college admissions. The organizations sponsoring assessments such as ACT, SAT, and others have made clear that no single standardized test should be used to make such decisions. Postsecondary institutions, to implement best practice, should use a multi-dimension assessment to make admissions decisions. A test score may play a role in these decisions, but not the only role.
Kentucky uses the ACT/EPAS system (the Explore, Plan, and ACT tied to ACT's College Readiness Standards) to help determine college readiness, place students in the right high school courses to prepare them for college, and place them in the right courses once they go to college. Kentucky’s revised college readiness standards are about placement, not admission. For the foreseeable future, the postsecondary system will, as it has always done, accept large numbers of students with ACT scores below readiness standards, but will provide developmental educational services to these students to get them ready for college-level work. The large number of underprepared students coming into Kentucky’s postsecondary system led the Council a couple of years ago to initiate an effort to improve developmental education in order to make sure these students receive the help they need to succeed in college.
A growing number of states are adopting the ACT or the entire EPAS system to more effectively address the challenge of getting more high school graduates ready for college or the skilled workplace (e.g., Colorado, Illinois, and Michigan). These states also want to better understand the performance of their students in a national and international context. Globalization no longer allows any state’s educational system to remain isolated from these contexts.
The use of ACT/EPAS is, of course, only one necessary strategy to improve the college/workplace readiness of Kentucky’s traditional and adult learners. Kentucky is working to implement statewide placement tests in mathematics, reading, and English that will be administered to high school students who fall below statewide college readiness benchmarks tied to ACT scores (few states have gotten this far in clarifying standards to this level). These placement tests will provide more finely grained information about what students need to know to be ready for college-level work. We are also working to more strongly integrate college readiness goals into our teacher preparation and professional development programs to ensure teachers know how to use the assessments beginning in middle school to bring students to readiness standards.
The postsecondary system is hopeful the implementation of ACT/EPAS will promote partnerships between postsecondary and high/middle schools to improve student achievement. Some of that has already begun since the first administration of the EPAS college readiness system. For the first time in my time in Kentucky (I grew up here and returned to work here in 1977) we now know where every 8th grader is on the road to college readiness thanks to the administration of the Explore. If in five years the number of students needing developmental education is not significantly less than it is today then shame on all of us.
All of this reminds me of the old Crest Toothpaste disclaimer I read daily while brushing my teeth over the decades.
Crest has been shown to be an effective decay preventive dentifrice that can be of significant value when used as directed in a conscientiously applied program of oral hygiene and regular professional care.
Let's see if I can paraphrase:
The ACT/EPAS system has been shown to be an effective norm-referenced assessment that can be of significant value when used as directed in a conscientiously applied assessment program based on clear curriculum goals, direct assessments of specific curriculum attainment and effective instruction from a caring professional.