Tuesday, October 14, 2008

Kifer on the Problem with an ACT-only Assessment

KSN&C has maintained that what's really wrong with Senate Bill 1 and its suggestion of using the ACT as THE Kentucky test is that using the ACT doesn't fix the problem. As one part of a comprehensive testing program, the ACT may have a place. But alone, it fails.

Sunday, Skip Kifer illustrated the point in the Herald-Leader.


Narrow view of student potential
By Skip Kifer
We old guys remember what the ACT did in its heyday. Along with a student's high school record, ACT scores helped admissions offices decide who had the best chance of succeeding in their institutions. Colleges used ACT scores, along with an array of other information, to place students in courses once they arrived on campus.

Last month, a committee representing the nation's admissions officers released a report about overemphasizing test scores and argued for prudent use of the ACT and the SAT in selecting students. The committee suggested that some colleges and universities can enroll students without requiring the tests. This is a recurring theme, not a new one.

What is new is that ACT officials, without having changed the nature of the test, say their scores now tell whether a student "meets expectations" or is "ready" to attend college.

Kentucky's high school juniors are required to take the ACT. Their results, released a couple of weeks ago, were interpreted to mean that too many students failed to "meet expectations" or too few were "ready."

What, we old guys wonder, could these assertions mean?

An ACT score is just that: a score on a test. It tells something about me as a student but means little without a context.

My 18 on the ACT means something very different if I am in the top 10 percent of my high school graduating class rather than just in the top half. My 18 means something different if I have taken solid courses rather than choosing an easy way to a diploma.

My 18 means something different if I were to take the test again. My 18 on the ACT means something different depending on the college or university I attend. Among Kentucky's public universities, for example, my 18 would be a relatively low score in one but a relatively high score in another.

My 18 is like Shaquille O'Neal's free-throw percentage. Being able to shoot barely 50 percent from the free-throw line says something about O'Neal. But, does it mean not "meeting expectations" or not being "ready" for the NBA? Hardly. NBA scouts are clever people who would not let one piece of information become a judgment.

ACT says that a person with an English score of 18 has a 50/50 chance of getting a B or higher in a college English course. We old guys say that is throwing the statistical dice.

My left foot is in boiling water and my right foot is in ice water but "on the average" I feel fine.

Of all students with a score of 18, one-half will get a B or better and one-half a C or worse. In which half is my 18? Is it in the ice or boiling water? ACT cannot answer that question. But if I were in the top 10 percent of my class, it is more likely that I will get the B or better.
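Kifer's point here is the difference between a marginal probability and a conditional one. A short sketch makes it concrete; the class-rank split below uses invented numbers purely for illustration (the source reports only ACT's overall 50/50 figure), but it shows how the same 50 percent rate can conceal very different odds:

```python
# Hypothetical students who all scored 18 on the ACT.
# The group split is made up; only the overall 50/50 rate comes from ACT's claim.
students = [
    ("top 10%", True), ("top 10%", True), ("top 10%", True), ("top 10%", False),
    ("lower half", True), ("lower half", False), ("lower half", False), ("lower half", False),
]

def rate(group=None):
    """Share earning a B or better, overall or within one class-rank group."""
    pool = [got_b for g, got_b in students if group is None or g == group]
    return sum(pool) / len(pool)

print(rate())              # 0.5  -- the marginal "50/50 chance"
print(rate("top 10%"))     # 0.75 -- conditional on being in the top 10 percent
print(rate("lower half"))  # 0.25 -- conditional on being in the lower half
```

The score alone answers the first question; Kifer's point is that only the extra context (class rank, coursework) answers the ones that matter.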

We old guys also wonder why so much was made of ACT results and so little reported about Kentucky's Advanced Placement results. As opposed to the ACT, AP scores are a direct measure of whether a student can do college work.

Students take AP courses in a variety of subjects. An AP course is so well defined that what Kentucky students experience is comparable to what is experienced by students throughout the country. Each AP course is taught by a capable teacher and is described in detail with course goals, materials and examinations available for scrutiny.

The examination measures what is learned in the course. High-school students who score 3, 4 or 5 on an AP examination can get placement or credit or both in the college or university they attend.

Recent results show increased numbers of Kentucky's students taking AP courses and scoring higher on them. That is, more Kentucky students are doing acceptable college work in high school. It would make sense, therefore, for an educational community wanting more students to attend college to work to expand AP opportunities.

While it is important to provide opportunities to prepare each student to take AP courses, it is unlikely that all students will take them. Even so, AP courses point in a positive direction for schools. They are a model for structuring courses and tying together testing and instruction.

An algebra course, for example, in Fayette County should be the algebra course in Christian County. More important, two algebra courses in one Fayette County school should be the same algebra course.

AP results show, not surprisingly, that students tend to learn what they are taught. It is important, therefore, to define and teach what it is important to learn. Then one can assess and give credit.

Someone might ask the old guys what this has to do with O'Neal. The answer is clear. If you want to know what kind of basketball player he is, you let him play basketball. You don't look only at his free-throw shooting percentage and decide that he does not meet expectations.

6 comments:

Richard Innes said...

There is too much here to fully discuss, but here are some short observations.

I guess Kifer and his compatriots at Georgetown College (the “old guys?”) will never get the idea behind the ACT’s Benchmark Scores.

It really isn’t hard to understand the Benchmarks. One short, easy-to-read discussion is at http://www.act.org/research/policymakers/pdf/benchmarks.pdf. Readers of this blog owe it to themselves to check this out.

I am not sure where Kifer tries to go with his comments concerning Advanced Placement courses. AP courses are indeed excellent, but only a relatively small proportion of Kentucky's students take them. Certainly, the AP tests are set at too high a level to be suitable for a statewide assessment. Perhaps that is why not as much is made of the results, which, after all, cover only our elite students. However, I agree that AP improvement is important and worthwhile. But, as Kifer seems to ask, how many Kentucky kids are being adequately prepared in the lower grades to take AP courses if they want to?

I do agree with Kifer that an algebra course in Fayette County should be equivalent to one taught elsewhere in the state. The reason that isn't happening is that we have an almost unworkable system in Kentucky where standards and accountability tests are controlled at the state level, but local districts are left on their own to develop the curriculum. It's a recipe for massive confusion, which Kifer seems to recognize.

Finally, I think Kifer’s basketball analogy is messed up. While you certainly can’t judge a player only by his free-throw shooting, a player with really lousy free-throw statistics probably won’t be a player for long. Those stats, just like some others that Kifer doesn’t want to accept, most definitely count.

Skip said...

Had Mr. Innes been in my testing class, I would have asked him what part of the ACT benchmarking process I don't understand. In that class I expect students to be able to:

Describe the statistical models used to determine the ACT benchmarks;
Using the statistical methodology, develop other benchmarks from a similar database;
Display the appropriate distribution and respond coherently to questions like: what is the distribution of ACT scores for students who get an A in the English course;
Discuss the assumptions made with the ACT procedure, with special emphasis on the hierarchical nature of the data;
Present studies to validate the ACT classification system and compare those to what ACT has done;
Be prepared to answer questions like: Why did ACT use its test alone in the prediction rather than including a student's high school record, which is the better predictor?

I wonder how Mr. Innes would do in regard to my expectations for students.

The Principal said...

Richard Innes says,

"I guess Kifer and his compatriots at Georgetown College (the “old guys?”) will never get the idea behind the ACT’s Benchmark Scores."

I'm not sure that's the problem.

I think the problem may be that Kifer "gets it" very well.

The question in my mind is: Does Innes? BGI constantly promotes Innes as a widely recognized testing expert. Skip has thrown down the gauntlet.

So let's settle this right here.

Dick, the floor is yours.

Richard Innes said...

While I await some information from ACT, how about applying some of Professor Kifer’s questions to our very own CATS KCCT tests. Here goes:

(1) (Q) Describe the statistical models used to determine the CATS KCCT benchmarks.
(A) Who knows? Whatever they are, they certainly can’t be credible. When the percentage of students graded proficient for on-demand writing skyrockets in one year from 4.48 percent to 43.59 percent (elementary school results for 2006 and 2007), the models, whatever they are, are clearly broken.


(2) (Q) Using the statistical methodology, develop other benchmarks from a similar database;
(A) Can't do it. There is no other database similar to CATS. No other state uses anything close to our system, with its highly troubled writing portfolios and other dubious features.

In fact, the last other state using portfolios, Vermont, ceased doing so several years ago, per a report to the Assessment and Accountability Task Force on Tuesday. The Principal attended that meeting.


(3) (Q) Display the appropriate distribution and respond coherently to questions like: what is the distribution of KCCT writing scores for students who get an A in the English course?
(A) The 2006-2007 Writing Portfolio Audit was discussed at some length in an earlier Assessment and Accountability Task Force hearing, including the distribution of portfolio scores before and after audit.

A total of 9,933 portfolios were audited, which included students across all grades tested. Originally, 588, or 5.9 percent, were graded "Distinguished." After the audit, 176 of the 588 had scores lowered by two levels to "Apprentice." Another 375 were rescored as only "Proficient." Only 37, that's right, just 37, portfolios retained their original "Distinguished" rating.

Things were nearly as bad for portfolios scored “Proficient.” After auditing, only 39 percent retained that grade or higher. About 61 percent were downgraded.
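The audit figures quoted above are internally consistent; a quick arithmetic check (the counts come straight from the comment, nothing else is assumed):

```python
# Checking the 2006-07 writing portfolio audit figures quoted above.
audited_total = 9933
distinguished_before = 588
dropped_to_apprentice = 176   # lowered two levels
dropped_to_proficient = 375   # lowered one level
retained = distinguished_before - dropped_to_apprentice - dropped_to_proficient

print(retained)                                              # 37 kept "Distinguished"
print(round(100 * distinguished_before / audited_total, 1))  # 5.9 percent originally rated "Distinguished"
print(round(100 * retained / distinguished_before, 1))       # 6.3 percent of those survived the audit
```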

Now, I haven’t seen any data from the KDE on grade distributions, but with only 37 kids actually deserving the top score on KCCT portfolios, and with far more than half not even deserving the next highest score, it really doesn’t matter. Clearly, portfolio grade inflation is rampant and won’t correlate well to classroom grades.

In truth, both CATS scoring and student grading are problematic in this state, which is why Kentucky colleges know the ACT is still needed.


(4) (Q) Present studies to validate the CATS KCCT classification system;
(A) I mentioned one above for writing, and it’s clear that element of the CATS isn’t valid. I have also examined the growth of proficiency rates in CATS versus the NAEP for reading and math in fourth and eighth grade, and that data also indicates strong score inflation in CATS over time.


(Bonus Item) Be prepared to answer questions like: Why did Kentucky use its CATS test alone to judge schools rather than including students' high school records, which are the better predictor?

Great question. Since Professor Kifer has defended CATS (e.g. KY Senate Education Committee, March 6, 2008), ask him why. The ACT isn’t perfect, but nobody, and I mean nobody, cites it with anything like the obvious measurement errors in CATS such as those discussed above.

Skip said...

Oh, my!

Skip said...

Dear Mr. Innes:

I am not really interested in carrying out a long public correspondence. You challenged my knowledge; I challenged you to prove it. You flubbed it.

It would take a very long letter to clear up things in your retort. I will just take one instance, the first. You said:

1) (Q) Describe the statistical models used to determine the CATS KCCT benchmarks.
(A) Who knows? Whatever they are, they certainly can’t be credible. When the percentage of students graded proficient for on-demand writing skyrockets in one year from 4.48 percent to 43.59 percent (elementary school results for 2006 and 2007), the models, whatever they are, are clearly broken.

I know. And I know setting standards is different from scoring student work. You conflated those two things in your answer.

CATS does not have benchmarks. It has proficiency standards. The models used to set the standards are judgmental ones, not statistical ones. Some of those models have been around for almost 40 years. Once the judgments are made (typically by the best teachers in the grade or field) there are some statistical manipulations to arrive at the cut-points on the proficiency distribution.

The ACT benchmarks are statistical. At a minimum a model is fitted to determine how well ACT scores predict grades. One way to interpret the results is to figure out, given an ACT score, what is the probability of getting a particular grade.
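The kind of model Kifer describes can be sketched in a few lines: a logistic regression of course grade (B or higher) on ACT score, with the benchmark read off as the score where the predicted probability crosses 50 percent. The intercept and slope below are invented for illustration, not ACT's actual fitted values.

```python
import math

# Hypothetical logistic-regression coefficients (not ACT's real fit):
# log-odds of a B or higher = b0 + b1 * (ACT score)
b0, b1 = -6.3, 0.35

def p_b_or_higher(act_score):
    """Modeled probability of earning a B or higher, given an ACT score."""
    return 1 / (1 + math.exp(-(b0 + b1 * act_score)))

# A benchmark of this type is the score where the probability crosses 0.5,
# i.e. where the log-odds b0 + b1 * score equal zero.
benchmark = -b0 / b1
print(round(benchmark, 1))           # 18.0 with these invented coefficients
print(round(p_b_or_higher(18), 2))   # 0.5 at the benchmark
```

This is what makes the ACT benchmarks statistical rather than judgmental: the cut-point falls out of a fitted prediction model, not out of panels of teachers setting proficiency standards.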

Skip Kifer

P.S. I tried to find your email address so this communication would be private, but I could not find it.