
Friday, February 27, 2009

House plan to reform CATS unveiled

This from the Herald-Leader:
FRANKFORT — A House bill to revamp Kentucky's controversial CATS student testing system got its first public airing late Thursday before the House Education Committee.

House Bill 508's many provisions include: replacing the CATS system, but not until after the 2010-2011 school year; revising all state academic standards in a phased process starting next year; and aligning core content at all levels, as well as aligning high school academic core content with college requirements.

While the current testing program would continue through 2011, writing portfolios would be removed from accountability this year. Portfolios would be retained, however, as instructional tools from primary through 12th grade.

The measure is the House's response to Senate Bill 1, which would rework CATS by eliminating open-response questions and taking out portfolios. The Senate has already passed SB 1.

Most of the Education Committee session was devoted to a detailed explanation of HB 508 by its lead sponsor, state Rep. Harry Moberly, D-Richmond.

Moberly said he would have preferred to leave CATS unchanged through 2014, when all Kentucky students are supposed to achieve proficiency status. But he said there had been "such a hue and cry" over CATS that he thought change was necessary...

Over at the Prichard blog, Susan has a workup of the bill and opines that HB 508 is a strong step forward. Notice Ben Oldham's argument for a blended assessment in the comments section.

I maintain an almost complete lack of faith in program reviews as an effective solution to the Writing Portfolio + Arts & Humanities problem. I seriously doubt the muscularity of the approach and predict that, if adopted, within five years program reviews will be abandoned as useless.

I hope the House will find some middle ground during negotiations.

Sunday, October 19, 2008

Innes smacks Kifer. Kifer smacks back.

We recently had a little dust-up over testing between education analyst Richard Innes and Georgetown/UK Professor Skip Kifer.

The Bluegrass Institute's Richard Innes argues that credibility and stability problems with the CATS are reason enough to dump it. In this argument, the PERFECT is the enemy of the GOOD. Innes pounds on the CATS' several flaws, hoping that by demanding perfection he can doom the instrument's utility. He thinks there's a better idea: just throw the CATS out. Maybe replace it with the ACT. After all, the ACT is now a super test with super powers. Just ask ... the folks at ACT.

Ben Oldham recently wrote, "Since the ACT is administered to all Kentucky juniors, there is a tendency to over-interpret the results as a measure of the success of Kentucky schools."

Innes assisted that over-interpretation mightily and decided he would "school" Professor Oldham, saying,
Oldham pushes out-of-date thinking that the ACT is only a norm-referenced test. The ACT did start out more or less that way, years ago, but the addition of the benchmark scores, which are empirically developed from actual college student performance to indicate a good probability of college success, provides a criterion-referenced element today, as well.

Well, I'm no testing expert, but even I knew that was wrong. Wrong enough that I began to worry that BGI's testing expert might have some holes in his own preparation.

KSN&C responded,

The problem of over-interpretation has been somewhat exacerbated by the inclusion of benchmark scores in the ACT. But benchmarking does not change the construction of the test or the norming procedures. It does not turn the ACT into a criterion-referenced exam as Innes tries to suggest...

The National Association for College Admission Counseling Commission — led by William Fitzsimmons, dean of admission and financial aid at Harvard University — issued a report last month that sparked a lot of conversation about how tests like the ACT and SAT were being misused.

This is a recurring theme in the college admissions business, but this year's report warned that the present discussion of standardized testing has come to be “dominated by the media, commercial interests, and organizations outside of the college admission office.” Some of those groups have other items on their agenda.

Skip Kifer mentioned the report's warning about over-emphasizing test scores and argued for prudent use of the ACT in selecting students. He also warned folks not to get snookered by ACT officials' new claims that, without having changed the nature of the test, their scores now tell whether a student "meets expectations" or is "ready" to attend college.

As Kifer pointed out,

The benchmark stuff is statistically indefensible. Hierarchical Linear Modeling (HLM) was invented because people kept confusing at what level to model things and how questions were different at different levels. The fundamental statistical flaw in the benchmark ... is that it ignores institutions. Students are part of institutions and should be modeled that way.

Innes persisted and tried to play it off saying,

I guess Kifer and his compatriots at Georgetown College ... will never get the idea behind the ACT’s Benchmark Scores. It really isn’t hard to understand the Benchmarks.

This is when I should have suspected an academic spittin' contest was on the way and somebody was going to have to put up or shut up. (Actually, in the blogosphere, nobody really shuts up, but you know what I mean.)

Stung by the suggestion, and always the professor, Kifer challenged Innes to "Describe the statistical models used to determine the ACT benchmarks." This question would show whether Innes understood the nature of Kifer's argument and, at the same time, would put his claims to the test.

Innes said he was waiting for some information from ACT (which in my experience is a lot like waiting for Godot) and changed the subject to how lousy the CATS is.

We're still waiting to hear him defend his position that the ACT Benchmarks constitute a valid criterion-referenced test and that "the Benchmarks can fairly be considered a real measure of proficiency." That may be like waiting for Godot as well.

KSN&C has taken the position that all social science tests are imperfect. This includes the CATS, the KIRIS, the ACT, the SAT, the CTBS, the NAEP, and the EIEIO.

Monday, September 29, 2008

What the Bluegrass Institute Doesn't Seem to Know about the ACT

There is no way to sugar coat this.
Somebody doesn't know what he's talking about
- and it's not Oldham.

Well, Ben Oldham and I got called out by Richard Innes of the Bluegrass Institute the other day for "ignoring" the ACT benchmark scores - apparently the holy grail of assessment in his mind.

Of course, this is all part of a larger conversation about Senate Bill 1 and the ongoing Task Force on Assessment at KDE.

I wasn't planning on getting into all of this this fall. It's tedious, inside baseball kind of stuff. But the fundamentals are still the same. First, it's just a test. Second, every test has been designed to do a specific job. If test designers wanted a test to do something else, they would begin with that fact in sight. Third, have I mentioned it's just a test?

OK, let's talk about the ACT.

Here's the problem with the American College Test: Nothing, really.

It is a well-designed test intended to help admissions officers at competitive colleges determine which students are most likely to be successful at the university level. Ben Oldham recently went further saying, the "American College Test (ACT) is a highly regarded test developed by a cadre of some of the best measurement professionals in the world and is used by a number of colleges..."

But the ACT is only ONE factor that colleges use to make such determinations.

Why?

I mean, if the ACT can predict success in life, as Innes un-credibly argues (below), why don't colleges simply rely on it and quit wasting time compiling grade point averages and other data they say they need to make the best choices for their school?

The answer lies in the fact that test data are only reliable up to a point. It's just a test score and it shouldn't be turned into anything more.

As Richard C. Atkinson and Saul Geiser recently pointed out,

the problem with general-reasoning tests like the SAT [and ACT] is their premise: that something as complex as intellectual promise can be captured in a single test and reflected in a single score. It is tempting for admissions officers--and parents, legislators, policymakers and the media--to read more into SAT [and ACT] scores than the numbers can bear. Although measurement experts know that tests are only intended as approximations, the fact that scores frequently come with fancy charts and tables can create an exaggerated sense of precision.

And such exaggerations persist.

Newspapers and bloggers rank scores that ought not be ranked - because people like rankings. Some "think tanks" act as though test scores equal "truth" and look for any opportunity to twist data into a pre-existing narrative that Kentucky schools are going to hell in a handcart - this, despite trend data to the contrary, about which they are in full denial.

Georgetown College Distinguished Service Professor Ben Oldham correctly warned that "since the ACT is administered to all Kentucky juniors, there is a tendency to over-interpret the results as a measure of the success of Kentucky schools." His excellent article clarifies what the test is, and what it isn't.

The problem of over-interpretation has been somewhat exacerbated by the inclusion of benchmark scores in the ACT. But benchmarking does not change the construction of the test or the norming procedures. It does not turn the ACT into a criterion-referenced exam as Innes tries to suggest - unless all one means by "criterion" is that the ACT derived a cut score. Under that definition, a People Magazine Celebrity Quiz could be considered criterion-referenced. Score 18 and you're a Hollywood Insider!

The ACT's "criteria" simply does not measure how well Kentucky students are accessing the curriculum. It is much more sensitive to socio-economic factors attributable to most of the college-going population.

Using a "convenience sample" of schools (those willing to participate) the ACT looks at student success in particular college courses; and then looks at the ACT scores obtained by "successful" students. But regardless of what data such a design produces, "there is no guarantee that it is representative of all colleges in the U.S." Further the ACT "weighted the sample so that it would be representative of a wider variety of schools in terms of their selectivity."

That is to say, they tried to statistically adjust the data produced by the sample to account for the more highly selective schools as well as the less selective. This process of weighting data to produce a score that the original sample did not produce should be viewed with suspicion. It would be like ... oh, let's say like ... using a concordance table to give students a score on a test they didn't take.
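To make the weighting concern concrete, here is a minimal, purely illustrative sketch of post-stratification weighting in Python. It is not ACT's actual procedure; the selectivity tiers, student counts, and target shares below are invented. The point is simply that the headline number a weighted sample produces depends on the assumed target distribution, not just on the students who happened to be in the sample.

```python
# Hypothetical sketch of reweighting a convenience sample - NOT ACT's actual method.
# All tiers, counts, success rates, and target shares are invented for illustration.
import pandas as pd

# Pretend convenience sample: participating colleges grouped by selectivity.
sample = pd.DataFrame({
    "selectivity":  ["open", "open", "selective", "selective", "highly_selective"],
    "n_students":   [400,    350,    900,         800,          1200],
    "success_rate": [0.45,   0.50,   0.62,        0.65,         0.81],
})

# Unweighted estimate: just pool whoever happened to participate.
unweighted = (sample["n_students"] * sample["success_rate"]).sum() / sample["n_students"].sum()

# Assumed "national" shares of students by selectivity tier (made up here).
target_share = {"open": 0.50, "selective": 0.35, "highly_selective": 0.15}

# Weight each school so its tier contributes in proportion to the target share.
tier_totals = sample.groupby("selectivity")["n_students"].transform("sum")
weights = sample["selectivity"].map(target_share) * sample["n_students"] / tier_totals

weighted = (weights * sample["success_rate"]).sum() / weights.sum()

print(f"Unweighted success rate: {unweighted:.3f}")   # ~0.66 with these invented numbers
print(f"Weighted success rate:   {weighted:.3f}")     # ~0.58 once the tiers are reweighted
```

With these made-up numbers the "success rate" moves by about eight points purely because of the assumed tier shares, which is exactly why the weighting step deserves scrutiny.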

If KDE had done anything like this, Innes' buddies at BGI would be crying "fraud."

If we are going to test, and if our tests are going to be used to determine placement in programs within schools, and eventually in college, then we need to understand what the ACT means when it says "college-ready." And we don't. The most important flaw of the ACT benchmarks is conceptual: What is "readiness" for higher education?

As one delves deeper into the statistics, other problems arise. Skip Kifer, who serves on the Design and Analysis Committee for NAEP, told KSN&C,

The benchmark stuff is statistically indefensible. Hierarchical Linear Modeling (HLM) was invented because people kept confusing at what level to model things and how questions were different at different levels. The fundamental statistical flaw in the benchmark ... is that it ignores institutions. Students are part of institutions and should be modeled that way.

But the ACT models at the "student" level when it should be modeling at the "students nested within institutions" level.

It is possible that the ACT took a kind of average of those "correct" models, but that cannot be determined from their Technical Report.
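To see what Kifer means, here is a small simulated sketch in Python, assuming hypothetical colleges and made-up data rather than ACT's actual analysis. A single pooled student-level regression yields one ACT-to-outcome slope, while per-institution fits and a random-intercept multilevel model show the relationship differing across institutions - the nesting a single national benchmark ignores.

```python
# Illustrative sketch only (simulated data, hypothetical colleges) of pooled vs. nested modeling.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)

rows = []
# Each hypothetical college gets its own baseline and its own ACT-to-outcome slope.
colleges = {"Flagship U": (1.2, 0.06), "Regional U": (1.6, 0.04), "Small College": (2.0, 0.02)}
for name, (intercept, slope) in colleges.items():
    act = rng.integers(15, 33, size=200)
    gpa = intercept + slope * act + rng.normal(0, 0.4, size=200)  # stand-in success measure
    rows.append(pd.DataFrame({"college": name, "act": act, "gpa": gpa}))
df = pd.concat(rows, ignore_index=True)

# Pooled student-level model: ignores which institution a student attends.
pooled = smf.ols("gpa ~ act", data=df).fit()
print("Pooled slope:", round(pooled.params["act"], 3))

# Per-institution fits: the ACT-to-outcome relationship differs by college.
for name, grp in df.groupby("college"):
    fit = smf.ols("gpa ~ act", data=grp).fit()
    print(f"{name} slope:", round(fit.params["act"], 3))

# Multilevel (random-intercept) model: students nested within institutions.
hlm = smf.mixedlm("gpa ~ act", data=df, groups=df["college"]).fit()
print(hlm.summary())
```

The pooled slope is a blend that belongs to no actual institution, which is the statistical objection to treating one benchmark as if it meant the same thing everywhere.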

Perhaps Innes could help us understand: How is it that the ACT's benchmarks could have been empirically defined and yet managed to get the same relationship for the University of Kentucky and Lindsey Wilson College?

Unfortunately, the ACT folks did not respond to an inquiry from KSN&C.

But none of this will likely stop the exaggeration of the ACT's abilities.

In response to a KSN&C posting of Ben Oldham's article, Innes made the following claim:

Oldham pushes out-of-date thinking that the ACT is only a norm-referenced test. The ACT did start out more or less that way, years ago, but the addition of the benchmark scores, which are empirically developed from actual college student performance to indicate a good probability of college success, provides a criterion-referenced element today, as well.

"Criterion-referenced element?!" A cut score? The ACT is a timed test too - but that doesn't make it a stopwatch.

So, Oldham is old fashioned and out-of-date? Au contraire. It is Innes who is over-reaching.

Innes argues,

the ACT says that many employers for ... better paying jobs now want exactly the same skills that are needed to succeed in college. So, the Benchmark scores are more like measures of what is needed for a decent adult life. Thus, it isn’t out of line to say that the Benchmarks can fairly be considered a real measure of proficiency. And, that opens the door to compare the percentages of students reaching EXPLORE and PLAN benchmarks to the percentages that CATS says are Proficient or more.

Bull.

One could derive as much "proficiency" by evaluating Daddy's IRS form 1040 and then comparing the percentages of students reaching EXPLORE and PLAN benchmarks to the likelihood of owning a BMW or affording cosmetic surgery.

I'm afraid what we have here is something other than a nationally recognized assessment expert who is out-of-date.

We have a pundit who thinks the ACT benchmarks constitute a criterion-referenced assessment of the performance of Kentucky students and their prospects for a decent adult life!? This, absent any connection between the ACT and Kentucky's curriculum beyond pure happenstance. There is no relationship between a student's ACT score and any specified body of subject matter - and such a relationship is typically what defines a criterion-referenced test.

There is no way to sugar coat this. Somebody doesn't know what he's talking about - and it's not Oldham.

The best spin I can put on this is that Innes got snookered by ACT's marketing department, which seems to do a fine job, but has been known to overstate the abilities of ACT's EPAS system.

But none of this makes the ACT a bad test. It just means that assessment experts have to take care to understand the nature of the exams and not to rely on them to do too much.

And it is commendable that Kentucky is working toward building an actual relationship between Kentucky's curriculum and that of the ACT through the development of content tests. That work will get Innes closer to where he wants to be. He should wait for the actual work to be done before making claims.

Just as Atkinson, Geiser, Oldham, Kifer, Sexton and virtually everybody else says, the results should not be over-interpreted to suggest relationships that just aren't there. And trying to argue causal chains that are completely unproven is certainly not best practice.

But more to the point, Kentucky recently committed to use the ACT's EPAS system, including EXPLORE and PLAN, as yet another measure - a norm-referenced measure - of student performance. As long as Kentucky is cognizant of the test's limitations, we ought to strengthen the connections between Kentucky high schools and universities and gauge student readiness for college. It was because of the large numbers of college freshmen in need of developmental courses that CPE pushed for the ACT/EPAS system to begin with.

Kifer wonders why Kentucky's Advanced Placement (AP) Tests receive so little attention. After all, unlike the ACT, the AP tests are a direct measure of a high school student's ability to do college work; AP courses are particularly well-defined; the tests exist across the curriculum; good AP teachers abound; course goals and exams are open to scrutiny.

When high schoolers pass an AP test, they not only know what it means; the school of their choice gives them college credit for their effort.

Aware of CPE's commitment to the ACT as one measure of student readiness, KSN&C contacted newly named Senior Vice President of the Lumina Foundation Jim Applegate, who until recently served as CPE's VP for Academic Affairs.

Here's what Jim had to say:

Richard,

The article recently referenced in your publication from the admissions officer group addresses the use of ACT for college admissions. The organizations sponsoring assessments such as ACT, SAT, and others have made clear that no single standardized test should be used to make such decisions. Postsecondary institutions, to implement best practice, should use a multi-dimension assessment to make admissions decisions. A test score may play a role in these decisions, but not the only role.

Kentucky uses the ACT/EPAS system (the Explore, Plan, and ACT tied to ACT's College Readiness Standards) to help determine college readiness, place students in the right high school courses to prepare them for college, and place them in the right courses once they go to college. Kentucky's revised college readiness standards are about placement, not admission. For the foreseeable future, the postsecondary system will, as it has always done, accept large numbers of students with ACT scores below readiness standards, but will provide developmental educational services to these students to get them ready for college-level work. The large number of underprepared students coming into Kentucky's postsecondary system led the Council a couple of years ago to initiate an effort to improve developmental education in order to make sure these students receive the help they need to succeed in college.

A growing number of states are adopting the ACT or the entire EPAS system to more effectively address the challenge of getting more high school graduates ready for college or the skilled workplace (e.g., Colorado, Illinois, and Michigan). These states also want to better understand the performance of their students in a national and international context. Globalization no longer allows any state's educational system to remain isolated from these contexts.

The use of ACT/EPAS is, of course, only one necessary strategy to improve the college/workplace readiness of Kentucky's traditional and adult learners. Kentucky is working to implement statewide placement tests in mathematics, reading, and English that will be administered to high school students who fall below statewide college readiness benchmarks tied to ACT scores (few states have gotten this far in clarifying standards to this level). These placement tests will provide more finely grained information about what students need to know to be ready for college-level work. We are also working to more strongly integrate college readiness goals into our teacher preparation and professional development programs to ensure teachers know how to use the assessments beginning in middle school to bring students to readiness standards.

The postsecondary system is hopeful the implementation of ACT/EPAS will promote partnerships between postsecondary and high/middle schools to improve student achievement. Some of that has already begun since the first administration of the EPAS college readiness system. For the first time in my time in Kentucky (I grew up here and returned to work here in 1977) we now know where every 8th grader is on the road to college readiness thanks to the administration of the Explore. If in five years the number of students needing developmental education is not significantly less than it is today, then shame on all of us.

Jim Applegate

All of this reminds me of the old Crest Toothpaste disclaimer I read daily while brushing my teeth over the decades.

Crest has been shown to be an effective decay preventive dentifrice that can be of significant value when used as directed in a conscientiously applied program of oral hygiene and regular professional care.

Let's see if I can paraphrase:

The ACT/EPAS system has been shown to be an effective norm-referenced assessment that can be of significant value when used as directed in a conscientiously applied assessment program based on clear curriculum goals, direct assessments of specific curriculum attainment, and effective instruction from a caring professional.

Thursday, September 25, 2008

For Those Who Would Better Understand the ACT

KSN&C, and I suspect a lot of folks, got a note from Prichard Committee honcho Bob Sexton today regarding the ACT test results recently kicked around in the press.

The media coverage, and occasionally comments from school officials, badly confused what the ACT is and is not and how scores should be used and should not be used.

To bring some scholarly understanding to this misinformation about ACT, Ben Oldham, Distinguished Service Professor at Georgetown College, has written the attached statement.

Here's Ben's article:
September 24, 2008


Adding Understanding to the ACT scores


Ben R. Oldham
Distinguished Service Professor
Georgetown College


The recent release of statewide ACT scores has created a lot of discussion about the quality of Kentucky schools. It has been reported that approximately 43,000 Kentucky juniors earned an average of an 18.3 composite score out of 36 on the ACT. Kentucky is one of just five states that requires the ACT for all high school juniors. It should be noted that the American College Test (ACT) is a highly regarded test developed by a cadre of some of the best measurement professionals in the world and is used by a number of colleges as one selection-for-admission measure.

Extensive research has been conducted that suggests that the ACT is a significant predictor of freshman college grades. The ACT is designed to predict college success. My research suggests that high school grade point average is a similar predictor of college success. Since the ACT is administered to all Kentucky juniors, there is a tendency to over-interpret the results as a measure of the success of Kentucky schools. The successes of Kentucky education reform are inevitably brought into question.


Other evidence tells a different story. More students in Kentucky are taking AP exams and more students are earning college credit through the Advanced Placement program than ever before. These are standards-based exams. The purpose of these tests is to determine, by following a tightly structured curriculum, if students earn high enough scores to earn college credit while still in high school. Teachers know precisely what should be taught and through their excellent instruction more high school seniors earned college credit.


The Kentucky Core Content Test (KCCT) is administered throughout the grades of Kentucky’s public schools. Like AP tests, the KCCT assessments are standards-based exams. Teachers in Kentucky’s schools teach from a core content that defines what Kentucky students should know and be able to do as they progress through school. Like the AP test, the KCCT is designed to precisely measure how well the students have mastered the defined curriculum. The categories of novice, apprentice, proficient and distinguished are used to define the achievement of students. Its purpose is to monitor the growth of schools toward a Commonwealth goal of the average student achieving at the proficient level by 2014.


It is desirable for large numbers of students to achieve at the highest levels of achievement on both the AP and KCCT tests. Having small and reducing numbers of students at the lowest levels of achievement is also desirable. Here is where the difference with the ACT and standards-based tests lies. The ACT is a norm-based test. It is not designed to determine what students know and are able to do like the AP test or KCCT test. A norm-based test compares a student’s performance on a bank of test questions with students in a comparison group; in this case a national but not nationally representative comparison group since its purpose is for the college-bound. By design, the ACT spreads student scores to assist colleges and universities in making admission and scholarship decisions. When the ACT was developed, the average score was set at 20 regardless of the academic achievement of those in the norm group. If it were the case that everyone in the national comparison group scored at a high level, the mean score would be 20. If nearly all scored at a low level on the ACT, the mean would be 20. The purpose of the ACT is to assist colleges and universities in making admission decisions. By design it separates students into a range of scores from the 1st percentile to the 99th percentile regardless of the pure academic achievement.


Because it is administered to students across the country, the ACT is designed to be insensitive to curriculum so as not to give an advantage to any particular curriculum. This is another major difference. Both the AP and the KCCT are built around a tightly defined curriculum. Because the ACT is insensitive to school curricula, many employ test-taking strategies to artificially inflate test scores. It should not be a quick and easy process to improve test scores, because such gains do not reflect true improvement. However, given a defined curriculum, public school teachers have done and will continue to do an exemplary job educating students toward a common goal. Are there improvement strategies that can be employed? Absolutely, but the ACT does not contribute to these strategies because the ACT must, by design, separate students to assist colleges in selection decisions.


Teachers, parents, principals, superintendents, board of education members, and most importantly students must not overlook this purpose. While it is important, schools should not be evaluated using a tool that is insensitive to the core content and is designed to differentiate among the higher-achieving, college-bound students.


If the purpose of Kentucky public schools is to prepare all elementary and secondary school students for college, then the core content followed by schools needs to be adjusted with significant input from college professors to include things like the thoughtful analysis of data and ideas, explaining and demonstrating math solutions and a solid foundation in the college general education curriculum. Regardless, the evaluation of the achievement of the core content must be measured by a test that determines the success in achieving that curriculum rather than a norm-based instrument, like the ACT, that merely compares a student’s performance with college-bound students nation-wide.

Monday, March 03, 2008

Hot off the presses: CASA cites Problems with Senate Bill 1

As KSN&C readers may know, when it comes to assessment, I play favorites.

Over the years, the best, most reliable, and independent sources of information regarding assessment in Kentucky have consistently come from three individuals: Skip Kifer, Ben Oldham and Tom Guskey. I am fortunate to have studied under Ben and Skip at UK. Furthermore, as a principal, I would share my data with Skip annually, and he would selflessly sit with me and analyze gap closing and the overall performance of Cassidy students. Tom has built a fine national reputation for scholarly excellence.
If we ever disagree - they're probably right.

If I had a magic wand - they'd redesign and oversee the state's assessment program rather than KDE, but that's another story.

Recently - owing largely to Skip and Tom's disillusionment with some things at UK - they became ripe for recruitment by Vice President Ben Oldham at Georgetown College. Their formation of the Center for the Advanced Study of Assessment is a major coup for the small liberal arts college.

Today they weigh in on Senate Bill 1 in a 19-page analysis posted at the Prichard Committee.

Read the whole thing. But here is their conclusion:

Enacting Senate Bill 1 will change dramatically Kentucky’s assessment and accountability systems. There are four major departures from the existing assessment that we find problematic and should be carefully scrutinized. Careful scrutiny includes identifying changes that have broader implications than might at first be evident.

1. The first major change is moving the focus away from school outcomes to individual academic achievement.

This could substantially reduce the amount of information that is available to schools and districts, information they have used in the past to judge in a particular content area what part of the core content they have done well and what part not so well. The reason for the reduction is that previous assessments produced more than one form for the assessment thereby increasing the breadth and depth of what was sampled. This change also may not have its intended results of better measurements at the individual level. Because the test is expected to be an adequate sample of Kentucky’s core content, produce national norm-referenced results, and provide diagnostic information, the attempt to do all three things may limit how well it does any one of them.

2. The second major change is moving parts of the assessment from the state level accountability portion to the district level where there is no formal accountability.

Moving things to the district level and making the district responsible for parts of the assessment is not necessarily a bad thing to do. Other states have done similar things. But it may not be wise to move just some of the assessment to a new level. An assessment that is part state accountability and part district responsibility may prove to weaken both. We are particularly concerned about the effects of such changes on Kentucky’s long standing commitment to teaching students to write well.

3. The third major change is assessing with only multiple choice items.

Open-ended responses, the writing portfolio, and on-demand writing have been eliminated. Given the nature of the goals, standards, and expectations for Kentucky schools, we do not believe that an assessment that relies exclusively on multiple-choice items can adequately describe those outcomes. Measurements that tap more easily more complex skills and knowledge are both necessary and desirable.

4. The fourth, and perhaps most important, major change is reducing and perhaps eliminating the participation of teachers in the formal assessment.

In the past teachers have worked on content standards, created test questions, evaluated portfolios, and graded open-response items. Although a cadre of teachers became experts in these areas, creating technical expertise was not the main purpose of the involvement. The purpose was to provide opportunities for teachers to see how instruction, standards, and assessments should be intimately tied together. These activities are powerful ways to make teachers both better teachers and better assessors.

Friday, September 28, 2007

The new NAEP data are released. That means it's time to spin the news!

The release this week of national test scores in reading and math has generated a fresh round of conversation about how Kentucky's students are performing.

We have this conversation at least three times a year when the various "yardsticks" (NAEP; CATS; NCLB) trot out their measurement data. Then there's SAT, ACT... EIEIO....

Wouldn't it be great if such data were integrated into a comprehensive value-added system? But I digress.

NAEP scores nationally, and in many individual states, showed modest gains from 2005 to 2007.

As Diane Ravitch explains in today's New York Post,
The federally sponsored National Assessment of Educational Progress (NAEP) is known in the education world as the gold standard of testing. In 2002, Congress authorized NAEP testing in every state to serve as a check of the states' own claims about their progress. (Congress rightly worried that individual states would dumb-down tests that they themselves develop and administer.)
And there is at least reason to be suspicious of Kentucky's "new and improved" test. It appears Kentucky may have joined a number of other states in a race to the bottom by redefining proficiency.

Whenever test score data are released the spinning begins. The Kentucky Department of Education has an interest (some might say a duty) in pointing out the progress made by the schools. So they publicly shine a light on the best numbers, and privately express concern for the worst.

It's a little thing called spin. Everybody seems to delight in the practice these days.

In a Tuesday press release, the Kentucky Department of Education said,

"The results of the 2007 National Assessment of Educational Progress (NAEP) in reading and mathematics show that Kentucky's 4th and 8th-graders made gains when compared to the state's performance in previous NAEP assessments..."

True. Gains were made. Kentucky's student achievement, as measured by the NAEP, has trended steadily upward overall. (See charts below.)

So that's KDE's headline: progress over time.

On the other side of the argument, assessment watchdogs are sniffing out specific areas of concern. Writing for the Bluegrass Institute this week, Richard Innes took issue with KDE's discounting of declines in 8th grade reading.

KDE claims of eighth grade reading since 1998, “Kentucky’s 8th-graders’ scores have remained steady, with minor gains and losses.”

Is that a fair description? Let’s examine the facts.

In the new ... NAEP assessments ... Kentucky had a reading proficiency rate of 30 percent in 1998. That rose to 32 percent in 2002 and went up again to 34 percent in 2003.

Then, things came unglued.

Eighth grade reading proficiency decayed to 31 percent in 2005, and in 2007 it slid again to just 28 percent. The 2007 proficiency rate is statistically significantly lower than both the 2002 and 2003 scores and is clearly six points lower than the 2003 performance. That six point difference isn’t just statistically significant – it’s just plain SIGNIFICANT.

No other state lost more ground in this time frame.

What’s more, during the same time period, the CATS Kentucky Core Content Test reading proficiency rates for eighth graders continuously rose. Do you believe that?
CATS up 10 points while NAEP declined six?

I'll take Dick's word for it that the declines are statistically significant and that Kentucky's decline is the nation's worst. The decline certainly looks significant to me.

We don't generally consider six point drops minor, and more to the point, downward trends are antithetical to progress. If it were me...I think I'd have left the word "minor" out of KDE's statement.


A BETTER ATTACK?

Innes has discovered some bad news, but arguably, not the worst news.

While Kentucky has progressed steadily, so have other states. Growth is a vital factor to consider, but so is excellence. Kentucky's relative standing among the states frequently leaves the state in all too familiar territory.

For example, who do Kentucky students outperform in 4th grade math? (See map below)

New Mexico, Louisiana, Mississippi and Alabama. All other states either roughly equal (9 states) or exceed (36 states) Kentucky's performance. You're not going to hear that in a KDE headline.

It's a little better at 8th grade. Add California, Nevada, Oklahoma, Tennessee, Hawaii and West Virginia to the list.

Kentucky only outscores eight states in 8th grade reading.

But clearly the best news for Kentucky is in 4th grade reading where Kentucky joins the national leaders and is only outscored by seven states. What happens between 4th grade and 8th grade in reading ought to be of concern.

MARK YOUR CALENDARS

We're less than a week away from KDE's next big announcement of progress. I predict the new CATS assessment will show average performance gains of 7% or so across the board, and in some places jumps will be huge, based at least partly upon changes...
a) to the test itself
b) to the "cut scores" used to define proficiency

The new test data cannot be compared to the previous tests - but they will be. It's the data school folks have.
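A toy sketch of the point, using invented numbers rather than Kentucky's actual scale or cut scores: hold the distribution of student scores fixed and move only the proficiency cut, and the "percent proficient" jumps all by itself.

```python
# Hypothetical illustration: same score distribution, different cut scores.
import numpy as np

rng = np.random.default_rng(42)
scores = rng.normal(loc=500, scale=50, size=10_000)  # unchanged student performance

for cut in (540, 530, 520):  # three made-up proficiency cut scores
    pct_proficient = (scores >= cut).mean() * 100
    print(f"cut score {cut}: {pct_proficient:.1f}% proficient")
```

With these invented numbers, lowering the cut by 20 scale points adds roughly a dozen percentage points of "proficiency" with zero change in what students actually know. That is why year-over-year comparisons across a cut-score change tell you very little.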

We discussed the NCLB data situation last night at UK. Without advance comment, I asked a group of graduate students (and future principals) to analyze the NCLB proficiency rates in Kentucky. The general reaction to the sharp increase was "Wow!" One of the students shared her experience working with the assessment company to establish the new cut scores. We discussed changes to the system that might account for the dramatic increases, and how school leaders could "present" the data. That's when one of the students came up with the best spin ever. (Pay attention, Lisa. Here's your angle.) The new assessment is a truer reflection of the content actually taught in Kentucky's schools, and therefore the 7-point spike in proficiency levels is a fairer measure of the actual progress Kentucky students have made than under the old test.

Terrific.

Now, if we can only get the NAEP data to bear that out....

We have a fundamental problem in our current accountability system. Its initial purpose was political (to garner the support of the business community for KERA's big price tag). It was not focused so much on student achievement and curriculum. The focus was school accountability.

Better would have been an assessment system that began with content and then folded the data into a value-added system, such as the one used in Tennessee. If CATS had been designed to improve instruction for individual students, it would have looked very different.

To their credit, and after the fact, many educators began to look at interim assessment systems that would help teachers identify learning problems early and intervene quickly. There has been a lot of good work done in the trenches, but the state system has become a hodgepodge under NCLB.

Interpreting test data to the public is a national problem, and "interested" parties will always spin the data to suit their own purposes. What we really need is a "disinterested" assessment/accountability reporting source.

As Ravitch understands, we need...
an independent, nonpartisan, professional audit agency to administer tests and report results to the public.

Such an agency should be staffed by testing professionals without a vested interest in whether the scores go up or down. Right now, when scores go down, the public is told that the test was harder this year - but when scores rise, state officials never speculate that the test might have been easier. Instead, they high-five one another and congratulate the state Board...for their wise policies and programs.

What the public needs are the facts. No spin, no creative explanations, no cherry-picking of data for nuggets of good news.
Just the facts.
I may even know the right folks for the job. I understand Ben Oldham at Georgetown College has recruited Skip Kifer and Tom Guskey to form an assessment center at Gtown. I studied under Ben and Skip and am familiar (along with most of the national academic community) with Tom's work. I can't think of a more valuable state resource to guide a more effective and fair assessment program than the guys at Georgetown.


KENTUCKY SCHOOL DATA from the 2007 NAEP Exam

Student Characteristics
Number enrolled: 679,878
Percent in Title I schools: 60.6%
With Individualized Education Programs (IEP): 16.0%
Percent in limited-English proficiency programs: 1.5%
Percent eligible for free/reduced lunch: 52.4%

School/District Characteristics
Number of school districts: 176
Number of schools: 1,426
Number of charter schools: N/A
Per-pupil expenditures: $7,254
Pupil/teacher ratio: 16.0
Number of FTE teachers: 42,413

Racial/Ethnic Background
White: 86.3%
Black: 10.6%
Hispanic: 2.1%
Asian/Pacific Islander: 0.9%
American Indian/Alaskan Native: 0.2%


[Chart: Scale Scores for Mathematics - Kentucky vs. National Public]

For more specific data check out NAEP's Data Explorer.


[Map: Cross-State Comparison - Percent At or Above Proficient, 4th Grade Mathematics. Blue = Kentucky; Green = states above Ky; Yellow = states about the same as Ky; Red = states below Ky]

For more specific data check out NAEP's Data Explorer.