Wednesday, February 11, 2009

Let’s Hear it For Clear Standards

By Skip Kifer:

About 20 years ago the National Council of Teachers of Mathematics (NCTM) produced its first set of standards. About 20 days ago Kentucky’s Republican Senators discovered NCTM’s latest set.

During the interim, NCTM standards have informed Kentucky’s and most other states’ assessments, President Clinton’s Voluntary National Test, and the National Assessment of Educational Progress (NAEP). The standards were used both as statements of instructional content and, when refined, as frameworks for assessments.

The standards emerged after the Second International Mathematics Study (SIMS) identified an “underachieving” U.S. mathematics curriculum, declaring instruction to be a “series of one night stands” in a curriculum that instead of “spiraling was a set of concentric circles.” Rather than take on the political task of establishing a national curriculum, NCTM promulgated content standards.

Talk about standards then was understood to be talk about content standards – what should be taught, when. Not now. Increased emphasis on assessment in the country brings into play other sets of standards. As the Kentucky legislature seeks to change the Commonwealth’s assessment, a robust standards discussion means distinguishing one set from another and knowing when each applies.

Clarity is needed.

Here is an example of an NCTM content standard:

Data Analysis and Probability Standard

Instructional programs from prekindergarten through grade 12 should enable all students to:

Formulate questions that can be addressed with data and collect, organize, and display relevant data to answer them

Grades 6–8 Expectations

In grades 6–8 all students should:

formulate questions, design studies, and collect data about a characteristic shared by two populations or different characteristics within one population;
select, create, and use appropriate graphical representations of data, including histograms, box plots, and scatterplots.

The Lexington Herald-Leader reported a rationale for adopting the new NCTM standards: “we need higher standards,” and we need “more precise and rigorous standards.” It is unclear how restating a content standard would meet such demands. Those of us who have worked with content standards know the issues are not whether they are high or low. Content standards are tricky but on other dimensions: if too broad, they are hard to operationalize; too narrow, and they miss stuff; too many, and they confuse things; too few, we don’t know about since that’s never been done.

Those who call for “higher standards” are probably talking about different standards – proficiency standards. Proficiency standards are based on test scores. Below is a picture of the distribution of KIRIS reading scores with the proficient cut-point for 4th grade Kentucky students.

About 1/3 of the students scored at or above the cut-point of 311 and would be labeled proficient. Clearly one can ask whether this standard is too high or too low. Should one move the cut-point up or down? Interestingly, six years earlier just 10 percent of the students exceeded this cut-point. What might look low at one time might earlier have looked high.

There are at least two other sets of standards worth discussing – Opportunity to Learn (OTL) standards and standards for educational and psychological testing. The latter are published by the American Educational Research Association, the American Psychological Association, and the National Council on Measurement in Education (AERA/APA/NCME).

OTL standards reflect whether or not a student has had a chance to learn the material in the test. They are fairness standards. International tests of 8th grade mathematics, for example, typically contain heavy doses of algebra and geometry. Since most of our students are not exposed to substantial amounts of algebra or geometry by the 8th grade, while other systems’ students are, we are at a disadvantage. We lack the opportunity to learn. Hence, we wonder about the validity of international comparisons.

OTL comes into play, too, when tests are used for purposes such as graduation or grade promotion. Have the students had the opportunity to learn the material covered by the test? If not, then the test is not valid for the purpose for which it was intended.

Using a single test score for important educational decisions is, in any case, considered bad testing practice. That dictate and other proscriptions are in the AERA/APA/NCME standards volume. Among others, there are standards for properly constructing tests, reasonably interpreting test scores, and providing information about a test and its proper uses. Any major testing endeavor is expected to comply with those standards.

So, what does all of this have to do with Kentucky’s Senate Republicans? Well, as they pursue dramatic changes in assessment practices in the Commonwealth, they may not be as informed as they might be. As a first step to better communication, I welcome them to the standards club and to some clarity!


Susan Weston said...

SJR 19's supporters, including Senator Kelly and Superintendent Brown, think our Core Content standards are too broad and too many. They hope to "operationalize" the changed standards with testing that reports each student's status on each standard, and that will only be possible if we have a shorter, sharper list of what's expected in each grade.

I'm largely supportive of their approach, but your caution is important. Under SJR 19, we risk narrowing too much. The math skills that can easily be gauged by multiple choice are not the only math skills students need. Let's be careful not to drop crucial abilities to tackle problems and explain solutions.

Skip Kifer said...

I am not sure what would be done with results from tests of, say, five standards. If they are used to make decisions about a student, they need to have the psychometric properties that allow such interpretations. That is, each test must be as "good" as what is being done with one test in CATS. That means a lot of testing.

I remember the confusion with KEST scores. Mastery or non-mastery was assigned on subtests with 5 or 6 items. Sometimes 5 of 6 was mastery; sometimes it took 6 of 6; sometimes 6 of 6 was not enough. We worked with a group of teachers to see whether they could understand the score reports. Of course they could not, since the decisions were made on the basis of IRT models, not the number correct.

I think it may be time to give our legislators a test. First question: True or False: CATS mathematics tests were based on the NCTM standards. I have more in mind ....