Wednesday, February 11, 2009

The Mythological Methodological Silver Bullet

"The U.S. curriculum is not only a mile wide and an inch deep, it also is very redundant. Not much new material is introduced and what is introduced is but to a small percentage of students because of tracking, an issue few wish to discuss."

By Skip Kifer:

FIMS, SIMS, and TIMSS

The international arena in general and the results of surveys from the International Association for the Evaluation of Educational Achievement (IEA) in particular are fertile grounds for those who wish to cite successful practices that deviate from existing ones:

If custom and law define what is educationally allowable within a nation, the educational systems beyond one’s national boundaries suggest what is educationally possible (Foshay et al., 1962).

At one time, French schools closed on Wednesday afternoon. New Zealand children began school on their sixth birthday, not on the first day of school that year. Long ago the Swedes were literate before they had schools. They still introduce each student to a second foreign language. In Norway “the ultimate aim of education is to inspire individuals to realize their potential in ways that serve the common good.” School children in the United States recite a Pledge of Allegiance each morning.

Systems differ. These international studies, however, focus on commonalities among the systems and seek to identify effective practices.

There have been three generations of IEA mathematics studies. The first international study (FIMS) was conducted about 40 years ago. It was the first study of its kind in which empirical methods were used in a comparative context. Seven European systems, plus Australia, Japan, Israel, and the United States, participated. Two main populations were sampled: students in a grade where the modal age was 13 and students in the terminal years of secondary schooling. The former was chosen because it was the last year when the entire cohort of a system was still in school. The latter was chosen because it contained mathematics specialists who represented the best a system produced.

FIMS asked big questions. How were outcomes related to school organization, curriculum, teacher competence, and societal factors? Among its many findings was that talented students in comprehensive secondary schools did just as well as those in the highly differentiated schools of Europe. Hence, there were major school reforms across the Atlantic.

FIMS investigators knew that it was impossible to build a fair international test. Was the test, they asked, equally unfair to all the participants? To better understand their results, they invented a measure called Opportunity to Learn (OTL). It was based on teachers’ judgments about whether the mathematics taught in an educational system was reflected in the mathematics test. Measures of OTL remain in international studies.
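
To make the idea concrete, here is a minimal sketch of how an OTL index in the FIMS spirit might be computed; the item labels, ratings, and the simple proportion formula are assumptions for illustration, not the study's actual procedure.

```python
# Hypothetical OTL index: the proportion of test items whose content a teacher
# reports having taught. Item labels and ratings below are invented.

def otl_index(taught_by_item):
    """taught_by_item: dict mapping test item id -> True if its content was taught."""
    return sum(taught_by_item.values()) / len(taught_by_item)

ratings = {"item01": True, "item02": True, "item03": False, "item04": True}
print(f"OTL = {otl_index(ratings):.2f}")  # prints OTL = 0.75
```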

The two volumes of FIMS results contain no rank ordering of countries. The study was not designed to be an “international contest.” Rather it was to answer questions within “a framework of comparative thinking in education.” Instead of focusing mainly on the average achievement within a system and producing rank orders, FIMS asked questions about “yield.” How much mathematics did a system produce?

Perhaps the best example of yield comes from the Second International Study of Mathematics, SIMS. At that time 80% of a cohort remained in high school in the United States, while the comparable percentage for Hungary was 50%. In the United States a very small percentage of students took advanced mathematics, while in Hungary all of the 50% took calculus. Hungary’s yield, the mathematical knowledge produced for the culture, was far higher because of that participation in advanced mathematics.
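
A back-of-the-envelope calculation shows how yield works. The 80% and 50% retention figures come from the text above; the U.S. advanced-mathematics participation rate used below is a purely hypothetical figure for illustration.

```python
# Yield as the share of the whole age cohort that ends up with advanced mathematics.
us_retained = 0.80            # share of cohort still in high school (from the text)
us_advanced = 0.05            # share of those taking advanced math: HYPOTHETICAL figure
hungary_retained = 0.50       # share of cohort still in school (from the text)
hungary_advanced = 1.00       # all retained Hungarian students took calculus (from the text)

us_yield = us_retained * us_advanced                  # 0.04 -> about 4% of the cohort
hungary_yield = hungary_retained * hungary_advanced   # 0.50 -> 50% of the cohort

print(f"US yield: {us_yield:.0%}, Hungary yield: {hungary_yield:.0%}")
```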

SIMS, designed and conducted in the late 1970s and early 1980s, built on the first study and introduced a number of innovations. The most striking one was that eight of the twenty-odd systems included a pretest for the sample of eighth graders. That design change led to a number of crucial findings.

First, it was possible to distinguish between the status of achievement (measurements at one time point) and the growth in achievement (differences between the beginning of a school year and the end of the year). This led to questions about whether the correlates of status were the same as the correlates of growth. The answer was no. Background characteristics of students were much more highly correlated with status, while characteristics of schools and teachers were much more highly correlated with growth.
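
The distinction is easy to see with simulated data. The sketch below, with entirely fabricated numbers and variable names, only illustrates that a status score (posttest alone) and a growth score (posttest minus pretest) can have different correlates; it is not the SIMS analysis.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
background = rng.normal(size=n)    # simulated student background characteristic
instruction = rng.normal(size=n)   # simulated amount of new content taught

pretest = 50 + 8 * background + rng.normal(scale=5, size=n)
posttest = pretest + 5 * instruction + rng.normal(scale=5, size=n)

status = posttest                  # achievement at one time point
growth = posttest - pretest        # change over the school year

print("background vs status:", round(np.corrcoef(background, status)[0, 1], 2))
print("instruction vs growth:", round(np.corrcoef(instruction, growth)[0, 1], 2))
```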

Analyses of pretest variation showed that the United States did more and earlier tracking of students than did any of the other systems. The tracking was related to OTL and showed that students in other systems were exposed to much more of the material that was reflected in the test. The OTL results and the tracking led to findings about an “underachieving curriculum,” the title of the U.S. national report.

TIMSS, the third international study of both mathematics and science, has been reported in the media as mainly a series of rank orders. That is too bad because TIMSS has produced numerous volumes and reports providing rich contexts for understanding the results. One imaginative part of the study was videotaping teachers from various international systems and comparing approaches to instruction in mathematics. One generalization of the results was that varied methods of instruction are used in the high-performing systems. There is no methodological silver bullet!

Continuing the IEA tradition of focusing on the curriculum, where SIMS called the U.S. curriculum “a series of one night stands,” TIMSS described it as “a mile wide and an inch deep.” Educators seem aware of that jingle, but it is not clear what has been enacted as a result of the findings.

The rank orders produced at least one interesting question: Why do U.S. 4th graders do comparatively well while 8th graders do not? SIMS may provide some answers to that question.

The U.S. curriculum is not only a mile wide and an inch deep; it is also very redundant. Not much new material is introduced, and what is introduced reaches only a small percentage of students because of tracking, an issue few wish to discuss.

To end this piece, I present a copy of the policy implications derived from the results of SIMS. They were written in the 1980s, more than 20 years ago. I find it striking how “relevant” the findings still seem, and how little has been done to respond to them.

Policy Implications of International Comparisons

I. A richer mathematics education in the middle school

A. Findings
1. Much of the grade 8 instruction in the USA is review
2. Not much growth is seen
3. Except in classes where new content is presented
4. As in some other countries, where algebra is learned

B. Implications
1. Introduce more content, especially algebra
2. Promote greater variety of content and approaches

II. Reduction in early tracking in mathematics

A. Findings
1. The tracking is inconsistent and ineffective
2. The effect is to reduce opportunities for many to zero

B. Implications
1. Track by speed where absolutely necessary
2. But not at the expense of introducing new content for everyone

III. Intensification of mathematics training for specialists

A. Findings
1. Content in other countries is more advanced, and learned
2. Especially calculus
3. If more students take math, more learn
4. And the best students do not suffer
5. And the eventual selection of specialists is improved

B. Implications
1. Promote more calculus availability
2. Or develop good alternatives for intensive training
3. Increase the number of students taking more mathematics

IV. Advanced application of computers in mathematics education

A. Findings
1. Teaching is very textbook bound
2. Textbooks are limited tools and very conservative
3. Teachers are limited in training for curriculum development

B. Implications
1. Improvement must come from technology
2. By which is meant advanced applications of computers in schools
3. Which also changes the nature of the mathematical content
4. So invest heavily in R&D and innovation/experimentation

V. Monitoring of student opportunities and achievements

A. Findings
1. Student achievement is best viewed as growth in a course of study
2. Also, achievement depends on intensity and content of exposure

B. Implications
1. Statistical monitoring should include growth
2. And systematic surveys of OTL
3. Not just student normative testing

2 comments:

Anonymous said...

Dr. Kifer makes some good observations here.

The problems with the US curriculum being a “mile wide and an inch deep,” especially in math, have indeed been known for some time. Real mathematicians have been raising alarms for years that students are not learning algebra. The problems of tracking, and spiraling, have also been hotly discussed for at least as long as I have been looking at KERA (now in my 15th year).

That being the case, why did the revisions to the Kentucky Core Content for Assessment back in 2006 fail to make these corrections? What went wrong? As a consequence, now in 2009, the Kentucky Senate has been forced to tell the state’s educators to do this, years after it should have happened. Why is that happening? Why was the advice of Dr. Kifer, and lots of other people, ignored for so long?

These questions tie to Dr. Kifer’s ending comments about his interesting list of policy implications from international comparisons, which generally look quite worthwhile. Dr. Kifer says this list was created back in the 1980s. Here we are, over two decades later…and???

One last point – Dr. Kifer agrees in this paper with a position I have held for some time as well. The best education analysis comes from longitudinal assessments. But, the basic design of CATS cannot support serious longitudinal assessments due to matrixing and the lack of high validity and reliability of individual student scores. Yet, when Dr. Kifer testified last year about the Senate’s plan to replace CATS, he seemed to support CATS in preference to the types of testing needed for real longitudinal analysis. I must have missed something.

Anonymous said...

I assume "real mathematicians" are mathematicians. If so, the brush is too broad. Mathematicians differ on important educational issues. They can be, for instance, strong proponents of tracking. Hence, no algebra for many students.

I am not sure whether I was amused or bemused by Kentucky Senate Republicans’ recent discovery of NCTM standards. Had they looked, they would have seen that the 2006 standards you refer to were, indeed, based on NCTM standards.

There are longitudinal studies and there are longitudinal studies. Actually, one of the people who worked on the policy list in my piece evaluated Sanders' value-added methodology years ago. He was concerned that Sanders was then using the CTBS, an off-the-shelf assessment. It is unlikely that such a test would fit a particular "course of study." The current CATS design could be used and might be a very powerful way to do longitudinal analyses at the school level. And, in fact, it might work reasonably well at the individual level. Form differences are pretty easy to adjust, and the internal consistency of the measures is defensibly high.