Tuesday, January 24, 2012

Value-added evaluation of teachers - yet more problems

This from the Daily Kos:

The default assumption in the value-added literature is that teacher effects are a fixed construct that is independent of the context of teaching (e.g., types of courses, student demographic compositions in a class, and so on) and stable across time. Our empirical exploration of teacher effectiveness rankings across different courses and years suggested that this assumption is not consistent with reality. In particular, the fact that an individual student's learning gain is heavily dependent upon who else is in his or her class, apart from the teacher, raises questions about our ability to isolate a teacher's effect on an individual student's learning, no matter how sophisticated the statistical model might be.

Those words are from a new study on the stability of teachers' scores under value-added methodologies, which attempt to ascertain teachers' effects upon students' scores on tests.

The study in question, titled Value-Added Modeling of Teacher Effectiveness: An Exploration of Stability across Models and Contexts, was released late last month at Education Policy Analysis Archives, a peer-reviewed online journal of education policy previously edited by our own SDorn, Sherman Dorn of the University of South Florida (disclosure: like me, an alumnus of Haverford College), who remains on the editorial board. Two of the four authors of this study, Edward Haertel and Linda Darling-Hammond, are also among the authors of the EPI policy brief.

Abstract:
Recent policy interest in tying student learning to teacher evaluation has led to growing use of value-added methods for assessing student learning gains linked to individual teachers. VAM analyses rely on complex assumptions about the roles of schools, multiple teachers, student aptitudes and efforts, homes and families in producing measured student learning gains. This article reports on analyses that examine the stability of high school teacher effectiveness rankings across differing conditions. We find that judgments of teacher effectiveness for a given teacher can vary substantially across statistical models, classes taught, and years. Furthermore, student characteristics can impact teacher rankings, sometimes dramatically, even when such characteristics have been previously controlled statistically in the value-added model. A teacher who teaches less advantaged students in a given course or year typically receives lower effectiveness ratings than the same teacher teaching more advantaged students in a different course or year. Models that fail to take student demographics into account further disadvantage teachers serving large numbers of low-income, limited English proficient, or lower-tracked students. We examine a number of potential reasons for these findings, and we conclude that caution should be exercised in using student achievement gains and value-added methods to assess teachers’ effectiveness, especially when the stakes are high.

In other words: 
We find that judgments of teacher effectiveness for a given teacher can vary substantially across statistical models, classes taught, and years. The key words are vary substantially: if we are measuring something real and stable, the result should not change that much because of the particular value-added methodology being used. By way of comparison, whether I measure your height with a yardstick or a tape measure, in inches one time and centimeters another, if I am really measuring the same thing (height) I should get consistent results, with perhaps some slight variation due to measurement error.
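To make the analogy concrete, here is a minimal, purely illustrative sketch in Python. Every number in it is made up: it simply shows that when two noisy instruments measure the same stable trait, their readings agree almost perfectly once the units are reconciled.

```python
import random

random.seed(0)

# Hypothetical illustration: 100 people with true heights in inches.
true_heights = [random.gauss(68, 3) for _ in range(100)]

# Two "instruments": a yardstick reading in inches and a tape measure
# reading in centimeters, each with small independent measurement error.
yardstick_in = [h + random.gauss(0, 0.25) for h in true_heights]
tape_cm = [h * 2.54 + random.gauss(0, 0.5) for h in true_heights]

# Convert the tape readings back to inches and compare.
tape_in = [cm / 2.54 for cm in tape_cm]

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

print(f"correlation between the two measurements: {pearson(yardstick_in, tape_in):.3f}")
# Prints a value near 1.0: two noisy instruments measuring the same
# stable trait agree almost perfectly once units are reconciled.
```

If value-added scores behaved like the yardstick and the tape measure, switching models would be harmless; the study finds they do not.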

But the method of measuring is only one problem. If there are different results depending on the classes taught, that MAY reflect different levels of effectiveness across curricula, or it could be something else entirely. And if there is substantial variance from year to year, either the teacher is very inconsistent (which does not seem all that likely) or that variance is due to something not under the control of the teacher.

Furthermore, student characteristics can impact teacher rankings, sometimes dramatically, even when such characteristics have been previously controlled statistically in the value-added model. This is CRITICAL. People have justifiably argued that a single score at the end of a year tells us little about what the teacher has done, and may be driven primarily by the knowledge with which students arrived. Value-added analysis is supposed to provide a method that controls for different characteristics among students, and thus enables us to focus on the impact the teachers actually had. But if, despite our attempts to control for variance among students, that variance still seriously impacts the derived value-added score, then relying upon value-added approaches is dangerous, because we will be making decisions that cannot be justified by the data we are using...
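To see what is at stake, here is a minimal hypothetical simulation in Python (using numpy). It is not the study's model; it is the simplest possible value-added setup, with invented numbers, showing how a model that omits a demographic control manufactures differences among five teachers whose true effects are identical. The study's finding is more troubling still: even models that do include such controls produced unstable rankings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 5 teachers, 40 students each. Every teacher has
# the SAME true effect; they differ only in the share of disadvantaged
# students assigned to them (0%, 25%, 50%, 75%, 100%).
n_teachers, n_students = 5, 40
shares = np.linspace(0.0, 1.0, n_teachers)

teacher = np.repeat(np.arange(n_teachers), n_students)
disadvantaged = np.concatenate(
    [(rng.random(n_students) < s).astype(float) for s in shares]
)

# Learning gain = constant teacher effect + demographic effect + noise.
gain = (5.0
        - 2.0 * disadvantaged
        + rng.normal(0.0, 1.0, n_teachers * n_students))

# One dummy column per teacher (no intercept).
dummies = (teacher[:, None] == np.arange(n_teachers)).astype(float)

# Model A: gains attributed to teachers alone.
est_a, *_ = np.linalg.lstsq(dummies, gain, rcond=None)

# Model B: same model plus the demographic control.
est_b, *_ = np.linalg.lstsq(
    np.column_stack([dummies, disadvantaged]), gain, rcond=None)

print("model A (no control):  ", np.round(est_a, 2))
print("model B (with control):", np.round(est_b[:n_teachers], 2))
# Model A spreads five identical teachers apart simply because their
# classes differ; model B recovers roughly equal effects for all five.
```

Even in this toy example the study's deeper problem would persist in subtler form: the control only helps if the measured characteristic captures everything about class composition that matters, and the authors' point is that it does not; who else is in the class, tracking, and course type all leak through.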
