Education Drivers
Publications
This study presents a reanalysis of a large randomized controlled trial of school-based mentoring and examines the estimated effect of mentoring as a function of mentee-reported relationship quality using a novel statistical approach.
This paper is based on the simple idea that students’ educational achievement is affected by the effort put in by those participating in the education process: schools, parents, and, of course, the students themselves.
DeFraja, G., Oliveira, T., & Zanchi, L. (2010). Must try harder: Evaluating the role of effort in educational attainment. The Review of Economics and Statistics, 92(3), 577–597. Retrieved from https://art.torvergata.it/retrieve/handle/2108/55644/108602/De%20Fraja%20Zanch%20Oliveira%20REStats%202010.pdf
The Benchmarks for Advanced Tiers (BAT) allows school teams to self-assess the implementation status of Tiers 2 (secondary, targeted) and 3 (tertiary, intensive) behavior support systems within their school. The BAT is based on factors drawn from the Individual Student Systems Evaluation Tool (I-SSET). School teams can use the BAT to build an action plan to delineate next steps in the implementation process.
Anderson, C., Childs, K., Kincaid, D., Horner, R. H., George, H., Todd, A. W., & Spaulding, S. (2009). Benchmarks for advanced tiers. Eugene, OR: Educational and Community Supports, University of Oregon.
This article examines issues in developing valid and reliable direct observation of behavior. Suggestions are made to minimize the problems that threaten validity and reliability. The discussion concludes with an examination of the costs and benefits of direct observation, who pays the costs, and who receives the benefits.
Baer, D. M., Harrison, R., Fradenburg, L., Petersen, D., & Milla, S. (2005). Some pragmatics in the valid and reliable recording of directly observed behavior. Research on Social Work Practice, 15(6), 440-451.
The Community Juvenile Accountability Act (CJAA) funded the nation’s first statewide experiment concerning research-based programs for juvenile justice. The question here was whether these programs work when applied statewide in a “real world” setting. This report indicates that the answer to this question is yes, when the programs are competently delivered.
Barnoski, R., & Aos, S. (2004). Outcome evaluation of Washington State’s research-based programs for juvenile offenders. Olympia, WA: Washington State Institute for Public Policy, 460.
This article evaluates a procedure‐based scoring system for a performance assessment (an observed paper towels investigation) and a notebook surrogate completed by fifth‐grade students varying in hands‐on science experience.
Baxter, G. P., Shavelson, R. J., Goldman, S. R., & Pine, J. (1992). Evaluation of procedure‐based scoring for hands‐on science assessment. Journal of Educational Measurement, 29(1), 1-17.
The Hexagon Discussion and Analysis Tool helps organizations evaluate new and existing programs and practices. This tool is designed to be used by a team to ensure diverse perspectives are represented in a discussion of the six contextual fit and feasibility factors.
Blase, K., Kiser, L. and Van Dyke, M. (2013). The Hexagon Tool: Exploring Context. Chapel Hill, NC: National Implementation Research Network, FPG Child Development Institute, University of North Carolina at Chapel Hill.
This paper examines how to measure teacher performance and the practices necessary for increasing teacher trust in systems designed to effectively measure performance.
Cantrell, S., & Scantlebury, J. (2011). Effective Teaching: What Is It and How Is It Measured? Effective Teaching as a Civil Right, 28.
In this study, the authors evaluate the efficacy of videotape analysis with structured expert consultation and self-evaluation to improve teacher candidates’ instructional delivery. A single-case, multiple-baseline, across-participants design was used to evaluate lesson components, rate of praise statements, and rate of opportunities to respond included by teacher candidates in their teaching.
Capizzi, A. M., Wehby, J. H., & Sandmel, K. N. (2010). Enhancing mentoring of teacher candidates through consultative feedback and self-evaluation of instructional delivery. Teacher Education and Special Education, 33(3), 191-212.
This study developed a zero-to-five index of the strength of accountability in 50 states based on the use of high-stakes testing to sanction and reward schools, and analyzed whether that index is related to student gains on the NAEP mathematics test in 1996–2000.
Carnoy, M., & Loeb, S. (2002). Does external accountability affect student outcomes? A cross-state analysis. Educational Evaluation and Policy Analysis, 24(4), 305-331.
Recent efforts to attract and retain effective educators and to improve teacher practices have focused on reforming evaluation and compensation systems for teachers and principals. In 2006, Congress established the Teacher Incentive Fund (TIF), which provides grants to support performance-based compensation systems for educators in high-need schools.
Chiang, H., Wellington, A., Hallgren, K., Speroni, C., Herrmann, M., Glazerman, S., & Constantine, J. (2015). Evaluation of the Teacher Incentive Fund: Implementation and Impacts of Pay-for-Performance after Two Years. NCEE 2015-4020. National Center for Education Evaluation and Regional Assistance.
This chapter provides an integrative approach to measurement as it is likely to be applied to assessment and evaluation within an RTI framework.
Christ, T. J., & Hintze, J. M. (2007). Psychometric considerations when evaluating response to intervention. In Handbook of response to intervention (pp. 93-105). Springer, Boston, MA.
This overview provides information about teacher evaluation as it relates to collecting information about teacher practice and using it to improve student outcomes. The history of teacher evaluation and current research findings and implications are included.
Cleaver, S., Detrich, R. & States, J. (2018). Overview of Teacher Evaluation. Oakland, CA: The Wing Institute. https://www.winginstitute.org/quality-teachers-evaluation.
The purpose of this overview is to provide information about the role of formal teacher evaluation, the research that examines the practice, and its impact on student outcomes.
Cleaver, S., Detrich, R., & States, J. (2018). Overview of Teacher Formal Evaluation. Oakland, CA: The Wing Institute. https://www.winginstitute.org/teacher-evaluation-formal.
This overview examines the current understanding of research on performance feedback as a way to improve teacher performance and student outcomes.
Cleaver, S., Detrich, R. & States, J. (2019). Overview of Performance Feedback. Oakland, CA: The Wing Institute. https://www.winginstitute.org/teacher-evaluation-feedback.
The purpose of this overview is to provide information about informal evaluation as a practice in schools, and the current understanding of research related to informal teacher evaluation to improve teacher performance and student outcomes.
Cleaver, S., Detrich, R., & States, J. (2019). Informal Teacher Evaluation. Oakland, CA: The Wing Institute. Retrieved from https://www.winginstitute.org/staff-informal.
The purpose of this overview is to provide an understanding of the research base on professional development and its impact on student achievement, as well as offer recommendations for future teacher professional development.
Cleaver, S., Detrich, R., States, J. & Keyworth, R. (2020). Overview of Teacher Evaluation. Oakland, CA: The Wing Institute. quality-teachers-in-service.
This brief explores research that points to the opportunities and challenges presented by new approaches to evaluating teacher preparation programs.
Coggshall, J. G., Bivona, L., & Reschly, D. J. (2012). Evaluating the Effectiveness of Teacher Preparation Programs for Support and Accountability. Research & Policy Brief. National Comprehensive Center for Teacher Quality. Retrieved from https://eric.ed.gov/?id=ED543773
The authors examined the extent to which program integrity (i.e., the degree to which programs were implemented as planned) was verified and promoted in evaluations of primary and early secondary prevention programs published between 1980 and 1994.
Dane, A. V., & Schneider, B. H. (1998). Program integrity in primary and early secondary prevention: Are implementation effects out of control? Clinical Psychology Review, 18(1), 23-45.
This article presents a conceptual framework for examining the design and implementation of teacher evaluation processes in school organizations.
Darling-Hammond, L., Wise, A. E., & Pease, S. R. (1983). Teacher evaluation in the organizational context: A review of the literature. Review of Educational Research, 53(3), 285-328.
A growing number of researchers are studying whether value-added measures can do a good job of measuring the contribution of teachers to test score growth. Here I summarize a handful of analyses that shed light on two questions.
David, J. L. (2010). What research says about using value-added measures to evaluate teachers. Educational Leadership, 67(8), 81–82. Retrieved from http://www.ascd.org/publications/educational_leadership/may10/vol67/num08/Using_Value-Added_Measures_to_Evaluate_Teachers.aspx
This article examines the politics of principal evaluation through both an extensive review of the literature and in-depth interviews with principals and superintendents. The findings reveal that the format and processes used in principal evaluation often vary from one district to another and that principals and superintendents frequently hold different perspectives about the purposes and usefulness of evaluation.
Davis, S. H., & Hensley, P. A. (1999). The politics of principal evaluation. Journal of Personnel Evaluation in Education, 13(4), 383-403.
Principal evaluations can be important tools for improving leadership practice, but evaluations have often been described by principals and researchers as unsystematic and lacking timely and actionable feedback. This study examines principal perceptions of the Texas Principal Evaluation and Support System.
DeMatthews, D. E., Scheffer, M., & Kotok, S. (2020). Useful or useless? Principal perceptions of the Texas principal evaluation and support system. Journal of Research on Leadership Education, 1942775120933920.
Problems associated with the school psychologist's traditional assessment functions and methodology are identified and contrasted with the need for assessment information that can contribute meaningfully to the formulation and evaluation of educational interventions.
Deno, S. L. (1986). Formative evaluation of individual student programs: A new role for school psychologists. School Psychology Review.
In the past decade, nearly all states have revised their principal evaluation policies, prompting school districts across the country to rethink how they are evaluating school leaders. The new principal evaluation systems that emerge out of these policy reforms often couple increased accountability with a greater emphasis on development in an effort to spur continuous improvement in school leadership practices.
Donaldson, M., Mavrogordato, M., Youngs, P., & Dougherty, S. (2020). Appraising Principal Evaluation and Development: Current Research and Future Directions. Exploring Principal Development and Teacher Outcomes, 56-68.
A prominent philosophical influence in school consultation research proposes that to effectively serve children, school psychologists must work primarily and paradoxically on changing the behavior of adults. Although the consultation literature base has traditionally conceptualized this role within a dyadic model emphasizing the consultant-consultee relationship, much of school consultation today occurs in the context of teams rather than dyads.
Dowd-Eagle, S., & Eagle, J. (2014). Team-based school consultation. In Handbook of research in school consultation (pp. 464-486). Routledge.
Principals are in a paradoxical position. On one hand, they're called on to use research-based strategies to improve student achievement. On the other, they're increasingly required to micromanage teachers by observing in classrooms and engaging in intensive evaluation. The authors point out that these two positions are at odds with each other.
Dufour, R., & Mattos, M. (2013). How Do Principals Really Improve Schools? Educational Leadership, 70(7), 34-40.
C-SAIL was established in July 2015 as a resource on the implementation and effects of college and career readiness standards. The Center is funded through a grant from the Institute of Education Sciences (IES) of the U.S. Department of Education.
Edgerton, A., Polikoff, M., & Desimone, L. (2017). How is policy affecting classroom instruction? Evidence Speaks Reports, Volume 2, #14. The Center on Standards, Alignment, Instruction, and Learning (C-SAIL).
Research strongly suggests that feedback obtained through direct observations of performance can be a powerful tool for improving teachers’ skills. This study examines a peer teacher observation method used in England. The study found no evidence that Teacher Observation improved student language and math scores.
Education Endowment Foundation (2017). Teacher Observation. Education Endowment Foundation. Retrieved from https://educationendowmentfoundation.org.uk/projects-and-evaluation/projects/teacher-observation/.
A brief overview of findings from the Blueprints for Violence Prevention replication initiative is presented, identifying factors that enhance or impede successful implementation of these programs.
Elliott, D. S., & Mihalic, S. (2004). Issues in disseminating and replicating effective prevention programs. Prevention Science, 5(1), 47-53.
This section includes tools and resources that can help school leaders, teachers, and other stakeholders be more strategic in their decision-making about planning, implementing, and evaluating evidence-based interventions to improve the conditions for learning and facilitate positive student outcomes.
Elliott, S. N., Witt, J. C., & Kratochwill, T. R. (1991). Selecting, implementing, and evaluating classroom interventions. Interventions for achievement and behavior problems, 99-135.
The Every Student Succeeds Act (ESSA) returns decision making for our nation’s education back where it belongs – in the hands of local educators, families, and communities – while keeping the focus on students most in need.
Fennell, M. (2016). What educators need to know about ESSA. Educational Leadership, 73, 62–65.
This book shows how principals and other school leaders can develop the skills necessary for teachers to deliver high quality instruction by introducing principals to a five-part model of effective instruction.
Fink, S., & Markholt, A. (2011). Leading for instructional improvement: How successful leaders develop teaching and learning expertise. John Wiley & Sons.
Part One provides the reader with information essential to understanding not only the content of the sections that follow but also the wealth of material that exists in the literature on program evaluation. Part Two introduces you to different approaches to evaluation to enlarge your understanding of the diversity of choices that evaluators and stakeholders make in undertaking evaluation.
Fitzpatrick, J. L., Sanders, J. R., & Worthen, B. R. (2003). Program Evaluation: Alternative Approaches and Practical Guidelines.
A growing number of evidence-based psychotherapies hold the promise of substantial benefits for children, families, and society. For the benefits of evidence-based programs to be realized on a scale sufficient to be useful to individuals and society, evidence-based psychotherapies need to be put into practice outside of controlled clinical trials.
Fixsen, D. L., Blase, K. A., Duda, M. A., Naoom, S. F., & Van Dyke, M. (2010). Implementation of evidence-based treatments for children and adolescents: Research findings and their implications for the future.
This article describes a school-based randomized trial in over 200 New York City public schools designed to better understand the impact of teacher incentives.
Fryer, R. G. (2013). Teacher incentives and student achievement: Evidence from New York City public schools. Journal of Labor Economics, 31(2), 373-407.
This meta-analysis investigated the effects of formative evaluation procedures on student achievement. The data source was 21 controlled studies, which generated 96 relevant effect sizes, with an average weighted effect size of .70. The magnitude of the effect of formative evaluation was associated with publication type, data-evaluation method, data display, and use of behavior modification. Implications for special education practice are discussed.
Fuchs, L. S., & Fuchs, D. (1986). Effects of systematic formative evaluation: A meta-analysis. Exceptional Children, 53(3), 199-208.
This study examined the educational effects of repeated curriculum-based measurement and evaluation. Thirty-nine special educators, each having three to four pupils in the study, were assigned randomly to a repeated curriculum-based measurement/evaluation (experimental) treatment or a conventional special education evaluation (contrast) treatment.
Fuchs, L. S., Deno, S. L., & Mirkin, P. K. (1984). The effects of frequent curriculum-based measurement and evaluation on pedagogy, student achievement, and student awareness of learning. American Educational Research Journal, 21(2), 449-460.
Educator performance evaluation systems are a potential tool for improving student achievement by increasing the effectiveness of the educator workforce. For example, recent research suggests that giving more frequent, specific feedback on classroom practice may lead to improvements in teacher performance and student achievement.
Garet, M. S., Wayne, A. J., Brown, S., Rickles, J., Song, M., & Manzeske, D. (2017). The Impact of Providing Performance Feedback to Teachers and Principals. NCEE 2018-4001. National Center for Education Evaluation and Regional Assistance.
The authors have been evaluating the impact of five principal preparation programs in the United States on student outcomes. This information should be considered as one aspect of preparation program improvement and accountability. The study team lays out its recommendations in this policy paper.
George W. Bush Institute & Education Reform Initiative. (2016). Developing Leaders: The Importance--and the Challenges--of Evaluating Principal Preparation Programs, Retrieved from https://gwbcenter.imgix.net/Resources/gwbi-importance-of-evaluating-principal-prep.pdf
This article describes the procedures and utility of the Benchmarks of Quality as part of a comprehensive evaluation plan to assess the universal level of implementation fidelity of behavior support for a school. However, results can also be examined to determine the level of implementation fidelity across a district or state for ongoing behavioral training and technical assistance planning.
George, H. P., & Childs, K. E. (2012). Evaluating implementation of schoolwide behavior support: Are we doing it well? Preventing School Failure: Alternative Education for Children and Youth, 56(4), 197-206.
Principal evaluation shares many characteristics with the more general field of personnel evaluation. That is, evaluations may have the purpose of gathering data to help improve performance (formative), or may use the collected information to make decisions about promotion or firing (summative).
Ginsberg, R., & Berry, B. (1990). The folklore of principal evaluation. Journal of Personnel Evaluation in Education, 3(3), 205-230.
This research synthesis examines how teacher effectiveness is currently measured (i.e., formative vs. summative evaluation).
Goe, L., Bell, C., & Little, O. (2008). Approaches to Evaluating Teacher Effectiveness: A Research Synthesis. National Comprehensive Center for Teacher Quality.
This guide is a tool designed to assist states and districts in constructing high-quality teacher evaluation systems in an effort to improve teaching and learning.
Goe, L., Holdheide, L., & Miller, T. (2011). A Practical Guide to Designing Comprehensive Teacher Evaluation Systems: A Tool to Assist in the Development of Teacher Evaluation Systems. National Comprehensive Center for Teacher Quality.
This paper reports on work estimating the stability of value-added estimates of teacher effects, an important area of investigation given that new workforce policies implicitly assume that effectiveness is a stable attribute within teachers.
Goldhaber, D. D., & Hansen, M. (2008). Is it Just a Bad Class?: Assessing the Stability of Measured Teacher Performance. Seattle, WA: Center on Reinventing Public Education.
As performance feedback continues to become more commonplace in school settings, it will become increasingly necessary to build capacity around the processes of giving and receiving feedback. Results from this study have implications for how principals can be supported to use their evaluation data.
Goldring, E. B., Mavrogordato, M., & Haynes, K. T. (2015). Multisource principal evaluation data: Principals’ orientations and reactions to teacher feedback regarding their leadership effectiveness. Educational Administration Quarterly, 51(4), 572-599.
This paper provides recommendations to increase the pool of potential teachers, make it tougher to award tenure to those who perform least well, and reward effective teachers who are willing to work in schools serving large numbers of low-income, disadvantaged children.
Gordon, R., Kane, T. J., & Staiger, D. O. (2006). Identifying Effective Teachers Using Performance on the Job. The Hamilton Project Policy Brief No. 2006-01. Brookings Institution.
Is dismissing poorly performing teachers truly feasible in America today? After all the political capital (and real capital) spent on reforming teacher evaluation, can districts actually terminate ineffective teachers who have tenure or have achieved veteran status?
Griffith, D., & McDougald, V. (2016). Undue process: Why bad teachers in twenty-five diverse districts rarely get fired. Washington, DC: Thomas B. Fordham Institute. Retrieved from http://edex.s3-us-west-2.amazonaws.com/publication/pdfs
This article examines the concept of behavioral contingency and describes NCLB as a set of contingencies to promote the use of effective educational practices. The authors then describe their design of an alternate assessment, including the components designed to capitalize on the contingencies of NCLB to promote positive educational outcomes.
Hager, K. D., Slocum, T. A., & Detrich, R. (2007). No Child Left Behind, Contingencies, and Utah’s Alternate Assessment. JEBPS Vol 8-N1, 63.
The major objective of this data analysis was to estimate the relationship between variables which can be controlled by public policy and educational output.
Hanushek, E. A. (1971). Teacher characteristics and gains in student achievement: Estimation using micro data. American Economic Review, 61(2), 280-288.
This discussion provides a quantitative statement of one approach to achieving the governors’ (and the nation’s) goals – teacher deselection.
Hanushek, E. A. (2009). Teacher deselection. Creating a new teaching profession, 168, 172-173.
This paper reports on an analysis of state statutes and department of education regulations in fifty states for changes in teacher evaluation since the passage of the No Child Left Behind Act of 2001.
Hazi, H. M., & Rucinski, D. A. (2009). Teacher evaluation as a policy target for improved student learning: A fifty-state review of statute and regulatory action since NCLB. Education Policy Analysis Archives, 17, 5.
In 2011, as a part of the State Board of Education's implementation of North Carolina's Race to the Top initiative, a sixth standard—a measure of student growth, the Educational Value-Added Assessment System—was added to the existing five standards for evaluating teachers. The purpose of this report is to describe the outcomes of teacher evaluations that have occurred since the sixth standard was added and trends in those outcomes through 2013-14.
Henry, G. T., & Guthrie, J. E. (2015). An evaluation of the North Carolina educator evaluation system and the student achievement growth standard.
This article discusses the current focus on using teacher observation instruments as part of new teacher evaluation systems being considered and implemented by states and districts.
Hill, H., & Grossman, P. (2013). Learning from teacher observations: Challenges and opportunities posed by new teacher evaluation systems. Harvard Educational Review, 83(2), 371-384.
Direct observation plays an important role in the assessment practices of school psychologists and in the development of evidence-based practices in general and special education. The defining psychometric features of direct observation are presented, the contributions to assessment practice reviewed, and a specific proposal is offered for evaluating the psychometric merit of direct observation in both practitioner developed and commercial/research specific applications.
Hintze, J. M. (2005). Psychometrics of direct observation. School Psychology Review, 34(4), 507-519.
The National Staff Development Council (NSDC) opens the door to professional learning that ensures great teaching for every student every day.
Hirsh, S. (2009). A new definition. Journal of Staff Development, 30(4), 10–16.
This book presents clear and functional techniques for deciding what students with learning disabilities should be taught and how. This book can also function as a tool to assist pre-service teachers (students) with deciding how to teach and what to teach to regular/non-special education children.
Howell, K. W. (1993). Curriculum-based evaluation: Teaching and decision making. Cengage Learning.
In recent years, most states have constructed elaborate accountability systems using school-level test scores. We evaluate the implications of imprecision in these measures for the design of school accountability systems. For instance, rewards or sanctions for schools with scores at either extreme primarily affect small schools and provide weak incentives to large ones.
Kane, T. J., & Staiger, D. O. (2002). The promise and pitfalls of using imprecise school accountability measures. Journal of Economic Perspectives, 16(4), 91-114.
This study used a random-assignment experiment in Los Angeles Unified School District to evaluate various non-experimental methods for estimating teacher effects on student test scores. Having estimated teacher effects during a pre-experimental period, the authors used these estimates to predict student achievement following random assignment of teachers to classrooms.
Kane, T. J., & Staiger, D. O. (2008). Estimating teacher impacts on student achievement: An experimental evaluation (No. w14607). National Bureau of Economic Research.
This report presents an in-depth discussion of the analytical methods and findings from the Measures of Effective Teaching (MET) project’s analysis of classroom observations. A nontechnical companion report describes implications for policymakers and practitioners.
Kane, T. J., & Staiger, D. O. (2012). Gathering Feedback for Teaching: Combining High-Quality Observations with Student Surveys and Achievement Gains. Research Paper. MET Project. Bill & Melinda Gates Foundation.
This paper combines information from classroom-based observations and measures of teachers' ability to improve student achievement as a step toward addressing these challenges. The results point to the promise of teacher evaluation systems that would use information from both classroom observations and student test scores to identify effective teachers.
Kane, T. J., Taylor, E. S., Tyler, J. H., & Wooten, A. L. (2011). Identifying effective classroom practices using student achievement data. Journal of Human Resources, 46(3), 587-613.
This article considers design issues and strategies in comparative outcome studies, including the conceptualization, implementation, and evaluation of alternative treatments; assessment of treatment-specific processes and outcomes; and evaluation of the results. It is argued that addressing these and other issues may increase the yield from comparative outcome studies and may attenuate controversies regarding the adequacy of the demonstrations.
Kazdin, A. E. (1986). Comparative outcome studies of psychotherapy: Methodological issues and strategies. Journal of Consulting and Clinical Psychology, 54(1), 95.
Previous chapters discussed fundamental issues of assessment and experimental design for single-case research; this chapter addresses the third component of methodology, data evaluation. The three components work in concert to permit one to draw inferences about the intervention.
Kazdin, A. E., & Tuma, A. H. (1982). Single-case research designs.
Substantial variation was found in the relationship of evaluators' ratings of teachers and value-added measures of the average achievement of the teachers' students. The results did not yield a simple explanation for the differences in validity of evaluators' ratings. Instead, evaluators' decisions were found to be a complex and idiosyncratic function of motivation, skill, and context.
Kimball, S. M., & Milanowski, A. (2009). Examining teacher evaluation validity and leadership decision making within a standards-based evaluation system. Educational Administration Quarterly, 45(1), 34-70.
The purpose of this study was to examine the validity of teacher evaluation scores that are derived from an observation tool, adapted from Danielson's Framework for Teaching, designed to assess 22 teaching components from four teaching domains.
Lash, A., Tran, L., & Huang, M. (2016). Examining the Validity of Ratings from a Classroom Observation Instrument for Use in a District's Teacher Evaluation System. REL 2016-135. Regional Educational Laboratory West.
The present study was designed to learn more about how to strengthen the integrity of the problem-solving process.
Lundahl, A. A. (2010). Effects of Performance Feedback and Coaching on the Problem-Solving Process: Improving the Integrity of Implementation and Enhancing Student Outcomes. ProQuest LLC.
In this article, guidelines for evaluating the technical adequacy of the problem-solving and intervention design process are described. The guidelines highlight the interdependencies among assessment functions, subsumed by the goal of helping, and the role of structural factors (e.g., collaboration) in shaping the meaningfulness, appropriateness, and usefulness of the assessment-intervention process.
Macmann, G. M., Barnett, D. W., Allen, S. J., Bramlett, R. K., Hall, J. D., & Ehrhardt, K. E. (1996). Problem solving and intervention design: Guidelines for the evaluation of technical adequacy. School Psychology Quarterly, 11(2), 137.
The authors show school and district-level administrators how to set the priorities and support the practices that will help all teachers become expert teachers. Their five-part framework is based on what research tells us about how expertise develops.
Marzano, R. J., Frontier, T., & Livingston, D. (2011). Effective supervision: Supporting the art and science of teaching. ASCD.
Building on the analysis that was first reported in School Leadership That Works, the authors of Balanced Leadership identify the 21 responsibilities associated with effective leadership and show how they relate to three overarching responsibilities.
Marzano, R. J., Waters, T., & McNulty, B. A. (2001). School leadership that works: From research to results. ASCD.
This review addresses two different proposals for reforming teacher training, neither of which is grounded in research. The authors report three weak correlations between the performance of program participants and TNTP’s certification evaluation rubric. The report concludes with three self-evident aphorisms: practice improves teaching, teachers who master teaching skills do better, and inadequate performers should be weeded out.
Mathis, W. (2014). Review of Two Alternative Teacher Preparation Proposals. National Education Policy Center. Retrieved from http://greatlakescenter.org/docs/Think_Twice/TT_Mathis_TeacherPrep.pdf
States and districts across the country are revising how they evaluate school principals. Those that are doing so face a substantial challenge: there is scant evidence on the validity and reliability of current principal evaluation tools. Pennsylvania is among states that are developing a new tool for evaluating principals and assistant principals.
McCullough, M., Lipscomb, S., Chiang, H., Gill, B., & Cheban, I. (2016). Measuring school leaders’ effectiveness: Final report from a multiyear pilot of Pennsylvania’s Framework for Leadership (No. afa7e4c19e4140f3b17422e994fc4e1d). Mathematica Policy Research.
In December 2016, Bellwether Education Partners and The Thomas B. Fordham Institute independently released two reports centered on teacher evaluation and its consequences. Both reports offer a glimpse into ongoing challenges and opportunities with teacher evaluation reform, but they have very different analyses.
McDougald, V., Griffith, D., Pennington, K., & Mead, S. (2016). What is the purpose of teacher evaluation today? A conversation between Bellwether and Fordham. Retrieved from https://edexcellence.net/articles/what-is-the-purpose-of-teacher-evaluation-today-a-conversation-between-bellwether-and
Full and durable implementation of school-based interventions is supported by regular evaluation of fidelity of implementation. Multiple assessments have been developed to evaluate the extent to which schools are applying the core features of school-wide positive behavioral interventions and supports (SWPBIS).
McIntosh, K., Massar, M. M., Algozzine, R. F., George, H. P., Horner, R. H., Lewis, T. J., & Swain-Bradway, J. (2017). Technical adequacy of the SWPBIS tiered fidelity inventory. Journal of Positive Behavior Interventions, 19(1), 3-13.
Under new frameworks, districts have better aligned their evaluations with their school-leadership standards and developed nuanced rubrics for evidence-collection and evaluation ratings. They have also altered the role of principal supervisors so that they spend more time in schools working with principals.
Mendels, P. (2017). Getting Intentional about Principal Evaluations. Educational Leadership, 74(8), 52-56.
The edTPA is a performance assessment of teaching used variously by some state education departments and institutions of higher education for licensure testing and evaluating teaching candidates. The edTPA artifacts include written commentaries, lesson plans, video segments, and samples of student work, based on a series of three to five consecutive lessons.
Meuwissen, K., Choppin, J., Shang-Butler, H., & Cloonan, K. (2015). Teaching candidates’ perceptions of and experiences with early implementation of the edTPA licensure examination in New York and Washington States. Warner School of Education, University of Rochester.
This report describes how the seven Regional Educational Laboratory (REL) Central states (Colorado, Kansas, Missouri, Nebraska, North Dakota, South Dakota, and Wyoming) evaluate their teacher preparation programs and the changes they are making to improve their approaches to evaluation.
Meyer, S. J., Brodersen, R. M., & Linick, M. A. (2014). Approaches to Evaluating Teacher Preparation Programs in Seven States. REL 2015-044. Regional Educational Laboratory Central. Retrieved from https://eric.ed.gov/?id=ED550491
This paper summarizes validity evidence pertaining to several different implementations of the Framework. It is based primarily on reviewing the published and unpublished studies that have looked at the relationship between teacher evaluation ratings made using systems based on the Framework and value-added measures of teacher effectiveness.
Milanowski, A. T. (2011). Validity Research on Teacher Evaluation Systems Based on the Framework for Teaching. Online Submission.
This report is concerned with only one of the many causes and dimensions of the problem, but it is the one that undergirds American prosperity, security, and civility.
National Commission on Excellence in Education. (1983). A Nation at Risk: The imperative for education reform. Retrieved from https://www.edreform.com/wp-content/uploads/2013/02/A_Nation_At_Risk_1983.pdf
This study examines the impact of 2 forms of professional development on prekindergarten teachers' early language and literacy practice: coursework and coaching.
Neuman, S. B., & Wright, T. S. (2010). Promoting language and literacy development for early childhood educators: A mixed-methods study of coursework and coaching. Elementary School Journal, 111, 63–86.
No Child Left Behind Act of 2001, P.L. 107-110, 20 U.S.C. § 6319 (2002).
This research evaluated procedures for training supervisors in a residential setting to provide feedback for maintaining direct‐service staff members' teaching skills with people who have severe disabilities.
Parsons, M. B., & Reid, D. H. (1995). Training residential supervisors to provide feedback for maintaining staff teaching skills with people who have severe disabilities. Journal of Applied Behavior Analysis, 28(3), 317-322.
As states and districts consider potential changes to their teacher evaluation systems and policies, this paper seeks to inform those efforts by reviewing the evolution of the teacher evaluation policy movement over the last several years, identifying positive outcomes of new systems and negative consequences, and describing risks that should be considered.
Pennington, K., & Mead, S. (2016). For good measure? Teacher evaluation policy in the ESSA era. Washington, DC: Bellwether Education Partners. Retrieved from https://bellwethereducation.org/publication/good-measure-teacher-evaluation-policy-essa-era
This handbook advocates a new approach to teacher evaluation as a cooperative effort undertaken by a group of professionals.
Peterson, K. D. (2000). Teacher evaluation: A comprehensive guide to new directions and practices. Corwin Press.
In this provocative and persuasive new book, the author asserts that the secret to high performance and satisfaction, at work, at school, and at home, is the deeply human need to direct our own lives, to learn and create new things, and to do better by ourselves and our world.
Pink, D. H. (2011). Drive: The surprising truth about what motivates us. Penguin.
Donald Campbell was an American social psychologist and noted experimental social science researcher who did pioneering work on methodology and program evaluation. He has also become—posthumously—an unlikely hero of the anti-testing and accountability movement in the United States.
Porter-Magee, K. (2013, February 26). Trust but verify: The real lessons of Campbell’s Law. Thomas B. Fordham Institute.
Research using student scores on standardized tests confirms the common perception that some teachers are more effective than others. It also reveals that being taught by an effective teacher has important consequences for student achievement. The best way to assess a teacher's effectiveness is to look at his or her on-the-job performance.
RAND Education. (2012). Teachers matter: Understanding teachers’ impact on student achievement. Santa Monica, CA: Author. Retrieved from https://www.rand.org/pubs/corporate_pubs/CP693z1-2012-09.html
In order to provide accurate estimates of how much teachers affect the achievement of their students, this study used panel data covering over a decade of elementary student test scores and teacher assignment in two contiguous New Jersey school districts.
Rockoff, J. E. (2004). The impact of individual teachers on student achievement: Evidence from panel data. American Economic Review, 94(2), 247-252.
This review examined the overlap between state-created curriculum evaluation tools and The Hexagon Tool created by the National Implementation Research Network. The author followed systematic procedures while conducting a web search and visiting each state’s department of education website in search of curriculum evaluation tools.
Rolf, R. (2019). State Department of Education Support for Implementation Issues Faced by School Districts during the Curriculum Adoption Process. Oakland, CA: The Wing Institute. https://www.winginstitute.org/student-research-2019.
This report is the first in a series by the National Council on Teacher Quality (NCTQ) that examines the current status of states' teacher policies. Updated on a two-year cycle, each will cover a specific area of teacher policy. This report focuses on state teacher policies governing what states require in evaluations of both teachers and principals.
Ross, E., & Walsh, K. (2019). State of the states 2019: Teacher and principal evaluation policy. Washington, DC: National Council on Teacher Quality.
This research considers relationships between student achievement (knowledge and cognitive skill), teacher efficacy (Gibson & Dembo, 1984), and interactions with assigned coaches (self-report measures) in a sample of 18 grade 7 and 8 history teachers in 36 classes implementing a specific innovation with the help of 6 coaches.
Ross, J. A. (1992). Teacher efficacy and the effects of coaching on student achievement. Canadian Journal of Education, 17(1), 51–65.
The author develops falsification tests for three widely used value-added model (VAM) specifications, based on the idea that future teachers cannot influence students’ past achievement.
Rothstein, J. (2010). Teacher quality in educational production: Tracking, decay, and student achievement. The Quarterly Journal of Economics, 125(1), 175-214.
This essay seeks to help you put the hard-earned experience of others to use through a set of practical steps, prompts, and tips for matching the right evaluator to your need.
S.D. Bechtel, Jr. Foundation. (2018). Hiring an External Evaluator. Retrieved from http://sdbjrfoundation.org/wp-content/uploads/2018/04/04_Evaluation-Consultant_2018Oct25.pdf
The purpose of this study was to explore the purpose and practices of leadership evaluation as perceived by principals. The researcher wanted to identify the perceived purposes and practices of leadership evaluation as described by nine public school principals, and to respond to the apparent need expressed by administrators to receive substantive feedback.
Sanders, K. (2008). The purpose and practices of leadership assessment as perceived by select public middle and elementary school principals in the Midwest. Aurora University.
Sawchuk, S. (2015). Teacher evaluation: An issue overview. Education Week, 35(3), 1-6.
Written by one of the leaders in evaluation, Evaluation Thesaurus, Fourth Edition, provides readers with a quick analysis of the leading concepts, positions, acronyms, processes, techniques, and checklists in the field of evaluation.
Scriven, M. (1991). Evaluation thesaurus. Sage.
This book is organized around four dominant interrelated core issues: professional standards, a guide to applying the Joint Committee's Standards, ten alternative models for the evaluation of teacher performance, and an analysis of these selected models.
Shinkfield, A. J., & Stufflebeam, D. L. (2012). Teacher evaluation: Guide to effective practice (Vol. 41). Springer Science & Business Media.
A critical review of reading programs requires objective and in-depth analysis. For this reason, the authors offer the following recommendations and procedures for analyzing critical elements of programs.
Simmons, D. C., & Kame’enui, E. J. (2003). A consumer’s guide to evaluating a core reading program grades K-3: A critical elements analysis. Retrieved December 19, 2006.
A multiple baseline across items design was used to evaluate the effects of a cover, copy, and compare (CCC) intervention on students' accuracy in identifying states on a map of the United States. The results showed the CCC intervention was effective in increasing the class mean accuracy levels in locating states.
Skinner, C. H., Belfiore, P. J., & Pierce, N. (1992). Cover, copy, and compare: Increasing geography accuracy in students with behavior disorders. School Psychology Review, 21(1), 73-81.
This annual publication provides a compilation of statistical information covering the broad field of education from prekindergarten through graduate school. It has been published annually since 1962, providing over 50 years of data with which to benchmark education performance at the system level in this country.
Snyder, T.D., de Brey, C., and Dillow, S.A. (2018). Digest of Education Statistics 2016 (NCES 2017-094). National Center for Education Statistics, Institute of Education Sciences, U.S. Department of Education. Washington, DC.
Observational measurement of treatment adherence has long been considered the gold standard. However, little is known about either the generalizability of the scores from extant observational instruments or the sampling needed. Results suggested that reliable cognitive–behavioral therapy adherence studies require at least 10 sessions per patient, assuming 12 patients per therapist and two coders—a challenging threshold even in well-funded research. Implications, including the importance of evaluating alternatives to observational measurement, are discussed.
Southam-Gerow, M. A., Bonifay, W., McLeod, B. D., Cox, J. R., Violante, S., Kendall, P. C., & Weisz, J. R. (2020). Generalizability and decision studies of a treatment adherence instrument. Assessment, 27(2), 321-333.
This paper presents the results of a rigorous experiment examining the impact of pay for performance on student achievement and instructional practice.
Springer, M. G., Ballou, D., Hamilton, L., Le, V. N., Lockwood, J. R., McCaffrey, D. F., ... & Stecher, B. M. (2011). Teacher Pay for Performance: Experimental Evidence from the Project on Incentives in Teaching (POINT). Society for Research on Educational Effectiveness.
We report findings from a quasi-experimental evaluation of the recently implemented US$5,000 retention bonus program for effective teachers in Tennessee’s Priority Schools. We estimate the impact of the program on teacher retention using a fuzzy regression discontinuity design by exploiting a discontinuity in the probability of treatment conditional on the composite teacher effectiveness rating that assigns bonus eligibility.
Springer, M. G., Swain, W. A., & Rodriguez, L. A. (2016). Effective teacher retention bonuses: Evidence from Tennessee. Educational Evaluation and Policy Analysis, 38(2), 199-221.
The Bill & Melinda Gates Foundation launched the Intensive Partnerships for Effective Teaching initiative. The initiative's goal is dramatic gains in student achievement, graduation rates, and college-going, especially for low-income minority (LIM) students.
Stecher, B. M., Garet, M. S., Hamilton, L. S., Steiner, E. D., Robyn, A., Poirier, J., ... & de los Reyes, I. B. (2016). Improving Teaching Effectiveness: Implementation: The Intensive Partnerships for Effective Teaching Through 2013–2014. Rand Corporation.
This paper offers evidence that evaluation can shift the teacher effectiveness distribution through a different mechanism: by improving teacher skill, effort, or both in ways that persist over the long run.
Taylor, E. S., & Tyler, J. H. (2012). The effect of evaluation on teacher performance. American Economic Review, 102(7), 3628-51.
In the research reported here, the authors study one approach to teacher evaluation: practice-based assessment that relies on multiple, highly structured classroom observations conducted by experienced peer teachers and administrators.
Taylor, E. S., & Tyler, J. H. (2012a). Can teacher evaluation improve teaching? Evidence of systematic growth in the effectiveness of mid-career teachers. Education Next, 12(4), 79–84. Retrieved from http://educationnext.org/can-teacher-evaluation-improve-teaching/
This report proposes six design standards that any rigorous and fair teacher evaluation system should meet. It offers a blueprint for better evaluations that can help every teacher succeed in the classroom—and give every student the best chance at success.
The New Teacher Project. (2010). Teacher Evaluation 2.0. New York, NY: Author.
The authors examine the causes and consequences of the status of teacher evaluation and its implications for the current national debate about performance pay for teachers. The report also examines a number of national, state, and local evaluation systems that offer potential alternatives to current practice.
Toch, T., & Rothman, R. (2008). Rush to Judgment: Teacher Evaluation in Public Education. Education Sector Reports. Education Sector.
This article provides two hypothetical case examples that illustrate how single-case designs (alternating treatments, multiple baseline, and reversal) can be used to evaluate manipulable influences on school performance. In each case, an intervention plan is proposed for a student, and the success of the intervention is evaluated within a single-case design.
Wacker, D. P., Steege, M., & Berg, W. K. (1988). Use of single-case designs to evaluate manipulable influences on school performance. School Psychology Review.
This article presents a brief description of a manual called Getting to outcomes: methods and tools for planning, evaluation, and accountability (GTO) designed to assist practitioners in formulating the planning, implementation, and evaluation strategies for programs and policies.
Wandersman, A., Imm, P., Chinman, M., & Kaftarian, S. (2000). Getting to outcomes: A results-based approach to accountability. Evaluation and program planning, 23(3), 389-395.
In this article, we report the results from a randomized evaluation of the Safe and Civil Schools (SCS) model for school-wide positive behavioral interventions and supports. Thirty-two elementary schools in a large urban school district were randomly assigned to an initial training cohort or a wait-list control group.
Ward, B., & Gersten, R. (2013). A randomized evaluation of the safe and civil schools model for positive behavioral interventions and supports at elementary schools in a large urban school district. School Psychology Review, 42(3), 317-333.
This report examines the pervasive and longstanding failure to recognize and respond to variations in the effectiveness of teachers.
Weisberg, D., Sexton, S., Mulhern, J., Keeling, D., Schunck, J., Palcisco, A., & Morgan, K. (2009). The widget effect: Our national failure to acknowledge and act on differences in teacher effectiveness. New Teacher Project.
The purpose of this study was to assess the degree to which behavioral intervention studies conducted with persons with mental retardation operationally defined the independent variables and evaluated and reported measures of treatment integrity. The study expands the previous work in this area reported by Gresham, Gansle, and Noell (1993) and Wheeler, Baggett, Fox, and Blevins (2006) by providing an evaluation of empirical investigations published in multiple journals in the fields of applied behavior analysis and mental retardation from 1996–2006. Results of the review indicated that relatively few of the studies fully reported data on treatment integrity.
Wheeler, J. J., Mayton, M. R., Carter, S. L., Chitiyo, M., Menendez, A. L., & Huang, A. (2009). An assessment of treatment integrity in behavioral intervention studies conducted with persons with mental retardation. Education and Training in Developmental Disabilities, 187-195.
The current review sought to describe the implementation and evaluation of trauma-focused school practices as represented in the published literature. Through a systematic literature search, we identified 39 articles describing trauma-focused practices implemented in school settings with elementary populations and coded data regarding these interventions’ characteristics as well as their implementation and evaluation procedures.
Zakszeski, B. N., Ventresco, N. E., & Jaffe, A. R. (2017). Promoting resilience through trauma-focused practices: A critical review of school-based implementation. School Mental Health, 9(4), 310-321.