In this paper, the authors show that the questions we asked are fundamental and that our meta-analytic techniques are appropriate, robust, and statistically correct. In sum, the results and conclusions of our meta-analysis are not altered by our critics’ protests and accusations.
Research recognizes the power of assessment to amplify learning and skill acquisition. This overview describes and compares two types of assessment educators rely on: formative assessment and summative assessment.
This article discusses the relationship between academic performance or achievement and classroom environment factors. Classroom variables are conceptualized as an academic ecology involving a network of relationships among student and environmental factors that affect the acquisition of new skills and student engagement in academic work.
This book is compiled from the proceedings of the sixth summit entitled “Performance Feedback: Using Data to Improve Educator Performance.” The 2011 summit topic was selected to help answer the following question: What basic practice has the potential for the greatest impact on changing the behavior of students, teachers, and school administrative personnel?
States, J., Keyworth, R., & Detrich, R. (2013). Introduction: Proceedings from the Wing Institute's Sixth Annual Summit on Evidence-Based Education: Performance Feedback: Using Data to Improve Educator Performance. In Education at the Crossroads: The State of Teacher Preparation (Vol. 3, pp. ix-xii). Oakland, CA: The Wing Institute.
This review of the research on secondary reading programs focuses on 69 studies that used random assignment (n = 62) or high-quality quasi-experiments (n = 7) to evaluate outcomes of 51 programs on widely accepted measures of reading. The study found positive outcomes for programs using one-to-one and small-group tutoring (effect sizes +0.14 to +0.28), cooperative learning (+0.10), whole-school approaches including organizational reforms such as teacher teams (+0.06), and writing-focused approaches (+0.13). Individual approaches in a few other categories also showed positive impacts. The findings suggest interventions for secondary readers that can improve struggling students' chances of experiencing greater success in high school and better opportunities after graduation.
Citation: Baye, A., Lake, C., Inns, A., & Slavin, R. E. (2018, January). A Synthesis of Quantitative Research on Reading Programs for Secondary Students. Baltimore, MD: Johns Hopkins University, Center for Research and Reform in Education.
The main focus of this study is to find different kinds of variables that might contribute to variations in the strength and direction of the relationship by examining quantitative studies that relate mathematics teachers’ subject matter knowledge to student achievement in mathematics.
Ahn, S., & Choi, J. (2004). Teachers' Subject Matter Knowledge as a Teacher Qualification: A Synthesis of the Quantitative Literature on Students' Mathematics Achievement. Online Submission.
The purpose of the SWPBIS Tiered Fidelity Inventory is to provide a valid, reliable, and efficient measure of the extent to which school personnel are applying the core features of school-wide positive behavioral interventions and supports. The TFI is divided into three sections that can be used separately or in combination to assess the extent to which core features are in place.
Algozzine, B., Barrett, S., Eber, L., George, H., Horner, R., Lewis, T., Putnam, B., Swain-Bradway, J., McIntosh, K., & Sugai, G. (2014). School-wide PBIS Tiered Fidelity Inventory. OSEP Technical Assistance Center on Positive Behavioral Interventions and Supports.
Multilevel modeling techniques were used with a sample of 643 students enrolled in 37 secondary school classrooms to predict future student achievement (controlling for baseline achievement) from observed teacher interactions with students in the classroom, coded using the Classroom Assessment Scoring System—Secondary.
Allen, J., Gregory, A., Mikami, A., Lun, J., Hamre, B., & Pianta, R. (2013). Observations of effective teacher–student interactions in secondary school classrooms: Predicting student achievement with the classroom assessment scoring system—secondary. School Psychology Review, 42(1), 76.
The authors effectively cover the construction of psychological tests and the interpretation of test scores and scales; critically examine classical true-score theory; and explain theoretical assumptions and modern measurement models, controversies, and developments.
Allen, M. J., & Yen, W. M. (2001). Introduction to measurement theory. Waveland Press.
The “Standards for Educational and Psychological Testing” were approved as APA policy by the APA Council of Representatives in August 2013.
American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (1985). Standards for educational and psychological testing. American Educational Research Association.
The purpose of this study is to assess whether academic achievement in fact increases after the introduction of high-stakes tests. Its first objective is to determine whether academic achievement has improved since the introduction of high-stakes testing policies in the 27 states with the highest stakes written into their grade 1-8 testing policies.
Amrein-Beardsley, A., & Berliner, D. C. (2002). The Impact of High-Stakes Tests on Student Academic Performance.
Functional behavior assessment is becoming a commonly used practice in school settings. Accompanying this growth has been an increase in research on functional behavior assessment. We reviewed the extant literature on documenting indirect and direct methods of functional behavior assessment in school settings.
Anderson, C. M., Rodriguez, B. J., & Campbell, A. (2015). Functional behavior assessment in schools: Current status and future directions. Journal of Behavioral Education, 24(3), 338-371.
The purposes of this article are to describe (a) the context in which PBS and FBA are needed and (b) definitions and features of PBS and FBA.
Applying positive behavior support
This book was designed as an assessment of standardized testing and its alternatives at the secondary school level.
Archbald, D. A., & Newmann, F. M. (1988). Beyond standardized testing: Assessing authentic academic achievement in the secondary school.
This paper describes a comprehensive model for the application of behavior analysis in schools. The model includes descriptive assessment, functional analysis, and functional behavioral assessment procedures for addressing problem behavior in school and home settings.
Asmus, J. M., Vollmer, T. R., & Borrero, J. C. (2002). Functional behavioral assessment: A school based model. Education & Treatment of Children, 25(1), 67.
The authors describe three forms of functional assessment used in applied behavior analysis and explain three potential reasons why OBM has not yet adopted the use of such techniques.
Austin, J., Carr, J. E., & Agnew, J. L. (1999). The need for assessment of maintaining variables in OBM. Journal of Organizational Behavior Management, 19(2), 59-87.
Through a meta-analysis of 78 studies, the current study aims to determine the overall effect size for testing at different frequency levels and to identify other study characteristics related to the effectiveness of frequent testing.
Başol, G., & Johanson, G. (2009). Effectiveness of frequent testing over achievement: A meta analysis study. Journal of Human Sciences, 6(2), 99-121.
Feedback is an essential construct for many theories of learning and instruction, and an understanding of the conditions for effective feedback should facilitate both theoretical development and instructional practice.
Bangert-Drowns, R. L., Kulik, C. L. C., Kulik, J. A., & Morgan, M. (1991). The instructional effect of feedback in test-like events. Review of educational research, 61(2), 213-238.
The Early Start Denver Model (ESDM) has been gaining popularity as a comprehensive treatment model for children ages 12 to 60 months with autism spectrum disorders (ASD). This article evaluates the research on the ESDM through an analysis of study design and purpose; child participants; setting, intervention agents, and context; density and duration; and overall research rigor to assist professionals with knowledge translation and decisions around adoption of the practices.
Baril, E. M., & Humphreys, B. P. (2017). An evaluation of the research evidence on the Early Start Denver Model. Journal of Early Intervention, 39(4), 321-338.
This study provides a systematic review of the use of social emotional learning (SEL) interventions in urban schools over the last 20 years. I summarize the types of interventions used and the outcomes examined, and I describe the use of culturally responsive pedagogy as a part of each intervention.
Barnes, T. N. (2019). Changing the landscape of social emotional learning in urban schools: What are we currently focusing on and where do we go from here?. The Urban Review, 51(4), 599-637.
Professional judgment is required whenever conditions are uncertain. This article provides an analysis of professional judgment and describes sources of error in decision making.
Barnett, D. W. (1988). Professional judgment: A critical appraisal. School Psychology Review., 17(4), 658-672.
The authors describe minimal requirements for functional intervention-based assessment and suggest strategies for using these methods to analyze developmental delays and make special service eligibility decisions for preschool children (intervention-based multifactored evaluation or IBMFE).
Barnett, D. W., Bell, S. H., Gilkey, C. M., Lentz Jr, F. E., Graden, J. L., Stone, C. M., ... & Macmann, G. M. (1999). The promise of meaningful eligibility determination: Functional intervention-based multifactored preschool evaluation. The Journal of Special Education, 33(2), 112-124.
This chapter is intended to help solve questions concerning how to judge the appropriateness of assessment or measurement decisions and of the information used to make decisions.
Barnett, D. W., Lentz Jr, F. E., & Macmann, G. (2000). Psychometric qualities of professional practice.
Seeing Students Learn Science is a guidebook meant to help educators improve the way in which students learn science. The introduction of new science standards across the nation has led to the adoption of new curricula, instruction, and professional development to align with the new standards. This publication is designed as a resource for educators to adapt assessment to these changes. It includes examples of innovative assessment formats, ways to embed assessments in engaging classroom activities, and ideas for interpreting and using novel kinds of assessment information.
Beatty, A., Schweingruber, H., & National Academies of Sciences, Engineering, and Medicine. (2017). Seeing Students Learn Science: Integrating Assessment and Instruction in the Classroom. Washington, DC: National Academies Press.
Just as an athlete needs effective practice to be able to compete at high levels of performance, students benefit from formative practice and feedback to master skills and content in a course. At the most complex and challenging end of the spectrum of summative assessment techniques, the portfolio involves a collection of artifacts of student learning organized around a particular learning outcome.
Beers, M. J. (2020). Playing like you practice: Formative and summative techniques to assess student learning. High impact teaching for sport and exercise psychology educators, 92-102.
This paper uses student-level data from a statewide community college system to examine the validity of placement tests and high school information in predicting course grades and college performance.
Belfield, C. R., & Crosta, P. M. (2012). Predicting Success in College: The Importance of Placement Tests and High School Transcripts. CCRC Working Paper No. 42. Community College Research Center, Columbia University.
This paper outlines the rationale, critical dimensions, and techniques for using peer micronorms and discusses technical adequacy considerations.
Bell, S. H., & Barnett, D. W. (1999). Peer micronorms in the assessment of young children: Methodological review and examples. Topics in Early Childhood Special Education, 19(2), 112-122.
Since the passage of the No Child Left Behind Act (NCLB) in 2002 and its 2015 update, the Every Student Succeeds Act (ESSA), every third through eighth grader in U.S. public schools now takes tests calibrated to state standards, with the aggregate results made public. In a study of the nation’s largest urban school districts, students took an average of 112 standardized tests between pre-K and grade 12.
Berwick, C. (2019). What Does the Research Say About Testing? Marin County, CA: Edutopia.
Assessing literature that uses either experimental (lottery) or student-level growth-based methods, this analysis infers the causal impact of attending a charter school on student performance.
Betts, J. R., & Tang, Y. E. (2019). The effect of charter schools on student achievement. School choice at the crossroads: Research perspectives, 67-89.
This article reports on a 4-year longitudinal study of the effects of Literacy Collaborative (LC), a schoolwide reform model that relies primarily on the one-on-one coaching of teachers as a lever for improving student literacy learning.
Biancarosa, G., Bryk, A. S., & Dexter, E. R. (2010). Assessing the value-added effects of literacy collaborative professional development on student learning. The Elementary School Journal, 111(1), 7-34.
This is a review of the literature on classroom formative assessment. Several studies show firm evidence that innovations designed to strengthen the frequent feedback that students receive about their learning yield substantial learning gains.
Black, P., & Wiliam, D. (1998). Assessment and classroom learning. Assessment in Education: principles, policy & practice, 5(1), 7-74.
Firm evidence shows that formative assessment is an essential component of classroom work and that its development can raise standards of achievement, Mr. Black and Mr. Wiliam point out. Indeed, they know of no other way of raising standards for which such a strong prima facie case can be made.
Black, P., & Wiliam, D. (2010). Inside the black box: Raising standards through classroom assessment. Phi Delta Kappan, 92(1), 81-90.
This paper theorizes that variations in learning and the level of learning of students are determined by the students' learning histories and the quality of instruction they receive.
Bloom, B. (1976). Human characteristics and school learning. New York: McGraw-Hill.
Research and experience tell us very forcefully about the importance of assessment in higher education. It shapes the experience of students and influences their behaviour more than the teaching they receive. The influence of assessment means that ‘there is more leverage to improve teaching through changing assessment than there is in changing anything else’.
Bloxham, S., & Boyd, P. (2007). Developing Effective Assessment in Higher Education: a practical guide.
This NCES study explores public schools' demographic composition, in particular, the proportion of Black students enrolled in schools (also referred to as "Black student density") and its relation to the Black-White achievement gap. This study, the first of its kind, used the 2011 NAEP grade 8 mathematics assessment data. Among the results highlighted in the report, the study indicates that the achievement gap between Black and White students remains whether schools fall in the highest density category or the lowest density category.
Bohrnstedt, G., Kitmitto, S., Ogut, B., Sherman, D., and Chan, D. (2015). School Composition and the Black–White Achievement Gap (NCES 2015-018). U.S. Department of Education, Washington, DC: National Center for Education Statistics. Retrieved from http://nces.ed.gov/pubsearch.
This fourth edition provides in-depth treatments of critical measurement topics, and the chapter authors are acknowledged experts in their respective fields.
Brennan, R. L. (Ed.) (2006). Educational measurement (4th ed.). Westport, CT: Praeger Publishers.
In recent years the number of states that have adopted or plan to implement end of course (EOC) tests as part of their high school assessment program has grown rapidly. While EOC tests certainly offer great promise, they are not without challenges. Many of the proposed uses of EOC tests open new and often complex issues related to design and implementation. The purpose of this brief is to support education leaders and policy makers in making appropriate technical and operational decisions to maximize the benefit of EOC tests and address the challenges.
Brief, A. P. (2011). State End-of-Course Testing Programs.
Cameron, J., & Pierce, W. D. (1996). The debate about rewards and intrinsic motivation: Protests and accusations do not alter the results. Review of Educational Research, 66(1), 39–51.
This report tells policymakers what metrics they must track in order to make the best decisions regarding the supply and training of school leaders.
Campbell, C., & Gross, B. (2012). Principal Concerns: Leadership Data and Strategies for States. Center on Reinventing Public Education.
The National Board for Professional Teaching Standards (NBPTS) assesses teaching practice based on videos and essays submitted by teachers. The authors compared the performance of classrooms of elementary students in Los Angeles randomly assigned to NBPTS applicants and to comparison teachers.
Cantrell, S., Fullerton, J., Kane, T. J., & Staiger, D. O. (2008). National board certification and teacher effectiveness: Evidence from a random assignment experiment (No. w14608). National Bureau of Economic Research.
This study developed a zero-to-five index of the strength of accountability in 50 states based on the use of high-stakes testing to sanction and reward schools, and analyzed whether that index is related to student gains on the NAEP mathematics test in 1996–2000.
Carnoy, M., & Loeb, S. (2002). Does external accountability affect student outcomes? A cross-state analysis. Educational Evaluation and Policy Analysis, 24(4), 305-331.
This paper discusses the search for a "magic metric" in education: an index or number that would be generally accepted as the most efficient descriptor of a school's performance in a district.
Celio, M. B. (2013). Seeking the Magic Metric: Using Evidence to Identify and Track School System Quality. In Performance Feedback: Using Data to Improve Educator Performance (Vol. 3, pp. 97-118). Oakland, CA: The Wing Institute.
This report provides a practical "management guide" for an evidence-based key indicator data decision system for school districts and schools.
Celio, M. B., & Harvey, J. (2005). Buried Treasure: Developing A Management Guide From Mountains of School Data. Center on Reinventing Public Education.
A meta-analysis of the distributed practice effect was performed to illuminate the effects of temporal variables that have been neglected in previous reviews. This review found 839 assessments of distributed practice in 317 experiments located in 184 articles.
Cepeda, N. J., Pashler, H., Vul, E., Wixted, J. T., & Rohrer, D. (2006). Distributed practice in verbal recall tasks: A review and quantitative synthesis. Psychological bulletin, 132(3), 354.
Few issues engender stronger opinions in the American population than education, and the number and complexity of issues continue to grow. The annual Education Next Survey of Public Opinion examines the opinions of parents and teachers across a wide range of topic areas, such as student performance, common core curriculum, charter schools, school choice, teacher salaries, school spending, and school reform. The 12th Annual Survey was completed in May 2018.
Cheng, A., Henderson, M. B., Peterson, P.E. & West, M. R. (2019). The 2018 EdNext poll on school reform. Education Next, 19(1).
The purpose of this study is to estimate the extent to which publication bias is present in education and special education journals. This paper shows that published studies were associated with significantly larger effect sizes than unpublished studies (d = 0.64). The authors suggest that meta-analyses report effect sizes of published and unpublished studies separately in order to address issues of publication bias.
Chow, J. C., & Ekholm, E. (2018). Do Published Studies Yield Larger Effect Sizes than Unpublished Studies in Education and Special Education? A Meta-review.
The authors describe a rationale and model for changing assessment efforts in schools from simple description to the integration of information from multiple sources for the purpose of designing interventions.
Christenson, S. L., & Ysseldyke, J. E. (1989). Scientific practitioner: Assessing student performance: An important change is needed. Journal of School Psychology, 27(4), 409-425.
In the past, the accreditation process put a premium on compliance and included some reporting requirements that did not necessarily lead programs toward excellence or increase teacher candidates' impact on schools and learning. The accreditation process enabled institutions to think they were "done" once they earned accredited status, and did not do enough to encourage programs to tackle complex challenges confronting P-12 schools.
Cibulka, J. G. (2009). Improving relevance, evidence, and performance in teacher preparation. The Education Digest, 75(2), 44.
This study assesses the extent to which reports of RCTs published in five general medical journals discussed new results in light of all available evidence.
Clarke, M., & Chalmers, I. (1998). Discussion sections in reports of controlled trials published in general medical journals: Islands in search of continents? JAMA, 280(3), 280-282.
Reviews of treatment outcome literature indicate treatment integrity is not regularly assessed. In consultation, two levels of treatment integrity (i.e., consultant procedural integrity [CPI] and intervention treatment integrity [ITI]) provide relevant implementation data.
Collier-Meek, M. A., & Sanetti, L. M. (2014). Assessment of consultation and intervention implementation: A review of conjoint behavioral consultation studies. Journal of Educational and Psychological Consultation, 24(1), 55-73.
Treatment fidelity assessment is critical to evaluating the extent to which interventions, such as the Good Behavior Game, are implemented as intended and impact student outcomes. The assessment methods by which treatment fidelity data are collected vary, with direct observation being the most popular and widely recommended.
Collier-Meek, M. A., Fallon, L. M., & DeFouw, E. R. (2020). Assessing the implementation of the Good Behavior Game: Comparing estimates of adherence, quality, and exposure. Assessment for Effective Intervention, 45(2), 95-109.
The purpose of this study was to critically examine the positive approaches to behavioral intervention research and young children demonstrating challenging behavior. The results indicate an increasing trend of research using positive behavioral interventions with young children who demonstrate challenging behaviors. Most of the research has been conducted with children with disabilities between 3 and 6 years old.
Conroy, M. A., Dunlap, G., Clarke, S., & Alter, P. J. (2005). A descriptive analysis of positive behavioral intervention research with young children with challenging behavior. Topics in Early Childhood Special Education, 25(3), 157-166.
We study the effectiveness of teachers certified by the National Board for Professional Teaching Standards (NBPTS) in Washington State, which has one of the largest populations of National Board-Certified Teachers (NBCTs) in the nation. Certification effects vary by subject, grade level, and certification type, with greater effects for middle school math certificates. We find mixed evidence that teachers who pass the assessment are more effective than those who fail, but find that the underlying NBPTS assessment score predicts student achievement.
Cowan, J., & Goldhaber, D. (2016). National board certification and teacher effectiveness: Evidence from Washington State. Journal of Research on Educational Effectiveness, 9(3), 233-258.
We know that students should participate constructively in the classroom. In fact, most of us probably agree that a significant portion of a student’s grade should come from his or her participation. However, like many teachers, you may find it difficult to explain to students how you assess their participation.
Craven, J. A., & Hogan, T. (2001). Assessing participation in the classroom. Science Scope, 25(1), 36.
One flashpoint in the incendiary debate over standardized testing in American public schools is the area of test preparation. The focus of this chapter is test preparation in achievement testing and its purportedly harmful effects on students and teachers.
Crocker, L. (2005). Teaching for the test: How and why test preparation is appropriate. Defending standardized testing, 159-174.
This paper examines critical issues that must be considered to maximize the positive impact of big data and minimize negative effects that are currently encountered in other domains. This review is designed to raise awareness of these issues with particular attention paid to implications for educational research design in order that educators can develop the necessary policies and practices to address this complex phenomenon and its possible implications in the field of education.
Daniel, B. K. (2017). Big Data and data science: A critical review of issues for educational research. British Journal of Educational Technology.
The Performance Assessment for California Teachers (PACT) is an authentic tool for evaluating prospective teachers by examining their abilities to plan, teach, assess, and reflect on instruction in actual classroom practice. The PACT seeks both to measure and develop teacher effectiveness, and this study of its predictive and consequential validity provides information on how well it achieves these goals.
Darling-Hammond, L., Newton, S. P., & Wei, R. C. (2013). Developing and assessing beginning teacher effectiveness: The potential of performance assessments. Educational Assessment, Evaluation and Accountability, 25(3), 179-204.
This paper describes how teacher learning through involvement with student-performance assessments has been accomplished in the United States and around the world, particularly in countries that have been recognized for their high-performing educational systems.
This paper examines factors that lead to bias and offers specific recommendations to journals, funders, ethics committees, and universities designed to reduce reporting bias.
Dawson, P., & Dawson, S. L. (2018). Sharing successes and hiding failures: 'Reporting bias' in learning and teaching research. Studies in Higher Education, 43(8), 1405-1416.
To produce better outcomes for students, two things are necessary: (1) effective, scientifically supported interventions and (2) those interventions implemented with high integrity. Typically, much greater attention has been given to identifying effective practices. This review focuses on features of high-quality implementation.
Detrich, R. (2014). Treatment integrity: Fundamental to education reform. Journal of Cognitive Education and Psychology, 13(2), 258-271.
This study demonstrates the use of functional assessment procedures to identify the appropriate use of the Picture Exchange Communication System as part of a behavior support plan to resolve serious problems exhibited by a 3-year-old boy with autism and pervasive developmental disorder.
Dooley, P., Wilczenski, F. L., & Torem, C. (2001). Using an activity schedule to smooth school transitions. Journal of Positive Behavior Interventions, 3(1), 57-61.
Featuring step-by-step guidance, examples, and forms, this guide to functional assessment procedures provides a first step toward designing positive and educative programs to eliminate serious behavior problems.
O'Neill, R. E., Albin, R. W., Storey, K., Horner, R. H., & Sprague, J. R. (2015). Functional assessment and program development for problem behavior: A practical handbook (3rd ed.). Cengage Learning.
This meta-analysis systematically synthesized results from 26 component studies, including dissertations and published articles, which reported at least one correlation between collective teacher efficacy and school achievement.
Eells, R. J. (2011). Meta-analysis of the relationship between collective teacher efficacy and student achievement.
In this paper we take up the question of model choice and examine three competing approaches. The first approach, the student growth percentiles (SGPs) framework, eschews all controls for student covariates and schooling environments. The second approach, value-added models (VAMs), controls for student background characteristics and under some conditions can be used to identify the causal effects of schools and teachers. The third approach, also VAM-based, fully levels the playing field so that the correlation between school- and teacher-level growth measures and student demographics is essentially zero. We argue that the third approach is the most desirable for use in educational evaluation systems.
Ehlert, M., Koedel, C., Parsons, E., & Podgursky, M. (2013). Selecting growth measures for school and teacher evaluations: Should proportionality matter?. National Center for Analysis of Longitudinal Data in Education Research, 21.
This article examines the fundamental characteristics of, and reviews empirical research on, performance assessment of diverse groups of students, including those with mild disabilities. Discussion of the technical qualities of performance assessment and barriers to its advancement leads to the conclusion that performance assessment should play a supplementary role in the evaluation of students with significant learning problems.
Elliott, S. N. (1998). Performance Assessment of Students' Achievement: Research and Practice. Learning Disabilities Research and Practice, 13(4), 233-41.
Curriculum-based measurement and performance assessments can provide valuable data for making special-education eligibility decisions. The authors review applied research on these assessment approaches and discuss the practical context of treatment validation and decisions about instructional services for students with diverse academic needs.
Elliott, S. N., & Fuchs, L. S. (1997). The Utility of Curriculum-Based Measurement and Performance Assessment as Alternatives to Traditional Intelligence and Achievement Tests. School Psychology Review, 26(2), 224-33.
The theory that measuring performance and coupling it to rewards and sanctions will cause schools and the individuals who work in them to perform at higher levels underpins performance-based accountability systems. Such systems are now operating in most states and in thousands of districts, and they represent a significant change from traditional approaches to accountability.
Elmore, R. F., & Fuhrman, S. H. (2001). Holding schools accountable: Is it working?. Phi Delta Kappan, 83(1), 67-72.
This article provides a wide range of information for 100 articles published from January 1980 through July 1999 that describe the functional assessment (FA) of behavior in school settings.
Ervin, R. A., Radford, P. M., Bertsch, K., & Piper, A. L. (2001). A descriptive analysis and critique of the empirical literature on school-based functional assessment. School Psychology Review, 30(2), 193.
A disproportionate reliance on SAT scores in college admissions has generated a growing number and volume of complaints. Some applicants, especially members of underrepresented minority groups, believe that the test is culturally biased. Other critics argue that high school GPA and results on SAT subject tests are better than scores on the SAT reasoning test at predicting college success, as measured by grades in college and college graduation.
Espenshade, T. J., & Chung, C. Y. (2010). Standardized admission tests, college performance, and campus diversity. Office of Population Research, Princeton University.
High-quality assessments are essential to effectively educating students, measuring progress, and promoting equity. Done well and thoughtfully, they provide critical information for educators, families, the public, and students themselves and create the basis for improving outcomes for all learners. Done poorly, in excess, or without clear purpose, however, they take valuable time away from teaching and learning, and may drain creative approaches from our classrooms.
Every Student Succeeds Act. (2017). Assessments under Title I, Part A & Title I, Part B: Summary of final regulations.
This comprehensive literature review of implementation examines all stages, beginning with adoption and ending with sustainability.
Fixsen, D. L., Naoom, S. F., Blase, K. A., & Friedman, R. M. (2005). Implementation research: A synthesis of the literature.
The reliability and validity of 4 approaches to the assessment of children and adolescents with learning disabilities (LD) are reviewed. The authors identify serious psychometric problems that affect the reliability of models based on aptitude-achievement discrepancies and low achievement.
Fletcher, J. M., Francis, D. J., Morris, R. D., & Lyon, G. R. (2005). Evidence-based assessment of learning disabilities in children and adolescents. Journal of Clinical Child and Adolescent Psychology, 34(3), 506-522.
The authors replicated K. E. Stanovich and L. Siegel's application of regression-based logic to the reading-level-match design, but contrasted it with statistical matches using W scores, which are Rasch-scaled decoding scores based on a common metric regardless of age or grade.
Foorman, B. R., Francis, D. J., Fletcher, J. M., & Lynn, A. (1996). Relation of phonological and orthographic processing to early reading: Comparing two approaches to regression-based, reading-level-match designs. Journal of Educational Psychology, 88(4), 639.
CASL's general goal is to identify instructional practices that accelerate the learning of K-3 children with disabilities. A specific goal is to identify and understand the nature of nonresponsiveness to generally effective instruction.
Fuchs, D., & Fuchs, L. S. (2005). Responsiveness-to-intervention: A blueprint for practitioners, policymakers, and parents. Teaching Exceptional Children, 38(1), 57-61.
This digest summarizes principles of performance assessment, which connects classroom assessment to learning. Specific ways that assessment can enhance instruction are outlined, as are criteria that assessments should meet in order to inform instructional decisions. Performance assessment is compared to behavioral assessment, mastery learning, and curriculum-based measurement.
Fuchs, L. S. (1995). Connecting Performance Assessment to Instruction: A Comparison of Behavioral Assessment, Mastery Learning, Curriculum-Based Measurement, and Performance Assessment. ERIC Digest E530.
In this article, the author uses examples in the literature to explore conceptual and technical issues associated with options for specifying three assessment components.
Fuchs, L. S. (2003). Assessing intervention responsiveness: Conceptual and technical issues. Learning Disabilities Research & Practice, 18(3), 172-186.
This special issue of Learning Disabilities Research and Practice is dedicated to Dr. Stanley L. Deno. For most of his career, Stan was a Professor of Educational Psychology at the University of Minnesota, where in the mid-1970s, he initiated a systematic program of research on what eventually came to be known as curriculum-based measurement (CBM). Through CBM, Stan’s impact on the field of special education and beyond has been enormous.
Fuchs, L. S. (2017). Curriculum‐based measurement as the emerging alternative: Three decades later. Learning Disabilities Research & Practice.
In this meta-analysis of studies that utilize formative assessment, the authors report an effect size of .7.
Fuchs, L. S., & Fuchs, D. (1986). Effects of Systematic Formative Evaluation: A Meta-Analysis. Exceptional Children, 53(3), 199-208.
This commentary offers a framework for building capacity for responsiveness to intervention and then considers how the articles constituting this special issue address the various questions posed in the framework.
Fuchs, L. S., & Fuchs, D. (2006). A framework for building capacity for responsiveness to intervention. School Psychology Review, 35(4), 621.
This study examined the educational effects of repeated curriculum-based measurement and evaluation. Thirty-nine special educators, each having three to four pupils in the study, were assigned randomly to a repeated curriculum-based measurement/evaluation (experimental) treatment or a conventional special education evaluation (contrast) treatment.
Fuchs, L. S., Deno, S. L., & Mirkin, P. K. (1984). The effects of frequent curriculum-based measurement and evaluation on pedagogy, student achievement, and student awareness of learning. American Educational Research Journal, 21(2), 449-460.
This article reviews an inductive assessment model for building instructional programs that satisfy the requirement that special education be planned to address an individual student's needs.
Fuchs, L. S., Fuchs, D., & Hamlett, C. L. (1994). Strengthening the connection between assessment and instructional planning with expert systems. Exceptional Children, 61(2), 138.
The purpose of this study was to examine students' weekly rates of academic growth, or slopes of achievement, when Curriculum-Based Measurement (CBM) is conducted repeatedly over 1 year.
Fuchs, L. S., Fuchs, D., Hamlett, C. L., Walz, L., & Germann, G. (1993). Formative evaluation of academic progress: How much growth can we expect? School Psychology Review, 22, 27-27.
The purpose of this study was to examine effects of classroom-based performance-assessment (PA)-driven instruction.
Fuchs, L. S., Fuchs, D., Karns, K., Hamlett, C. L., & Katzaroff, M. (1999). Mathematics performance assessment in the classroom: Effects on teacher planning and student problem solving. American Educational Research Journal, 36(3), 609-646.
The purposes of this study were to examine how well 3 measures, representing 3 points on a traditional-alternative mathematics assessment continuum, interrelated and discriminated students achieving above, at, and below grade level and to explore effects of cooperative testing for the most innovative measure (performance assessment).
Fuchs, L. S., Fuchs, D., Karns, K., Hamlett, C., Katzaroff, M., & Dutka, S. (1998). Comparisons among individual and cooperative performance assessments and other measures of mathematics competence. The Elementary School Journal, 99(1), 23-51.
The entrenchment of standardized assessment in America's schools reflects its emergence from the dual traditions of democratic school reform and scientific measurement. Within distinct sociohistorical contexts, ambitious testing pioneers persuaded educators and policymakers to embrace the standardized testing movement.
Gallagher, C. J. (2003). Reconciling a tradition of testing with a new learning paradigm. Educational Psychology Review, 15(1), 83-99.
Formative assessment (FA) is considered a powerful tool to enhance learning. However, to date there have been few studies addressing how the implementation of FA influences the development of inquiry skills. This research intends to determine the efficacy of teaching using FA in the development of students' inquiry skills.
Ganajová, M., Sotakova, I., Lukáč, S., Ješková, Z., Jurkova, V., & Orosová, R. (2021). Formative Assessment As A Tool To Enhance The Development Of Inquiry Skills In Science Education. Journal of Baltic Science Education, 20(2), 204.
This chapter describes some of the critical conceptual issues related to intervention implementation, and provides a selected review of the research regarding the assessment and assurance of intervention implementation.
Gansle, K. A., & Noell, G. H. (2007). The fundamental role of intervention implementation in assessing response to intervention. In Handbook of response to intervention (pp. 244-251). Springer, Boston, MA.
As a classroom teacher or administrator, how do you ensure that the information shared in a student-led conference provides a balanced picture of the student's strengths and weaknesses? The answer to this is to balance both summative and formative classroom assessment practices and information gathering about student learning.
Garrison, C., & Ehringhaus, M. (2007). Formative and summative assessments in the classroom.
In the United States, nationally representative data on student achievement come primarily from two sources: the National Assessment of Educational Progress (NAEP)—also known as “The Nation’s Report Card”—and U.S. participation in international assessments, including the Program for International Student Assessment (PISA). In summary, this study found many similarities between the two assessments. However, it also found important differences in the relative emphasis across content areas or categories, in the role of context, in the level of complexity, in the degree of mathematizing, in the overall amount of text, and in the use of representations in assessments.
Gattis, K., Kim, Y. Y., Stephens, M., Hall, L. D., Liu, F., & Holmes, J. (2016). A Comparison Study of the Program for International Student Assessment (PISA) 2012 and the National Assessment of Educational Progress (NAEP) 2013 Mathematics Assessments. American Institutes for Research. Retrieved from https://www.air.org/sites/default/files/downloads/report/Comparison-NAEP-PISA-Mathematics-May-2016.pdf
High-school grades are often viewed as an unreliable criterion for college admissions, owing to differences in grading standards across high schools, while standardized tests are seen as methodologically rigorous, providing a more uniform and valid yardstick for assessing student ability and achievement. The present study challenges that conventional view. The study finds that high-school grade point average (HSGPA) is consistently the best predictor not only of freshman grades in college, the outcome indicator most often employed in predictive-validity studies, but of four-year college outcomes as well.
Geiser, S., & Santelices, M. V. (2007). Validity of High-School Grades in Predicting Student Success beyond the Freshman Year: High-School Record vs. Standardized Tests as Indicators of Four-Year College Outcomes. Research & Occasional Paper Series: CSHE. 6.07. Center for studies in higher education.
This article presents quality indicators for experimental and quasi-experimental studies for special education.
Gersten, R., Fuchs, L. S., Compton, D., Coyne, M., Greenwood, C., & Innocenti, M. S. (2005). Quality indicators for group experimental and quasi-experimental research in special education. Exceptional Children, 71(2), 149-164.
In one three-week period, a pandemic has completely changed the national landscape on assessment.
Gewertz, C. (2020). It’s official: All states have been excused from statewide testing this year. Education Week.
This article focuses on the evaluation of assessment arrangements and the way they affect student learning out of class. It is assumed that assessment has an overwhelming influence on what, how and how much students study.
Gibbs, G., & Simpson, C. (2005). Conditions under which assessment supports students’ learning. Learning and teaching in higher education, (1), 3-31.
This meta-analysis examines 23 studies of students' access to the curriculum by assessing the gap in reading achievement between general education peers and students with disabilities (SWD). The study finds that SWDs performed more than three years below peers. The study looks at the implications for changing this picture and why current policies and practices are not achieving the desired results.
Gilmour, A. F., Fuchs, D., & Wehby, J. H. (2018). Are students with disabilities accessing the curriculum? A meta-analysis of the reading achievement gap between students with and without disabilities. Exceptional Children. Advance online publication. doi:10.1177/0014402918795830
To evaluate the impact of comprehensive teacher induction relative to the usual induction support, the authors conducted a randomized experiment in a set of districts that were not already implementing comprehensive induction.
Glazerman, S., Isenberg, E., Dolfin, S., Bleeker, M., Johnson, A., Grider, M., & Jacobus, M. (2010). Impacts of Comprehensive Teacher Induction: Final Results from a Randomized Controlled Study. NCEE 2010-4027. National Center for Education Evaluation and Regional Assistance.
Where, when, and how we teach reading says more about us than it does the students. Explicitly teaching and leveraging the science allows us to overcome our blind spots, assumptions, and biases which impact every aspect of instruction. In doing so, we save us from ourselves and avoid a permanent underclass filling our correctional institutions as prisoners of the ‘reading wars.’
Goldenberg, C., Glaser, D. R., Kame'enui, E. J., Butler, K., Diamond, L., Moats, L., ... & Grimes, S. C. (2020). The Four Pillars to Reading Success: An Action Guide for States. National Council on Teacher Quality.
In this article, we examine assessment and accountability in the context of a prevention-oriented assessment and intervention system designed to assess early reading progress formatively.
Good III, R. H., Simmons, D. C., & Kame'enui, E. J. (2001). The importance and decision-making utility of a continuum of fluency-based indicators of foundational reading skills for third-grade high-stakes outcomes. Scientific Studies of Reading, 5(3), 257-288.
A significant and eye-opening examination of the current state of the testing movement in the United States, where more than 150 million standardized intelligence, aptitude, and achievement tests are administered annually by schools, colleges, business and industrial firms, government agencies, and the military services.
Goslin, D. A. (1963). The search for ability: Standardized testing in social perspective (Vol. 1). Russell Sage Foundation.
Discusses the uses and abuses of intelligence testing in our educational systems. Dr. Goslin examines teachers' opinions and practices with regard to tests and finds considerable discrepancies between attitude and behavior.
Goslin, D. A. (1967). Teachers and testing. Russell Sage Foundation.
In order to meet writing objectives specified in the Common Core State Standards (CCSS), many teachers need to make significant changes in how writing is taught. While CCSS identified what students need to master, it did not provide guidance on how teachers are to meet these writing benchmarks.
Graham, S., Harris, K. R., & Santangelo, T. (2015). Research-based writing practices and the Common Core: Meta-analysis and meta-synthesis. The Elementary School Journal, 115(4), 498-522.
This report provides information on the preparation provided to teacher candidates from teacher training programs so that they can fully use assessment data to improve classroom instruction.
Greenberg, J., & Walsh, K. (2012). What Teacher Preparation Programs Teach about K-12 Assessment: A Review. National Council on Teacher Quality.
We strived to apply the standards uniformly to all the nation’s teacher preparation programs as part of our effort to bring as much transparency as possible to the way America’s teachers are prepared. In collecting information for this initial report, however, we encountered enormous resistance from leaders of many of the programs we sought to assess.
Greenberg, J., McKee, A., & Walsh, K. (2013). Teacher prep review: A review of the nation's teacher preparation programs. Available at SSRN 2353894.
In schools throughout the country, it is testing season--time for students to take the Big Standardized Test (the PARCC, SBA, or your state's alternative). This ritual really blossomed way back in the days of No Child Left Behind, but after all these years, teachers are mostly unexcited about it. There are many problems with the testing regimen, but a big issue for classroom teachers is that the tests do not help the teacher do her job.
Greene, P. (2019, April 24). Why the big standardized test is useless for teachers. Forbes.
When schools pushed the pandemic pause button last spring, one of the casualties was the annual ritual of taking the Big Standardized Test. There were many reasons to skip the test, but in the end, students simply weren’t in school during the usual testing time.
Greene, P. (2020, August 14). Schools should scrap the big standardized test this year. Forbes.
This paper describes a few promising assessment technologies that allow us to capture more direct, repeated, and contextually based measures of student learning, and proposes an improvement-oriented approach to teaching and learning.
Greenwood, C. R., & Maheady, L. (1997). Measurable change in student performance: Forgotten standard in teacher preparation?. Teacher Education and Special Education, 20(3), 265-275.
Technical issues (specification of treatment components, deviations from treatment protocols and amount of behavior change, and psychometric issues in assessing treatment integrity) involved in the measurement of treatment integrity are discussed.
Gresham, F. M. (1989). Assessment of treatment integrity in school consultation and prereferral intervention. School Psychology Review, 18(1), 37-50.
Teacher effectiveness varies substantially, yet principals’ evaluations of teachers often fail to differentiate performance among teachers. We offer new evidence on principals’ subjective evaluations of their teachers’ effectiveness using two sources of data from a large, urban district: principals’ high-stakes personnel evaluations of teachers, and their low-stakes assessments of a subsample of those teachers provided to the researchers.
Grissom, J. A., & Loeb, S. (2017). Assessing principals’ assessments: Subjective evaluations of teacher effectiveness in low-and high-stakes environments. Education Finance and Policy, 12(3), 369-395.
The purpose of the current study was to test theoretically derived hypotheses regarding the relationships between team efficacy, potency, and performance and to examine the moderating effects of level of analysis and interdependence on observed relationships.
Gully, S. M., Incalcaterra, K. A., Joshi, A., & Beaubien, J. M. (2002). A meta-analysis of team-efficacy, potency, and performance: Interdependence and level of analysis as moderators of observed relationships. Journal of Applied Psychology, 87(5), 819.
This article examines the concept of behavioral contingency and describes NCLB as a set of contingencies to promote the use of effective educational practices. The authors then describe their design of an alternate assessment, including the components designed to capitalize on the contingencies of NCLB to promote positive educational outcomes.
Hager, K. D., Slocum, T. A., & Detrich, R. (2007). No Child Left Behind, Contingencies, and Utah’s Alternate Assessment. JEBPS Vol 8-N1, 63.
This study evaluated differences in estimates of treatment integrity obtained by measuring different dimensions of it.
Hagermoser Sanetti, L. M., & Fallon, L. M. (2011). Treatment Integrity Assessment: How Estimates of Adherence, Quality, and Exposure Influence Interpretation of Implementation. Journal of Educational & Psychological Consultation, 21(3), 209-232.
This quantitative review examines 20 studies to establish an effect size of .71 for the impact of “metacognitive” instruction on reading comprehension.
Haller, E. P., Child, D. A., & Walberg, H. J. (1988). Can comprehension be taught? A quantitative synthesis of “metacognitive” studies. Educational Researcher, 17(9), 5-8.
Test-based accountability systems that attach high stakes to standardized test results have raised a number of issues on educational assessment and accountability. Do these high-stakes tests measure student achievement accurately? How can policymakers and educators attach the right consequences to the results of these tests? And what kinds of tradeoffs do these testing policies introduce?
Hamilton, L. S., Stecher, B. M., & Klein, S. P. (2002). Making sense of test-based accountability in education. Rand Corporation.
The Every Student Succeeds Act (ESSA), passed into law in 2015, explicitly prohibits the federal government from creating incentives to set national standards. The law represents a major departure from recent federal initiatives, such as Race to the Top, which beginning in 2009 encouraged the adoption of uniform content standards and expectations for performance.
Hamlin, D., & Peterson, P. E. (2018). Have states maintained high expectations for student performance? An analysis of 2017 state proficiency standards. Education Next, 18(4), 42-49.
The number of college graduates has steadily increased in the past decade, especially among those earning bachelor’s degrees. Graduation rates have also increased overall, especially at public, 4-year institutions; college graduation statistics suggest greater student success at these institutions.
Hanson, M. (2021). College Graduation Statistics. Educationdata.org.
The authors’ analysis of special education placement rates, a frequently identified area of concern, does not show any responsiveness to the introduction of accountability systems.
Hanushek, E. A., & Raymond, M. E. (2005). Does school accountability lead to improved student performance? Journal of Policy Analysis and Management: The Journal of the Association for Public Policy Analysis and Management, 24(2), 297-327.
This new research addresses a number of critical questions: Are a teacher’s cognitive skills a good predictor of teacher quality? This study examines the student achievement of 36 developed countries in the context of teacher cognitive skills. This study finds substantial differences in teacher cognitive skills across countries that are strongly related to student performance.
Hanushek, E. A., Piopiunik, M., & Wiederhold, S. (2014). The value of smarter teachers: International evidence on teacher cognitive skills and student performance (No. w20727). National Bureau of Economic Research.
Seven parents conducted assessments in an outpatient clinic using a prescribed hierarchy of antecedent and consequence treatment components for their children's problem behavior. Brief assessment of potential treatment components was conducted to identify variables that controlled the children's appropriate behavior.
Harding, J., Wacker, D. P., Cooper, L. J., Millard, T., & Jensen‐Kovalan, P. (1994). Brief hierarchical assessment of potential treatment components with children in an outpatient clinic. Journal of Applied Behavior Analysis, 27(2), 291-300.
The central argument of this paper is that the formative and summative purposes of assessment have become confused in practice and that as a consequence assessment fails to have a truly formative role in learning.
Harlen, W., & James, M. (1997). Assessment and learning: differences and relationships between formative and summative assessment. Assessment in Education: Principles, Policy & Practice, 4(3), 365-379.
This book provides a complete guide to implementing a wide range of problem-solving assessment methods: functional behavioral assessment, interviews, classroom observations, curriculum-based measurement, rating scales, and cognitive instruments.
Harrison, P. L. (2012). Assessment for intervention: A problem-solving approach. Guilford Press.
This report aims to provide the public, along with teachers and leaders in the Great City Schools, with objective evidence about the extent of standardized testing in public schools and how these assessments are used.
Hart, R., Casserly, M., Uzzell, R., Palacios, M., Corcoran, A., & Spurgeon, L. (2015). Student Testing in America's Great City Schools: An Inventory and Preliminary Analysis. Council of the Great City Schools.
Hattie’s book is designed as a meta-meta-study that collects, compares and analyses the findings of many previous studies in education. Hattie focuses on schools in the English-speaking world but most aspects of the underlying story should be transferable to other countries and school systems as well. Visible Learning is nothing less than a synthesis of more than 50,000 studies covering more than 80 million pupils. Hattie uses the statistical measure effect size to compare the impact of many influences on students’ achievement, e.g. class size, holidays, feedback, and learning strategies.
Hattie, J. (2008). Visible learning: A synthesis of over 800 meta-analyses relating to achievement. New York, NY: Routledge.
An ABAB design was used to evaluate the effectiveness of an interdependent group contingency with randomized components to improve the transition behavior of middle school students identified with emotional and behavioral disorders (EBDs) served in an alternative educational setting. The intervention was implemented by one teacher with three classes of students, and the dependent variable was the percentage of students ready to begin class at the appropriate time.
Hawkins, R. O., Haydon, T., McCoy, D., & Howard, A. (2017). Effects of an interdependent group contingency on the transition behavior of middle school students with emotional and behavioral disorders. School psychology quarterly, 32(2), 282.
Direct observation plays an important role in the assessment practices of school psychologists and in the development of evidence-based practices in general and special education. The defining psychometric features of direct observation are presented, the contributions to assessment practice reviewed, and a specific proposal is offered for evaluating the psychometric merit of direct observation in both practitioner developed and commercial/research specific applications.
Hintze, J. M. (2005). Psychometrics of direct observation. School Psychology Review, 34(4), 507-519.
In this book E.D. Hirsch, the author of Cultural Literacy, makes a case against much of what schools are now trying to do to improve education. Lifelong learning, multiple intelligences, learning style, metacognitive skills, cooperative learning, critical thinking, and learning to learn will do little for students, he says.
Hirsch, E. D. (1997). The schools we need: Why we don't have them? National Association of Secondary School Principals. NASSP Bulletin, 81(589), 121.
Used a direct observation-based approach to identify behavioral conditions in sending (i.e., special education) and in receiving (i.e., regular education) classrooms and to identify targets for intervention that might facilitate mainstreaming of behavior-disordered (BD) children.
Hoier, T. S., McConnell, S., & Pallay, A. G. (1987). Observational assessment for planning and evaluating educational transitions: An initial analysis of template matching. Behavioral Assessment.
The School-Wide Evaluation Tool (SET; Sugai, Lewis-Palmer, Todd, & Horner, 2001) was created to provide a rigorous measure of primary prevention practices within school-wide behavior support. In this article, the authors describe the SET and document its psychometric characteristics.
Horner, R. H., Todd, A. W., Lewis-Palmer, T., Irvin, L. K., Sugai, G., & Boland, J. B. (2004). The school-wide evaluation tool (SET) a research instrument for assessing school-wide positive behavior support. Journal of Positive Behavior Interventions, 6(1), 3-12.
Focusing on elementary classrooms, chapters include: Students' Feelings about School; Involvement and Withdrawal in the Classroom; Teachers Views; The Need for New Perspectives.
Jackson, P. W. (1990). Life in classrooms. Teachers College Press.
Differential reinforcement of other behavior (DRO) has been applied to reduce problem behavior in various forms across different populations. We review DRO research from the last 5 years, with a focus on studies that enhance our understanding of the underlying mechanisms of DRO.
Jessel, J., & Ingvarsson, E. T. (2016). Recent advances in applied research on DRO procedures. Journal of Applied Behavior Analysis, 49(4), 991-995.
Some of the specific reasons for the success or failure of retention in the area of reading were examined via an in-depth study of a small number of both at-risk retained students and comparably low skilled promoted children.
Juel, C., & Leavell, J. A. (1988). Retention and nonretention of at-risk readers in first grade and their subsequent reading achievement. Journal of Learning Disabilities, 21(9), 571-580.
PISA measures the performance of 15-year-old students in science, reading, and mathematics literacy every 3 years. PISA uses the term "literacy" in each subject area to indicate how well students are able to apply their knowledge and skills to problems in a real-life context.
Kastberg, D., Chan, J. Y., & Murray, G. (2016). Performance of US 15-Year-Old Students in Science, Reading, and Mathematics Literacy in an International Context: First Look at PISA 2015. NCES 2017-048. National Center for Education Statistics.
Responsiveness to intervention (RTI) is being proposed as an alternative model for making decisions about the presence or absence of specific learning disability. The author argues that there are many questions about RTI that remain unanswered, and radical changes in proposed regulations are not warranted at this time.
Kavale, K. A. (2005). Identifying specific learning disability: Is responsiveness to intervention the answer? Journal of Learning Disabilities, 38(6), 553-562.
Considers design issues and strategies in comparative outcome studies, including the conceptualization, implementation, and evaluation of alternative treatments; assessment of treatment-specific processes and outcomes; and evaluation of the results. It is argued that addressing these and other issues may increase the yield from comparative outcome studies and may attenuate controversies regarding the adequacy of the demonstrations.
Kazdin, A. E. (1986). Comparative outcome studies of psychotherapy: methodological issues and strategies. Journal of consulting and clinical psychology, 54(1), 95.
Previous chapters have discussed fundamental issues about assessment and experimental design for single-case research. The third component of methodology, after assessment and design, is data evaluation. The three components work in concert to permit one to draw inferences about the intervention.
Kazdin, A. E., & Tuma, A. H. (1982). Single-case research designs.
The present study used cross-sectional data from 1,438 schools to examine relations between fidelity self-assessment and team-based fidelity measures in the first 4 years of implementation of School-Wide Positive Behavioral Interventions and Supports (SWPBIS). Results showed strong positive correlations between fidelity self-assessments and a team-based measure of fidelity at each year of implementation.
Khoury, C. R., McIntosh, K., & Hoselton, R. (2019). An Investigation of Concurrent Validity of Fidelity of Implementation Measures at Initial Years of Implementation. Remedial and Special Education, 40(1), 25-31.
More than 300 studies that appeared to address the efficacy of formative assessment in grades K-12 were reviewed. Many of the studies had severely flawed research designs yielding uninterpretable results.
Kingston, N., & Nash, B. (2011). Formative assessment: A meta-analysis and a call for research. Educational Measurement: Issues and Practice, 30(4), 28-37.
The purpose of this study was to examine the special education referral and decision making process for English language learners, with a focus on Child Study Team meetings and placement conferences/multidisciplinary team meetings. We wished to learn how school personnel determined if ELLs who were struggling had disabilities, to what extent those involved in the process understood second language acquisition, and whether language issues were considered when determining special education eligibility.
Klingner, J. K. (2006). The special education referral and decision-making process for English Language Learners: Child study team meetings and staffing.
The authors proposed a preliminary FI theory (FIT) and tested it with moderator analyses. The central assumption of FIT is that FIs change the locus of attention among 3 general and hierarchically organized levels of control: task learning, task motivation, and meta-tasks (including self-related) processes.
Kluger, A. N., & DeNisi, A. (1996). The effects of feedback interventions on performance: A historical review, a meta-analysis, and a preliminary feedback intervention theory. Psychological bulletin, 119(2), 254.
A comprehensive review of the research on formative assessment interventions was recently released. The review identified 23 studies it determined were rigorous enough for inclusion, building a picture of the impact of formative assessment interventions on student outcomes. It concluded that formative assessment has a positive effect on student academic achievement: on average across the studies, students who participated in formative assessment performed better on measures of academic achievement than those who did not. Across all subject areas (math, reading, and writing), formative assessment had larger effects on student academic achievement when other agents, such as a teacher or a computer program, directed the formative assessment.
Klute, M., Apthorp, H., Harlacher, J., & Reale, M. (2017). Formative assessment and elementary school student academic achievement: A review of the evidence.
This study presents and applies a framework for measuring the cost of coaching programs in three schools, then discusses strategies for reducing the average cost of instructional coaching.
Knight, D. S. (2012). Assessing the cost of instructional coaching. Journal of Education Finance, 52-80.
Just as we consider our formative years when we draw conclusions about ourselves, formative assessment is where we begin to draw conclusions about our students' learning. Formative assessments can take many forms and generally target skills or content knowledge that is relatively narrow in scope (as opposed to summative assessments, which seek to assess broader sets of knowledge or skills).
Knowles, J. (2020). Teachers’ Essential Guide to Formative Assessment.
For decades we’ve been studying, experimenting with, and wrangling over different approaches to improving public education, and there’s still little consensus on what works, and what to do. The one thing people seem to agree on, however, is that schools need to be held accountable—we need to know whether what they’re doing is actually working.
Koretz, D. (2017). The testing charade. University of Chicago Press.
The research reported here investigated the effects of Maryland School Performance Assessment Program (MSPAP) by surveying teachers and principals in two of the three grades in which MSPAP is administered.
Koretz, D., Mitchell, K., Barron, S., & Keith, S. (1996). The perceived effects of the Maryland school performance assessment program. Los Angeles, CA: Center for Research on Evaluation, Standards, and Student Assessment (University of California at Los Angeles).
In recent years, states have sought to increase accountability for public school teachers by implementing a package of reforms centered on high-stakes evaluation systems. We examine the effect of these reforms on the supply and quality of new teachers.
Kraft, M. A., Brunner, E. J., Dougherty, S. M., & Schwegman, D. J. (2020). Teacher accountability reforms and the supply and quality of new teachers. Journal of Public Economics, 188, 104212.
To measure retention of oral reading fluency, three students attending a learning support classroom used a repeating reading strategy with two passages. Each student read one passage to a high performance standard and the other passage to a lower performance standard. Results show it took the students more practice to reach the higher performance standard in regards to both calendar days and practice trials.
Kubina, R. M., Amato, J., Schwilk, C. L., & Therrien, W. J. (2008). Comparing performance standards on the retention of words read correctly per minute. Journal of Behavioral Education, 17(4), 328-338.
An up-to-date, practical, reader-friendly resource that will help readers navigate today's seemingly ever-changing and complex world of educational testing, assessment, and measurement. The 11th edition presents a balanced perspective of educational testing and assessment, informed by recent developments and the ever-increasing research base.
Kubiszyn, T., & Borich, G. (1987). Educational testing and measurement. Glenview, IL: Scott, Foresman.
In undertaking this study, two goals were established: (1) to obtain a better understanding of how much time students spend taking tests; and (2) to identify the degree to which the tests are mandated by districts or states.
Lazarín, M. (2014). Testing Overload in America's Schools. Center for American Progress.
The report is an overview of the key components of inclusive assessment and accountability and highlights how they fit together to form a cohesive whole.
Lehr, C., & Thurlow, M. (2003). Putting it all together: Including students with disabilities in assessment and accountability systems. Policy Directions No. 16.
This study uses longitudinal administrative data to examine the relationship between third-grade reading level and four educational outcomes: eighth-grade reading performance, ninth-grade course performance, high school graduation, and college attendance.
Lesnick, J., Goerge, R., Smithgall, C., & Gwynne, J. (2010). Reading on grade level in third grade: How is it related to high school performance and college enrollment. Chicago: Chapin Hall at the University of Chicago, 1, 12.
During the past two decades, performance-based accountability systems (PBASs), which link financial or other incentives to measured performance as a means of improving services, have gained popularity among policymakers.
Leuschner, K. J. (2010). Are Performance-Based Accountability Systems Effective? Evidence from Five Sectors. Research Brief. RAND Corporation.
This volume provides a summary of approaches to measuring the fiscal impact of practices in education.
Levin, H. M., & McEwan, P. J. (2002). Cost-effectiveness and educational policy. Larchmont, NY: Eye on Education.
Performance assessments such as the Teacher Performance Assessment (edTPA) are used by state departments of education as one measure of competency to grant teaching certification. Although the edTPA is used as a summative assessment, research studies in other forms of performance assessments, such as the Performance Assessment for California Teachers and the National Board Certification for Professional Teaching Standards have shown that they can be used as learning tools for both preservice and experienced teachers and as a form of feedback for teacher education programs.
Lin, S. (2015). Learning through action: Teacher candidates and performance assessments (Doctoral dissertation).
It is argued that there is a need to rethink the criteria by which the quality of educational assessments is judged, and a set of criteria sensitive to some of the expectations for performance-based assessments is proposed.
Linn, R. L., Baker, E. L., & Dunbar, S. B. (1991). Complex, performance-based assessment: Expectations and validation criteria. Educational researcher, 20(8), 15-21.
About 11,500 Los Angeles Unified elementary school teachers and 470 elementary schools are included in The Times' updated database of "value-added" ratings.
Los Angeles Times. (2021). Los Angeles teacher ratings.
The long term trend test of the National Assessment of Educational Progress (LTT NAEP) is the longest running test of student achievement that provides a scientifically valid estimate of what American students have learned.
Loveless, T. (2016, October 17). The strange case of the disappearing NAEP. Brookings Institution.
In a series of two studies, the relative sensitivity of traditional standardized achievement tests and alternative curriculum-based measures was assessed.
Marston, D., Fuchs, L. S., & Deno, S. L. (1986). Measuring pupil progress: A comparison of standardized achievement tests and curriculum-related measures. Diagnostique, 11(2), 77-90.
This research synthesis examines instructional research in a functional manner to provide guidance for classroom practitioners.
Marzano, R. J. (1998). A Theory-Based Meta-Analysis of Research on Instruction.
This is a study of the effects of classroom management on student engagement and achievement.
Marzano, R. J., Pickering, D., & Pollock, J. E. (2001). Classroom instruction that works: Research-based strategies for increasing student achievement. ASCD.
School closings and the ever-increasing number of deaths provide the backdrop for a proposal by the Center for American Progress (CAP) to deny waivers of the federally mandated administration of standardized tests in spring 2021. Further, the federal government proposes to add to those assessments in ways that CAP argues would make the test results more useful.
Mathis, W. J., Berliner, D. C., & Glass, G. V. (2020). NEPC Review: Student Assessment During COVID-19 (Center for American Progress, September 2020).
The Common Core. Just last year, according to a Gallup poll, most Americans had never heard of the Common Core State Standards Initiative, or "Common Core," new guidelines for what kids in grades K–12 should be able to accomplish in reading, writing, and math. Designed to raise student proficiencies so the United States can better compete in a global market, the standards were drafted in 2009 by a group of academics and assessment specialists at the request of the National Governors Association and the Council of Chief State School Officers.
McArdle, E. (2014). What happened to the Common Core. Harvard Ed. Magazine, 14.
Interventions for challenging behavior are more likely to be effective when based on the results of a functional behavioral assessment. Research to date suggests that staff members in educational settings may not have the requisite levels of expertise or support to implement behavioral assessment procedures and design corresponding behavior support plans.
McCahill, J., Healy, O., Lydon, S., & Ramey, D. (2014). Training educational staff in functional behavioral assessment: A systematic review. Journal of Developmental and Physical Disabilities, 26(4), 479-505.
This report provides a detailed analysis of long-term dropout and completion trends and student characteristics of high school dropouts and completers. The first measure examined was the “event dropout rate” which is the percent of students who drop out in grades 10-12 without a high school diploma or alternative credential. The event dropout rate for SY 2015-16 was 4.8%, which translated into 532,000 students.
McFarland, J., Cui, J., Rathbun, A., and Holmes, J. (2018). Trends in High School Dropout and Completion Rates in the United States: 2018 (NCES 2019-117). U.S. Department of Education. Washington, DC: National Center for Education Statistics. Retrieved December 14, 2018 from http://nces.ed.gov/pubsearch.
The study investigates the correlation and predictive value of curriculum-based measurement (CBM) against the Michigan Educational Assessment Program's (MEAP) fourth grade reading assessment.
McGlinchey, M. T., & Hixson, M. D. (2004). Using curriculum-based measurement to predict performance on state assessments in reading. School Psychology Review, 33, 193-203.
Full and durable implementation of school-based interventions is supported by regular evaluation of fidelity of implementation. Multiple assessments have been developed to evaluate the extent to which schools are applying the core features of school-wide positive behavioral interventions and supports (SWPBIS).
McIntosh, K., Massar, M. M., Algozzine, R. F., George, H. P., Horner, R. H., Lewis, T. J., & Swain-Bradway, J. (2017). Technical adequacy of the SWPBIS tiered fidelity inventory. Journal of Positive Behavior Interventions, 19(1), 3-13.
This report offers recommendations for the implementation of standards-based reform and outlines possible consequences for policy changes. It summarizes both the vision and intentions of standards-based reform and the arguments of its critics.
McLaughlin, M. W., & Shepard, L. A. (1995). Improving Education through Standards-Based Reform. A Report by the National Academy of Education Panel on Standards-Based Education Reform. National Academy of Education, Stanford University, CERAS Building, Room 108, Stanford, CA 94305-3084.
The effects of a personalized system of instruction (PSI) with and without a same-day retake contingency on the spelling performance of 10 behaviorally disordered students were evaluated. The results indicate more spelling lessons were passed with 100% accuracy when the PSI program was in effect.
McLaughlin, T. F. (1991). Use of a personalized system of instruction with and without a same-day retake contingency on spelling performance of behaviorally disordered children. Behavioral Disorders, 16(2), 127-132.
This pioneering text provides a comprehensive and highly accessible introduction to the principles, concepts, and methods currently used in educational research. This text also helps students master skills in reading, conducting, and understanding research.
McMillan, J. H., & Schumacher, S. (1997). Research in education: A conceptual approach (4th ed.). New York, NY: Longman.
This paper reports evidence-based research and offers suggestions based on studies that include theoretical work, qualitative analysis, statistical analysis, and randomized experiments that could provide strong causal evidence of the effects of teacher preparation on student learning.
Meadows, L., & Theodore, K. (2012). Teacher Preparation Programs: Research and Promising Practices. Retrieved from http://www.sedl.org/txcc/resources/briefs/number_11/
Several reliable and valid fidelity surveys are commonly used to assess Tier 1 implementation in School-Wide Positive Behavioral Interventions and Supports (SWPBIS); however, differences across surveys complicate consequential decisions regarding school implementation status when multiple measures are compared. Compared with other measures, the PBIS Self-Assessment Survey (SAS) was more sensitive to differences among schools at higher levels of implementation. Implications for SWPBIS research and fidelity assessment are discussed.
Mercer, S. H., McIntosh, K., & Hoselton, R. (2017). Comparability of fidelity measures for assessing tier 1 school-wide positive behavioral interventions and supports. Journal of Positive Behavior Interventions, 19(4), 195-204.
This study examined processes and techniques teachers used to ensure that their assessments were valid and reliable, noting the extent to which they engaged in these processes.
Mertler, C. A. (1999). Teachers' (Mis)Conceptions of Classroom Test Validity and Reliability.
The edTPA is a performance assessment of teaching used variously by some state education departments and institutions of higher education for licensure testing and evaluating teaching candidates. The edTPA artifacts include written commentaries, lesson plans, video segments, and samples of student work, based on a series of three to five consecutive lessons.
Meuwissen, K., Choppin, J., Shang-Butler, H., & Cloonan, K. (2015). Teaching candidates’ perceptions of and experiences with early implementation of the edTPA licensure examination in New York and Washington States. Warner School of Education, University of Rochester.
Assessment, or testing, fulfills a vital role in today’s educational environment. Assessment results often are a major force in shaping public perceptions about the capabilities of our students and the quality of our schools. As a primary tool for educators and policymakers, assessment is used for many important purposes.
Missouri Department of Elementary and Secondary Education. (2019). Missouri Assessment Program: Grade level assessments.
Assessments used in Missouri are designed to measure how well students acquire the skills and knowledge described in Missouri’s Learning Standards (MLS). The assessments yield information on academic achievement at the student, class, school, district and state levels.
Missouri Department of Elementary and Secondary Education. (2020). Missouri Assessment Program
The goal of this guide is to provide useful information about standardized testing, or assessment, for practitioners and non-practitioners who care about public schools. It includes the nature of assessment, types of assessments and tests, and definitions.
Mitchell, R. (2006). A guide to standardized testing: The nature of assessment. Center for Public Education.
The primary purpose of this chapter is to review the literature on teachers’ summative assessment practices to note their influence on teachers and teaching and on students and learning.
Moss, C. M. (2013). Research on classroom summative assessment. SAGE handbook of research on classroom assessment, 235-255.
How did U.S. students perform on the most recent assessments? Select a jurisdiction and a result to see how students performed on the latest NAEP assessments.
National Assessment of Educational Progress (NAEP). (2020). Nation's report card. Washington, DC: National Center for Education Statistics, Institute of Education Sciences, U.S. Department of Education.
Since 2003, the National Center for Education Statistics (NCES) has compared each state's standard for proficient performance in reading and mathematics at grades 4 and 8 by placing the state standards onto common scales from the National Assessment of Educational Progress (NAEP). This process of "state mapping" shows where each state's standards fall on the NAEP scales and in relation to the NAEP achievement levels— NAEP Basic, NAEP Proficient, and NAEP Advanced—providing important contributions to the discussion of state standards.
National Center for Education Statistics. (2020). Mapping state proficiency standards.
Current expenditures for education can be expressed in terms of the percentage of funds going toward salaries, employee benefits, purchased services, tuition, supplies, or other expenditures.
National Center for Education Statistics. (n.d.). Fast facts: Expenditures
The National Assessment of Educational Progress (NAEP) is a national assessment of what America's students know in mathematics, reading, science, writing, the arts, civics, economics, geography, and U.S. history.
National Center for Education Statistics
Over the past twenty years many reading interventions have been proposed. One of these, "book flooding," proposes that providing an enriched environment in which books are present and readily available can improve reading. Much of the research on this topic has focused on exposing children in the early grades to storybooks. Given the greater emphasis on reading complex text in meeting new reading standards, this quasi-experimental study examines the influence of a book distribution program targeted at enhancing children's exposure to information books that stress academic words and technical terms. The research examined whether a flood of information books in early childhood settings could affect growth in language, content-related vocabulary, and comprehension of information text. The study found no significant effects on student outcomes and concludes that book distribution programs on their own need to be reevaluated if they are to improve student reading performance.
Neuman, S. B. (2017). The Information Book Flood: Is Additional Exposure Enough to Support Early Literacy Development?. The Elementary School Journal, 118(1), 1-27.
No Child Left Behind Act of 2001 ESEA Reauthorization
No Child Left Behind Act of 2001, Pub. L. No. 107-110 (2002).
School reformers and state and federal policymakers turned to standardized testing over the years to get a clearer sense of the return on a national investment in public education that reached $680 billion in 2018-19. They embraced testing to spur school improvement and to ensure the educational needs of traditionally underserved students were being met.
Olson, L., & Jerald, C. (2020). The big test.
Flipped classrooms are by design highly interactive. As a result, formative assessment is a necessary component of the flipped classroom. Professors need to be able to assess students in class, use this assessment information to inform classroom activities in real time, and personalize learning for their students.
Onodipe, G., & Ayadi, M. F. (2020). Using smartphones for formative assessment in the flipped classroom. Journal of Instructional Pedagogies, 23.
This study examined the hypothesis that teachers' and students' assessments of preferred learning styles (LS) correspond. The study found no relationship between pupils' self-assessments and teachers' assessments; teachers' and students' answers did not match up. The study suggests that teachers cannot accurately assess the LS of their students.
Papadatou-Pastou, M., Gritzal, M., & Barrable, A. (2018). The Learning Styles educational neuromyth: Lack of agreement between teachers' judgments, self-assessment, and students' intelligence. Frontiers in Education, 3, 1-5. https://doi.org/10.3389/feduc.2018.00105
Student engagement at school and whether students feel hopeful about their future are far better factors to consider when evaluating schools than using standardized test scores, according to the results of the 47th annual PDK/Gallup Poll of the Public’s Attitudes Toward the Public Schools.
PDK/Gallup Poll (2015). Testing lacks public support. Phi Delta Kappan, 97(1), 8–10.
The education reform movement of the past two decades has focused on raising academic standards. Some standards advocates attach a testing mechanism to gauge the extent to which high standards are actually accomplished, whereas some critics accuse the push for standards and testing of impeding reform and perpetuating inequality.
Phelps, R. (2005). Defending standardized testing. Psychology Press.
The Standardized Testing Primer provides non-specialists with a thorough overview of this controversial and complicated topic. It eschews the statistical details of scaling, scoring, and measurement that are widely available in textbooks and at testing organization Web sites, and instead describes standardized testing's social and political roles and its practical uses: who tests, when, where, and why.
Phelps, R. P. (2007). Standardized testing primer (Vol. 21). Peter Lang.
Positive teacher-student interactions are a primary ingredient of quality early educational experiences that launch future school success. With CLASS, educators finally have an observational tool to assess classroom quality in pre-kindergarten through grade 3 based on teacher-student interactions rather than the physical environment or a specific curriculum.
Pianta, R. C., La Paro, K. M., & Hamre, B. K. (2008). Classroom Assessment Scoring System™: Manual K-3. Baltimore, MD, US: Paul H Brookes Publishing.
This article describes common types of misused or unhelpful NAEP analyses to look out for and avoid, and offers warnings against misuse of NAEP data.
Polikoff, M.S. (2015). Friends don’t let friends misuse NAEP data. Retrieved from https://morganpolikoff.com/2015/10/6/friends-dont-let-friends-misuse-naep-data/
American teachers are feeling enormous pressure these days to raise their students' scores on high-stakes tests. As a consequence, some teachers are providing classroom instruction that incorporates, as practice activities, the actual items on the high-stakes tests.
Popham, W. J. (2001). Teaching to the test? Educational Leadership, 58(6), 16-21.
This book provides the information classroom teachers need to address their assessment concerns.
Popham, W. J. (2014). Classroom assessment: What teachers need to know (7th ed.). Boston, MA: Pearson Education.
The Vanderbilt Assessment of Leadership in Education (VAL-ED) is a multirater assessment of principals' learning-centered leadership. The instrument was developed based on the Standards for Educational and Psychological Testing. In this article, we report on the validity and reliability evidence for the VAL-ED accumulated in a national field trial.
Porter, A. C., Polikoff, M. S., Goldring, E. B., Murphy, J., Elliott, S. N., & May, H. (2010). Investigating the validity and reliability of the Vanderbilt Assessment of Leadership in Education. The Elementary School Journal, 111(2), 282-313.
Donald Campbell was an American social psychologist and noted experimental social science researcher who did pioneering work on methodology and program evaluation. He has also become—posthumously—an unlikely hero of the anti-testing and accountability movement in the United States.
Porter-Magee, K. (2013, February 26). Trust but verify: The real lessons of Campbell’s Law. Thomas B. Fordham Institute.
This report, one of six state-of-the-field reports, explores the connection between learning-focused leadership and leadership assessment as it contributes to coherent leadership assessment systems. The report outlines the function and implication of leadership assessment in national, state and local contexts.
Portin, B. S., Feldman, S., & Knapp, M. S. (2006). Purposes, Uses, and Practices of Leadership Assessment in Education. Center for the Study of Teaching and Policy.
The effects of active participation on student learning of simple probability were investigated using 20 fifth-grade classes randomly assigned to level of treatment. It was concluded that active student participation exerts a positive influence on fifth-grade student achievement of relatively unique instructional material.
Pratton, J., & Hales, L. W. (1986). The effects of active participation on student learning. The Journal of Educational Research, 79(4), 210-215.
This report presents data on income and poverty in the United States based on information collected in the 2016 and earlier Current Population Survey Annual Social and Economic Supplements (CPS ASEC) conducted by the U.S. Census Bureau. This report contains two main sections, one focuses on income and the other on poverty.
Proctor, B. D., Semega, J. L., & Kollar, M. A. (2016). Income and poverty in the United States: 2015. US Census Bureau, Current Population Reports, P60-256.
Former Assistant Secretary of Education Diane Ravitch was once an early advocate of No Child Left Behind, school vouchers, and charter schools. No Child Left Behind required schools to administer yearly state standardized tests. Student progress on those tests was measured to see if the schools met their Adequate Yearly Progress (AYP) goals. Schools missing those goals for several years in a row could be restructured, replaced, or shut down.
Ravitch, D. (2011). Standardized testing undermines teaching. National Public Radio.
Instructional theory describes a variety of methods of instruction (different ways of facilitating human learning and development) and when to use--and not use--each of those methods. It is about how to help people learn better.
Reigeluth, C. M. (1999). The elaboration theory: Guidance for scope and sequence decisions. Instructional design theories and models: A new paradigm of instructional theory, 2, 425-453.
Four primary issues are covered: the role of different measurement models in disparities of LD children serviced across states, types of students served as LD but not actually LD, components of severe discrepancies between aptitude and achievement, and state of the art in evaluating LD students.
Reynolds, C. R., & Willson, V. L. (1984). Critical measurement issues in learning disabilities. The Journal of Special Education, 18(4), 451-476.
This text employs a pragmatic approach to the study of educational tests and measurement so that teachers will understand essential psychometric concepts and be able to apply them in the classroom.
Reynolds, C. R., Livingston, R. B., Willson, V., & Willson, V. (2010). Measurement and assessment in education. Upper Saddle River, NJ: Pearson Education.
This study examines the technical adequacy of curriculum-based measures for written language, one of the critical skills required for student success in school. The study concludes that two scoring procedures, correct word sequences and correct minus incorrect word sequences, met criterion validity with commercially developed and state or locally developed criterion assessments.
Romig, J. E., Therrien, W. J., & Lloyd, J. W. (2017). Meta-analysis of criterion validity for curriculum-based measurement in written language. The Journal of Special Education, 51(2), 72-82.
Amrein and Berliner (2002b) compared National Assessment of Educational Progress (NAEP) results in high-stakes states against the national average for NAEP scores. In this analysis, a comparison group was formed from states that did not attach consequences to their state-wide tests.
Rosenshine, B. (2003). High-stakes testing: Another analysis. Education Policy Analysis Archives, 11, 24.
This report is about conceptual and methodological issues that arise when educational researchers use data from large-scale, survey research studies to investigate teacher effects on student achievement. We use data from Prospects to estimate the "overall" size of teacher effects on student achievement and to test some specific hypotheses about why such effects occur.
Rowan, B., Correnti, R., & Miller, R. J. (2002). What large-scale, survey research tells us about teacher effects on student achievement: Insights from the Prospects study of elementary schools.
The present paper discusses three design options potentially useful for the investigation of response maintenance. These include: (a) the sequential-withdrawal, (b) the partial-withdrawal, and (c) the partial-sequential withdrawal designs. Each design is illustrated and potential limitations are discussed.
Rusch, F. R., & Kazdin, A. E. (1981). Toward a methodology of withdrawal designs for the assessment of response maintenance. Journal of Applied Behavior Analysis, 14(2), 131-140.
The purpose of this study was to explore the purpose and practices of leadership evaluation as perceived by principals. The researcher wanted to identify the perceived purposes and practices of leadership evaluation as described by nine public school principals, and to respond to the apparent need expressed by administrators to receive substantive feedback.
Sanders, K. (2008). The purpose and practices of leadership assessment as perceived by select public middle and elementary school principals in the Midwest. Aurora University.
The purpose of this chapter is to explain the role of treatment integrity assessment within the “implementing solutions” stage of a problem-solving model.
Sanetti, L. H., & Kratochwill, T. R. (2005). Treatment integrity assessment within a problem-solving model. Assessment for intervention: A problem-solving approach, 314-325.
The National Assessment of Educational Progress is widely viewed as the most accurate and reliable yardstick of U.S. students’ academic knowledge. But when it comes to many of the ways the exam’s data are used, researchers have gotten used to gritting their teeth.
Sawchuk, S. (2013). When bad things happen to good NAEP data. Education Week, 32(37), 1-22.
Is it time to kill annual testing? An Education Week article.
Sawchuk, S. (2019). Is it time to kill annual testing? Education Week, 8.
This book looks at research and theoretical models used to define educational effectiveness with the intent on providing educators with evidence-based options for implementing school improvement initiatives that make a difference in student performance.
Scheerens, J., & Bosker, R. (1997). The Foundations of Educational Effectiveness. Oxford: Pergamon.
Formative assessment has the potential to support teaching and learning in the classroom. This study reviewed the literature on formative assessment to identify prerequisites for effective use of formative assessment by teachers. The review sought to address the following research question: What prerequisites do teachers need in order to use formative assessment in their classroom practice?
Schildkamp, K., van der Kleij, F. M., Heitink, M. C., Kippers, W. B., & Veldkamp, B. P. (2020). Formative assessment: A systematic review of critical teacher prerequisites for classroom practice. International Journal of Educational Research, 103, 101602.
In this paper, student-level indicators of opportunity to learn (OTL) included in the 2012 Programme for International Student Assessment are used to explore the joint relationship of OTL and socioeconomic status (SES) to student mathematics literacy. This paper suggests that in most countries, the organization and policies defining content exposure may exacerbate educational inequalities.
Schmidt, W. H., Burroughs, N. A., Zoido, P., & Houang, R. T. (2015). The role of schooling in perpetuating educational inequality: An international perspective. Educational Researcher, 44(7), 371-386.
Part of President Bush's strategy for the transformation of American schools lies in an accountability system that would track progress toward the nation's education goals as well as provide the impetus for reform. Here we focus primarily on issues of accountability and student achievement.
Shavelson, R. J., Baxter, G. P., & Pine, J. (1992). Research news and comment: Performance assessments: Political rhetoric and measurement reality. Educational Researcher, 21(4), 22-27.
In today's political climate, standardized tests are inadequate and misleading as achievement measures. Educators should employ a variety of measures, improve standardized test content and format, and remove incentives for teaching to the test. Focusing on raising test scores distorts instruction and renders scores less credible.
Shepard, L. A. (1989). Why We Need Better Assessments. Educational leadership, 46(7), 4-9.
The goal of this paper is to provide teachers and administrators with a general understanding of the concepts of validity and reliability, thereby giving them the confidence to develop their own assessments with a clear grasp of these terms.
Shillingburg, W. (2016). Understanding validity and reliability in classroom, school-wide, or district-wide assessments to be used in teacher/principal evaluations. Retrieved from https://cms.azed.gov/home/GetDocumentFile?id=57f6d9b3aadebf0a04b2691a
Curriculum-Based Measurement and Special Services for Children is a concise and convenient guide to CBM that demonstrates why it is a valuable assessment procedure, and how it can be effectively utilized by school professionals.
Shinn, M. R. (Ed.). (1989). Curriculum-based measurement: Assessing special children. Guilford Press.
A recent large-scale evaluation of Reading Recovery, a supplemental reading program for young struggling readers, supports previous research that found it to be effective. This 4-year, federally funded project, involving almost 3,500 students in 685 schools, found that students generally benefited from the intervention. Students receiving Reading Recovery receive supplemental services in a 1:1 instructional setting for 30 minutes, 5 days a week, from an instructor trained in Reading Recovery. In the study reported here, students who received Reading Recovery had effect sizes of 0.35-0.37 relative to a control group across a number of measures of reading. These represent moderate effect sizes and account for about a 1.5-month increase in skill relative to the control group. Even though the research supports the efficacy of the intervention, it also raises questions about its efficiency. The schools that participated in the study each served about 5 students, and the estimated cost per student has ranged from $2,000 to $5,000. These data raise questions about the wisdom of spending this much money per student for growth of about a month and a half.
Sirinides, P., Gray, A., & May, H. (2018). The Impacts of Reading Recovery at Scale: Results From the 4-Year i3 External Evaluation. Educational Evaluation and Policy Analysis, 0162373718764828.
Evidence-based practice is a decision-making framework. This paper describes the relationships among the three cornerstones of this framework.
Spencer, T. D., Detrich, R., & Slocum, T. A. (2012). Evidence-based practice: A framework for making effective decisions. Education and Treatment of Children, 35(2), 127-151.
The Stanford Education Data Archive (SEDA) is an initiative aimed at harnessing data to help scholars, policymakers, educators, and parents learn how to improve educational opportunity for all children. The data are publicly available here, so that anyone can obtain detailed information about American schools, communities, and student success.
Stanford Education Data Archive. Stanford Center for Education Policy Analysis. Retrieved from https://cepa.stanford.edu/seda/overview
Introduces a new analytic strategy for comparing the cognitive profiles of children developing reading skills at different rates: a regression-based logic analogous to the reading-level match design, but without some of the methodological problems of that design.
Stanovich, K. E., & Siegel, L. S. (1994). Phenotypic performance profile of children with reading disabilities: A regression-based test of the phonological-core variable-difference model. Journal of educational psychology, 86(1), 24.
This report, completed by the Center on Education Policy, attempts to provide an initial snapshot of the number and percentages of schools each state has identified as low performing. It provides an early look at a very diverse set of guidelines. The data show a wide range of results in terms of the percentage of schools identified as low performing. The overall range is 3% to 99%, with individual states spread out fairly evenly in between. Eight states identified over 40% of their public schools as low performing, eleven states 20%–40%, fifteen states 11%–19%, and thirteen states 3%–10%. Even with the limitations of the data listed above, these data suggest inconsistent standards across states.
Stark Rentner, D., Tanner, K., & Braun, M. (2019). The number of low-performing schools by state in three categories (CSI, TSI, and ATSI), school year 2018-19. A report of the Center on Education Policy.
This analysis examined the cost effectiveness of research from Stuart Yeh on common structural interventions in education. Additionally, the Wing Institute analyzes class-size reduction using Yeh's methods.
States, J. (2009). How does class size reduction measure up to other common educational interventions in a cost-benefit analysis? Retrieved from how-does-class-size.
Effective ongoing assessment, referred to in the education literature as formative assessment or progress monitoring, is indispensable in promoting teacher and student success. Feedback through formative assessment is ranked at or near the top of practices known to significantly raise student achievement. For decades, formative assessment has been found to be effective in clinical settings and, more important, in typical classroom settings. Formative assessment produces substantial results at a cost significantly below that of other popular school reform initiatives such as smaller class size, charter schools, accountability, and school vouchers. It also serves as a practical diagnostic tool available to all teachers. A core component of formal and informal assessment procedures, formative assessment allows teachers to quickly determine if individual students are progressing at acceptable rates and provides insight into where and how to modify and adapt lessons, with the goal of making sure that students do not fall behind.
States, J., Detrich, R. & Keyworth, R. (2017). Overview of Formative Assessment. Oakland, CA: The Wing Institute. http://www.winginstitute.org/student-formative-assessment.
Summative assessment is an appraisal of learning at the end of an instructional unit or at a specific point in time. It compares student knowledge or skills against standards or benchmarks. Summative assessment includes midterm exams, final projects, papers, teacher-designed tests, standardized tests, and high-stakes tests.
States, J., Detrich, R. & Keyworth, R. (2018). Overview of Summative Assessment. Oakland, CA: The Wing Institute. https://www.winginstitute.org/assessment-summative
This investigation contributed to previous research by separating the effects of simply making instructional changes, not based on student performance data, from the effects of making instructional changes in accordance with CBM data.
Stecker, P. M. (1995). Effects of instructional modifications with and without curriculum-based measurement on the mathematics achievement of students with mild disabilities.
This book is the core of a larger, comprehensive professional development program in student-involved classroom assessment that teaches standards of assessment quality and how to match achievement targets to assessment methods.
Stiggins, R. J., Arter, J. A., Chappuis, J., & Chappuis, S. (2004). Classroom assessment for student learning: Doing it right, using it well. Assessment Training Institute.
The classroom assessment procedures of 36 teachers in grades 2 to 12 were studied in depth to determine the extent to which they measure students' higher-order thinking skills in mathematics, science, social studies, and language arts.
Stiggins, R. J., Griswald, M., & Green, K. R. (1988). Measuring thinking skills through classroom assessment. Paper presented at the annual meeting of the National Council on Measurement in Education, New Orleans, April 1988.
America has been obsessed with student standardized tests for nearly 20 years. Now it looks like the country is at the beginning of the end of our high-stakes testing mania — both for K-12 “accountability” purposes and in college admissions.
Strauss, V. (2020). It looks like the beginning of the end of America’s obsession with student standardized tests. The Washington Post.
There are growing calls from across the political spectrum for the federal government to allow states to skip giving students federally mandated standardized tests in spring 2021 — but the man that President-elect Joe Biden tapped to be education secretary has indicated support for giving them.
Strauss, V. (2020b, December 30). Calls are growing for Biden to do what DeVos did: Let states skip annual standardized tests this spring. The Washington Post.
What you need to know about standardized testing.
Strauss, V. (2021, February 1). What you need to know about standardized testing. The Washington Post.
A meta-analysis involving 46 studies addressing the validity of this classification of poor readers revealed substantial overlap between IQ-discrepant and IQ-consistent poor readers.
Stuebing, K. K., Fletcher, J. M., LeDoux, J. M., Lyon, G. R., Shaywitz, S. E., & Shaywitz, B. A. (2002). Validity of IQ-discrepancy classifications of reading disabilities: A meta-analysis. American Educational Research Journal, 39(2), 469-518.
This checklist is designed to be completed by the PBIS Team once a quarter to monitor activities for implementation of PBIS in a school. The team should complete the Action Plan at the same time to track items that are In Progress or Not Yet Started.
Sugai, G., Horner, R. H., & Lewis-Palmer, T. L. (2001). Team implementation checklist (TIC). Eugene, OR: Educational and Community Supports.
The EBS Survey is used by school staff for initial and annual assessment of effective behavior support systems in their school. The survey summary is used to develop an action plan for implementing and sustaining effective behavioral support systems throughout the school.
Sugai, G., Horner, R. H., & Todd, A. W. (2000). Effective behavior support: Self-assessment survey. Eugene: University of Oregon.
Test-based accountability systems — the use of tests to hold individuals or institutions responsible for performance and to reward achievement — have become the cornerstone of U.S. federal education policy, and the past decade has witnessed a widespread adoption of test-based accountability systems in the U.S. Consider just one material manifestation of this burgeoning trend: test sales have grown from approximately $260 million annually in 1997 to approximately $700 million today — nearly a threefold increase.
Supovitz, J. (2021). Is high-stakes testing working? University of Pennsylvania, Graduate School of Education.
The federal role in developing the teacher workforce has increased markedly in the last decade, but the history of such involvement dates back fifty years. Relying initially on policies to recruit and train teachers, the federal role has expanded in recent years to include new policy initiatives and instruments around the themes of accountability, incentives, and qualifications, while also continuing the historic emphasis on teacher recruitment, preparation, and development.
Sykes, G., & Dibner, K. (2009). Fifty Years of Federal Teacher Policy: An Appraisal. Center on Education Policy.
The second edition of this exceptionally lucid and practical assessment text provides a wealth of powerful concrete examples that help students to understand assessment concepts and to effectively use assessment to support learning.
Taylor, C. S., & Nolen, S. B. (2005). Classroom assessment: Supporting teaching and learning in real classrooms (2nd ed.). Upper Saddle River, NJ: Pearson Prentice Hall.
The T-PESS process incorporates a series of actions and activities that should be applied on an ongoing basis. While the T-PESS process results in your annual summary assessment, it's better to think of it as an annual process of activities that help you self-assess, establish performance goals, collect and analyze information, and provide constructive feedback, improving your quality and effectiveness as the school leader.
Texas Education Agency. (2019). Charting a course for the professional growth and development of principals: Evaluation process.
This paper argues that ineffective practices in schools carry a high price for consumers and suggests that school systems consider the measurable yield in terms of gains in student achievement for their schooling effort.
VanDerHeyden, A. (2013). Are we making the differences that matter in education? In R. Detrich, R. Keyworth, & J. States (Eds.), Advances in evidence-based education: Vol. 3 (pp. 119–138). Oakland, CA: The Wing Institute. Retrieved from http://www.winginstitute.org/uploads/docs/Vol3Ch4.pdf
This article examines the impact of poor decision making in school psychology, with a focus on determining eligibility for special education. Effective decision making depends upon the selection and correct use of measures that yield reliable scores and valid conclusions, but traditional psychometric adequacy often comes up short. The author suggests specific ways in which school psychologists might overcome barriers to using effective assessment and intervention practices in schools in order to produce better results.
VanDerHeyden, A. M. (2018, March). Why do school psychologists cling to ineffective practices? Let's do what works. In School Psychology Forum: Research in Practice (Vol. 12, No. 1, pp. 44-52). National Association of School Psychologists.
Keeping RTI on Track is a resource to assist educators in overcoming the biggest problems associated with false starts or implementation failure. Each chapter in this book calls attention to a common error, describing how to avoid the pitfalls that lead to false starts, how to determine when you're in one, and how to get back on the right track.
Vanderheyden, A. M., & Tilly, W. D. (2010). Keeping RTI on track: How to identify, repair and prevent mistakes that derail implementation. LRP Publications.
To examine a response to treatment model as a means for identifying students with reading/learning disabilities, 45 second-grade students at risk for reading problems were provided daily supplemental reading instruction and assessed after 10 weeks to determine if they met a priori criteria for exit.
Vaughn, S., Linan-Thompson, S., & Hickman, P. (2003). Response to instruction as a means of identifying students with reading/learning disabilities. Exceptional children, 69(4), 391-409.
Data from the 1992 National Assessment of Educational Progress are used to compare the performance of New Jersey public school children with those from other participating states. The comparisons are made with the raw mean scores and after standardizing all state scores to a common (national U.S.) demographic mixture. It is argued that for most plausible questions about the performance of public schools the standardized scores are more useful.
Wainer, H. (1994). Academic performance of New Jersey's public schools. Education Policy Analysis Archives, 2, 10.
This literature review examines the impact of various instructional methods on student learning.
Walberg, H. J. (1999). Productive teaching. In H. C. Waxman & H. J. Walberg (Eds.), New directions for teaching, practice, and research (pp. 75-104). Berkeley, CA: McCutchan Publishing.
This is a meta-review and synthesis of the research on the variables related to learning.
Wang, M. C., Haertel, G. D., & Walberg, H. J. (1990). What influences learning? A content analysis of review literature. The Journal of Educational Research, 30-43.
This report summarizes performance on PIRLS and ePIRLS 2016 from a U.S. perspective. PIRLS results are based on nationally representative samples of fourth-graders. The international data reported for PIRLS 2016 in this report cover 58 countries or other education systems, including the United States.
Warner-Griffin, C., Liu, H., Tadler, C., Herget, D., & Dalton, B. (2017). Reading Achievement of US Fourth-Grade Students in an International Context: First Look at the Progress in International Reading Literacy Study (PIRLS) 2016 and ePIRLS 2016. NCES 2018-017. National Center for Education Statistics.
This study examined the social attitudes related to race, gender, age, and ability among senior-level health education students at a mid-sized university in the southeast by means of a personally experienced cross-cultural critical incident.
Wasson, D. H., & Jackson, M. H. (2002). Assessing cross-cultural sensitivity awareness: A basis for curriculum change. Journal of Instructional Psychology, 29(4), 265-277.
This article provides an overview of the conceptual foundations and underlying principles of FBA and the methods and procedures associated with conducting FBAs in school settings.
Watson, T. S., & Skinner, C. H. (2001). Functional behavioral assessment: Principles, procedures, and future directions. School Psychology Review, 30(2), 156-172.
The authors describe and compare the three major guidelines on schizophrenia that have been published in the United States: the American Psychiatric Association's Practice Guideline for the Treatment of Patients with Schizophrenia, the Expert Consensus Guidelines Series: Treatment of Schizophrenia, and the Schizophrenia Patient Outcome Research Team (PORT) Treatment Recommendations.
Weiden, P. J., & Dixon, L. (1999). Guidelines for schizophrenia: Consensus or confusion? Journal of Psychiatric Practice, 5(1), 26-31.
The purpose of this study was to ascertain the effects of manipulating the data base used for instructional decision making on student achievement.
Wesson, C., Skiba, R., Sevcik, B., King, R. P., & Deno, S. (1984). The effects of technically adequate instructional data on achievement. Remedial and Special Education, 5(5), 17-22.
The purpose of this study was to assess the degree to which behavioral intervention studies conducted with persons with mental retardation operationally defined the independent variables and evaluated and reported measures of treatment integrity. The study expands the previous work in this area reported by Gresham, Gansle, and Noell (1993) and Wheeler, Baggett, Fox, and Blevins (2006) by providing an evaluation of empirical investigations published in multiple journals in the fields of applied behavior analysis and mental retardation from 1996–2006. Results of the review indicated that relatively few of the studies fully reported data on treatment integrity.
Wheeler, J. J., Mayton, M. R., Carter, S. L., Chitiyo, M., Menendez, A. L., & Huang, A. (2009). An assessment of treatment integrity in behavioral intervention studies conducted with persons with mental retardation. Education and Training in Developmental Disabilities, 187-195.
In this book, Grant P. Wiggins clarifies the limits of testing in an assessment system. Beginning with the premise that student assessment should improve performance, not just audit it, Wiggins analyzes some time-honored but morally and intellectually problematic practices in test design, such as the use of secrecy, distracters, scoring on a curve, and formats that allow for no explanation by students of their answers.
Wiggins, G. P. (1993). Assessing student performance: Exploring the purpose and limits of testing. Jossey-Bass.
In this video from Cool Reading Facts, Daniel Willingham, professor of psychology at the University of Virginia, discusses significant factors key to success in reading comprehension. His analysis suggests that educators frequently miss the critical role that basic knowledge plays in successfully interpreting and understanding passages in reading texts and that reading comprehension tests are actually knowledge tests in disguise. He makes three important points: (1) Students must have the basic decoding skills to translate print into meaningful information, (2) having a basic familiarity with the subject matter is of prime importance in comprehending what the writer is trying to communicate, and (3) providing students with an enriched knowledge base through the school’s curriculum is especially important for students from disadvantaged circumstances, whose only source of essential background information often is school. In contrast, children from privileged circumstances may be introduced to essential background information away from school.
Willingham, D. (2017). Cool Reading Facts 5: Reading comprehension tests don’t test reading [Video file]. National Public Radio, Science Friday Educator Collaborative.
This study evaluated the effects of performance feedback on increasing the quality of implementation of interventions by teachers in a public school setting.
Witt, J. C., Noell, G. H., LaFleur, L. H., & Mortenson, B. P. (1997). Teacher use of interventions in general education settings: Measurement and analysis of the independent variable. Journal of Applied Behavior Analysis, 30(4), 693.
This study compares the effect size and return on investment of rapid assessment, increased spending, voucher programs, charter schools, and increased accountability.
Yeh, S. S. (2007). The cost-effectiveness of five policies for improving student achievement. American Journal of Evaluation, 28(4), 416-436.
This paper conducts an analytic literature review to examine the use and operationalization of the term "academic success" in multiple academic fields.
York, T. T., Gibson, C., & Rankin, S. (2015). Defining and measuring academic success. Practical Assessment, Research & Evaluation, 20(5), 1–20. Retrieved from https://scholarworks.umass.edu/pare/vol20/iss1/5/
This report provides information on states that require students enrolled in courses that have an end-of-course (EOC) exam to take the EOC exam.
Zinth, J. D. (2012). End-of-Course Exams. Education Commission of the States (NJ3).