What do the data tell us?
Too often, important decisions in education are made based on beliefs, ideology, politics, and, at best, incomplete information. When there is evidence, it is usually scarce, disorganized, and inconclusive. As a result, educators don’t ask the critical question: “What do the data tell us?”
As part of its “catalyst” role, the Wing Institute is partnering with Supporting Evidence (www.supportingevidence.com) to launch a “data mining” initiative on its web site to help answer this question. The term “data mining” is frequently used in information technology. It generally refers to a process for collecting and analyzing data in order to identify trends, patterns, and relationships. The goals of the Wing Institute’s data mining are:
- Conduct an ongoing search for relevant research, data, and policy analyses.
- Conduct this research in the context of a “gap analysis”, by which fundamental questions are framed and knowledge gaps are identified.
- Display this information in accessible formats.
- Discuss implications of research and make suggestions for additional study.
- Provide an electronic community for the exchange of ideas, feedback and additional information.
Our data mining initiative is not designed to offer reviews of the research in terms of design, statistics, or conclusions.
Gap Analysis
The purpose of the “gap analysis” is to identify:
- critical questions that need to be answered to provide effective, evidence-based education (what we need to know)
- existing information that answers these questions (what we do know)
- “gaps” in the knowledge base (what we don’t know)
The initial questions are identified in the following graphic:

The gap analysis is organized into four areas of evidence: students, teachers, systems, and home. Each area has its own landing page and “gap analysis” heuristic.
SupportingEvidence.com
The Wing Institute would like to acknowledge the efforts of Scott Gibson of Supporting Evidence for his support in developing the “Data Mining” area of our web site. Supporting Evidence provided inspiration in conceptualizing and constructing “Data Mining”. We are pleased to announce that Scott will continue to be an active participant in the research and display of data on the Wing Institute site. We also recommend you visit www.SupportingEvidence.com to see graphical displays of a variety of data on education, government, and health topics. The goal of Supporting Evidence is to collect data from reliable sources, compare and combine it in interesting ways, and publish understandable charts that inform, entertain, and support better decision-making. We believe you will find the site fascinating and well worth a visit.
Data Mining Tools
Guide to Evaluating Research and Definitions
The Wing Institute recommends the Institute of Education Sciences’ guide as a valuable tool for evaluating the validity and reliability of educational research. The guide is designed to be practical and user-friendly for anyone interested in distinguishing practices that meet rigorous standards from those that do not.
Identifying and Implementing Educational Practices Supported By Rigorous Evidence: A User-Friendly Guide, 2003
The field of K-12 education contains a vast array of educational interventions—such as reading and math curricula, school-wide reform programs, after-school programs, and new educational technologies—that claim to be able to improve educational outcomes and, in many cases, to be supported by evidence. This evidence often consists of poorly designed and/or advocacy-driven studies. State and local education officials and educators must sort through a myriad of such claims to decide which interventions merit consideration for their schools and classrooms. Many of these practitioners have seen interventions, introduced with great fanfare as being able to produce dramatic gains, come and go over the years, yielding little in the way of positive and lasting change—a perception confirmed by the flat achievement results over the past 30 years in the National Assessment of Educational Progress long-term trend.
The federal No Child Left Behind Act of 2001, and many federal K-12 grant programs, call on educational practitioners to use "scientifically-based research" to guide their decisions about which interventions to implement. Yet many practitioners have not been given the tools to distinguish interventions supported by scientifically rigorous evidence from those that are not. This Guide is intended to serve as a user-friendly resource that education practitioners can use to identify and implement evidence-based interventions, so as to improve educational and life outcomes for the children they serve.
Adapted from U.S. Department of Education, Institute of Education Sciences (ies.ed.gov/ncee/wwc/references/iDocViewer/Doc.aspx?docId=14&tocId=1)
Definitions
Correlation
Correlation is a common and useful statistic frequently referred to in educational research. A correlation is a single number, ranging from −1.0 to 1.0, that describes the strength and direction of the relationship between two variables. Correlation alone cannot be used to infer a causal relationship between the variables. A correlation may indicate a causal relationship, or it may only reflect a relationship in which no direct causal process exists.
| Correlation | Negative | Positive |
| --- | --- | --- |
| Small | −0.3 to −0.1 | 0.1 to 0.3 |
| Medium | −0.5 to −0.3 | 0.3 to 0.5 |
| Large | −1.0 to −0.5 | 0.5 to 1.0 |
To interpret the size of a correlation when reading a study, it is essential to realize that its meaning depends on the context and purpose for which the study was designed. For example, a correlation of 0.9 would be considered low in a physics experiment but high in an educational study.
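As a rough illustration of the definition and table above, the sketch below computes a Pearson correlation (the most common correlation statistic, assumed here because the text does not name a specific one) and labels its size using the table's ranges. The function names, the sample data, and the treatment of the shared boundary values (0.3 and 0.5 are assigned to the larger category) are assumptions made for the example.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squared deviations from each mean.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable never varies")
    return sxy / math.sqrt(sxx * syy)

def describe_size(r):
    """Label a correlation using the ranges in the table above.

    Assumption: boundary values (0.3, 0.5) go to the larger category,
    and anything under 0.1 is labeled "below small".
    """
    magnitude = abs(r)
    if magnitude >= 0.5:
        return "large"
    if magnitude >= 0.3:
        return "medium"
    if magnitude >= 0.1:
        return "small"
    return "below small"

# Hypothetical data: weekly hours of reading and quiz scores for five students.
hours_read = [1, 2, 3, 4, 5]
quiz_scores = [2, 4, 5, 4, 5]

r = pearson_r(hours_read, quiz_scores)
print(f"r = {r:.2f} ({describe_size(r)})")  # r = 0.77 (large)
```

Even a "large" value here says nothing about whether reading causes higher scores. It only describes how closely the two lists move together.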
Effect Size
The effect size is a standardized measure of the effect of an intervention (treatment) on an outcome. It represents the change (measured in standard deviations) in the average outcome that can be expected when a person is given the treatment. Because effect sizes are standardized, they can be compared across studies. Adapted from U.S. Department of Education, Institute of Education Sciences http://ies.ed.gov/ncee/wwc/help/glossary/
Effect size measures play an important role in meta-analysis studies that summarize findings from a specific area of research. In practical situations, effect sizes are helpful for making decisions, since a highly significant relationship may be uninteresting if its effect size is small.
The generally accepted benchmark for effect size comes from Jacob Cohen, a US statistician and psychologist.
Cohen (1988) hesitantly defined effect sizes as "small, d = .2," "medium, d = .5," and "large, d = .8", stating that "there is a certain risk inherent in offering conventional operational definitions for those terms for use in power analysis in as diverse a field of inquiry as behavioral science." "The terms 'small,' 'medium,' and 'large' are relative, not only to each other, but to the area of behavioral science or even more particularly to the specific content and research method being employed in any given investigation. This risk is nevertheless accepted in the belief that more is to be gained than lost by supplying a common conventional frame of reference which is recommended for use only when no better basis for estimating the Effect Size index is available."
| Cohen’s d | Effect Size |
| --- | --- |
| Small | d = .2 |
| Medium | d = .5 |
| Large | d = .8 |
Adapted from Lee A. Becker 1998, 1999 - http://www.uccs.edu/~faculty/lbecker/es.htm
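To make the definition concrete, here is a minimal sketch of one common way to compute Cohen's d: the difference between two group means divided by a pooled standard deviation. The pooled-variance form, the function names, and the test scores are assumptions chosen for illustration. The source does not prescribe a particular formula, and other variants exist.

```python
import math
from statistics import mean, variance

def cohens_d(treatment, control):
    """Cohen's d using a pooled standard deviation (one common variant)."""
    n1, n2 = len(treatment), len(control)
    # statistics.variance is the sample variance (divides by n - 1).
    pooled_var = ((n1 - 1) * variance(treatment) +
                  (n2 - 1) * variance(control)) / (n1 + n2 - 2)
    return (mean(treatment) - mean(control)) / math.sqrt(pooled_var)

def cohen_label(d):
    """Label |d| against Cohen's conventional benchmarks of .2, .5, and .8."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "below small"

# Hypothetical test scores for two small groups of students.
treatment_scores = [78, 82, 85, 88, 92]
control_scores = [75, 80, 82, 85, 88]

d = cohens_d(treatment_scores, control_scores)
print(f"d = {d:.2f} ({cohen_label(d)})")  # d = 0.58 (medium)
```

As Cohen's own caveat suggests, these labels are a fallback frame of reference and are best used only when the field offers no better basis for judging an effect.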
Sample Size: The importance of sample size when calculating effect size has recently become controversial with the publication of the What Works Clearinghouse (WWC) procedures for reviewing the quality of research studies. These guidelines reserve the highest rating for randomized controlled studies that report statistically significant effect sizes. Unfortunately, the guidelines do not take into account the impact that small sample sizes can have on calculated effect sizes. Studies with small samples can exert a disproportionate influence, and thus produce inaccurate effect sizes, when combined with other studies in a meta-analysis. This indicates a need for methods that compute accurate effect sizes from studies of varying sizes.
Additional Information:
Stating the Meaning of Effect Size Measures in Plain English http://www.mclibrary.duke.edu/subject/ebm/stating_effect.pdf
Hierarchical Linear Modeling (HLM)
Hierarchical linear modeling is a conceptual and statistical approach, built on simple and multiple linear regression, for investigating and drawing conclusions about the influence of phenomena at different levels of analysis. HLM is a type of regression model well suited to nested data because it enables analysis across hierarchical levels, whereas simple and multiple linear regression model all effects at a single level. For example, in educational research, data are often treated as individual students nested within classrooms nested within schools (student > classroom > school > district). Education data sets typically draw students from a group of schools, so information about students is correlated (students from the same school tend to be similar in their traits). Adapted from Practical Assessment, Research, and Evaluation http://pareonline.net/getvn.asp?v=7&n=1
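The sketch below does not fit a hierarchical linear model. It only illustrates why nested data call for one, by splitting the total variation in a set of scores into the part that lies between schools and the part that lies among students within a school. The school names, scores, and function name are hypothetical.

```python
def between_school_share(scores_by_school):
    """Fraction of total variation in scores that lies between schools.

    A simple sum-of-squares decomposition, not a fitted multilevel model.
    """
    all_scores = [s for scores in scores_by_school.values() for s in scores]
    grand_mean = sum(all_scores) / len(all_scores)

    between = 0.0  # variation of school means around the grand mean
    within = 0.0   # variation of students around their own school's mean
    for scores in scores_by_school.values():
        school_mean = sum(scores) / len(scores)
        between += len(scores) * (school_mean - grand_mean) ** 2
        within += sum((s - school_mean) ** 2 for s in scores)
    return between / (between + within)

# Hypothetical scores: students in the same school resemble one another.
scores_by_school = {
    "School A": [70, 72, 74],
    "School B": [80, 82, 84],
    "School C": [90, 92, 94],
}

share = between_school_share(scores_by_school)
print(f"Share of variation between schools: {share:.2f}")  # 0.96
```

When most of the variation sits between schools, as in this contrived data, treating every student as an independent observation in a single-level regression would overstate how much independent information the data contain.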
Randomized Control Trial (RCT)
An experiment in which investigators randomly assign eligible subjects to groups that receive or do not receive one or more of the interventions being compared. Strong RCTs (that is, those in which randomization is not compromised and attrition is not substantial) are classified as meeting the What Works Clearinghouse “Evidence Standards.” Adapted from U.S. Department of Education, Institute of Education Sciences http://ies.ed.gov/ncee/wwc/help/glossary/
Standard Deviation
Standard deviation is a measure of the typical distance of the values in a data set from their average (formally, the square root of the average squared distance from the mean). It indicates how spread out the data are.
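A short worked example may help. The sketch below computes the standard deviation of a small, made-up set of scores, alongside the plain average distance from the mean, to show that the two are related but not identical. It uses the population form, which divides by the number of values. The sample form divides by one less than that number.

```python
import math

# Hypothetical scores for eight students.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

mean = sum(scores) / len(scores)

# Plain average of the absolute distances from the mean.
mean_abs_deviation = sum(abs(x - mean) for x in scores) / len(scores)

# Standard deviation: square root of the average squared distance.
std_dev = math.sqrt(sum((x - mean) ** 2 for x in scores) / len(scores))

print(f"mean = {mean}")                                       # mean = 5.0
print(f"average distance from mean = {mean_abs_deviation}")   # 1.5
print(f"standard deviation = {std_dev}")                      # 2.0
```

Because the distances are squared before averaging, values far from the mean (such as the 9 here) pull the standard deviation up more than they pull up the plain average distance.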
Statistical Significance
Statistical significance refers to how likely it is that a result at least as large as the one observed would occur by chance alone if the intervention had no real effect. A result is conventionally considered “statistically significant” if that probability is 5 percent or less.
When researchers test for statistical significance, they compare different sets of values, such as student achievement before and after the introduction of an intervention. The analysis must account for the number of participants, the size of the observed difference, and the variability among the people participating in the study. Statistics are employed to calculate probability values associated with the experiment. For example, how effective was an intervention at improving student scores on an achievement test compared with the performance of students in a control group? If the researcher finds that the probability value is less than .05, the result may be considered statistically significant, although a difference that large would still arise by chance up to 5% of the time when the intervention has no effect. These probability values, called p values, are proportions and are typically written as p < .05, p < .01, or p < .001. For example, p < .01 means there is less than a 1% chance that a difference this large would appear through chance alone. If probabilities are low, the results of the intervention can be considered statistically significant. Tests of statistical significance do not directly measure the effect of an intervention, but rather the probability that the observed difference was a function of chance.
Adapted from U.S. Department of Education, Institute of Education Sciences http://ies.ed.gov/ncee/wwc/help/glossary/
The term "significance" when used in general speech suggests that something is important or meaningful. “Statistical significance,” on the other hand, signifies only that the results of a study are probably real and are unlikely to have occurred by chance. It does not imply that the result is important or has any practical value in making decisions. Researchers frequently urge that effect sizes accompany tests of significance to clarify and provide perspective on the practical importance of results.
Example:
10,000 students are given an IQ test. The question is then asked whether there is a significant difference between the scores of students living in warm climates and those living in cooler climates. The mean score for those in cooler regions is 97 and the mean score for those in warmer regions is 98. When the two groups are compared using an independent-groups t-test, the difference is found to be statistically significant at the .001 level. But is there any practical significance to this information? In this case there would appear to be little or no significance as the word is used in common parlance. On an IQ test this difference is so small as to have no meaning for a decision-maker.
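The arithmetic behind this example can be sketched as follows. The example does not state the spread of scores or the group sizes, so the sketch assumes a standard deviation of 15 (the conventional scale for IQ tests) and an even split of 5,000 students per group. With groups this large, the normal distribution is used as a close approximation to the t distribution. All names in the code are hypothetical.

```python
import math

# Values given in the example.
mean_cool = 97.0
mean_warm = 98.0

# Assumptions not stated in the example.
sd = 15.0        # conventional IQ standard deviation
n_cool = 5000    # 10,000 students split evenly
n_warm = 5000

# Standard error of the difference between two independent group means.
standard_error = sd * math.sqrt(1 / n_cool + 1 / n_warm)

# Test statistic: the difference in means measured in standard errors.
z = (mean_warm - mean_cool) / standard_error

# Two-sided p value from the normal approximation.
p_value = math.erfc(abs(z) / math.sqrt(2))

# Effect size: the same difference measured in standard deviations.
effect_size = (mean_warm - mean_cool) / sd

print(f"test statistic = {z:.2f}")
print(f"significant at the .001 level: {p_value < 0.001}")
print(f"effect size d = {effect_size:.2f}")
```

Under these assumptions the one-point difference clears the .001 threshold, yet the effect size is well under Cohen's benchmark of .2 for a small effect. The very large sample makes a trivial difference statistically detectable without making it practically important.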
Additional Information:
Elementary Concepts in Statistics http://www.statsoft.com/textbook/esc.html
Value-Added Modeling
Value-added models are statistical analyses that provide quantitative performance measures that can be used to develop, monitor, and evaluate schools and other aspects of the education system. When used in evaluating education systems, value-added models comprise a collection of complex statistical techniques that use multiple years of test score data to estimate the effects of individual schools or teachers on student performance.
Additional Information:
Measuring Improvements in Learning Outcomes: Best Practices to Assess the Value-Added of Schools, 2008
The Promise and Peril of Using Value-Added Modeling to Measure Teacher Effectiveness, 2004 http://www.rand.org/pubs/research_briefs/RB9050/RAND_RB9050.pdf