How The 2000 National Reading Panel Report Has Been Used To Deceive The Nation And “Reshape” Reading Instruction In Public Schools: A Call for NICHD to Withdraw Official Support and Issue a Retraction
The forensic analysis of the National Reading Panel has taken five years. It is part of a more expansive study of the etiology of the “science of reading” and the Right’s stated intention to “reshape” reading instruction in U.S. public schools, to bring it into line with the authoritarian shift that is taking place in every sector of American society.
In this post I am focused on providing readers with an overview of, and access to, the quantitative analysis of the Phonemic Awareness (P/A) and Phonics meta-analyses conducted by the NRP, which provided the “scientific evidence” on which President George W. Bush convinced Congress to pass the No Child Left Behind Act (NCLB), which included the Reading First initiative and was signed into law in January 2002.
Fast forward to 2025 and all subsequent “evidence based” Federal and State reading laws can be traced back to NCLB and to the findings of the NRP report, especially the P/A and Phonics meta-analyses.
Multiple sources have reported that the National Reading Panel report was cited over 16,000 times between 2000 and 2017, and that it was cited 699 times in 2017 alone. The citations appear in many scholarly journals, and the NRP report still impacts the research and pedagogies of renowned scholars and educators. In 2025 the report continues to be referenced in peer-reviewed research journals and in international reports.
There is evidence that most of the Panel have actively engaged in legitimizing and protecting the findings of the NRP report. In his April 2003 article, “Research-Based Reading Instruction: Myths about the National Reading Panel report,” published in The Reading Teacher, Timothy Shanahan writes, “Our report (National Reading Panel, 2000a) put forth what has become an influential interpretation of ‘research-based instruction.’ Our findings are now a cornerstone of U.S. federal reading education policy (the Reading First provisions of the Elementary and Secondary Education Act) that support extensive professional development for teachers with regard to NRP” (p. 646).
Shanahan is right that the NRP report has become the cornerstone of federal and state laws that mandate “scientifically based reading research,” and that the report is now embedded in more than 40 state laws that are the foundation of reading instruction in U.S. public schools. But Shanahan is wrong when he claims that the findings of the NRP have scientific legitimacy as “research-based instruction.”
The forensic analysis of the P/A and phonics NRP meta-analyses exposes a narrow, outdated, linear conceptualization of “science” long rejected in many fields of study, where contemporary understandings of science incorporate diverse methodologies, recognize the complexity of scientific progress, and acknowledge the limitations of both strictly linear and purely reductionist frameworks.
The obsolete views of science that are the basis of the reductive reading research studies selected by the NRP have become ideologies that are embedded politically and financially in the very fabric of American society. Many reading researchers have held on to outdated views of science, collaborating with policy makers and contracting with textbook publishers who have found the reduction of phenomena to a single basic level of explanation politically and financially advantageous.
The report is talked about as if it is “settled science” and many researchers have been silenced for stating that it is not.
“Whoever is telling you to ignore the NRP Report knows very little about reading or reading research and is really doing kids a disservice by pretending to know something about those things,” Shanahan wrote in a blog post on August 13, 2017. Shanahan has used similar rhetoric in multiple publications, some of them much more vociferous.
In his 2003 Myths paper in The Reading Teacher, Shanahan wrote that the 1998 book Beginning to Read and the Spin Doctors of Science (Spin Doctors for short), which I wrote about the scientifically flawed Houston Reading Study, was “a kind of nationwide barroom quarrel with many claims and little evidence” (p. 646).
The evidence in Spin Doctors was checked by statisticians and holds up twenty-five years later. Spin Doctors is as relevant today as when the research was published, more so because the Houston Reading Study was included in the Phonics category by the National Reading Panel. A digital copy of Spin Doctors is open access on my website.
The postscript is that the Houston Reading Study that was selected by the NRP was later criticized and rejected by the IES-WWC for the non-equivalence of the treatment and control groups of children. This means that the treatment and control groups were not similar at the start of the experiment. They were not randomly assigned. The lack of equivalence is a problem because it can bias the results. Any differences observed at the end of the study could be due to pre-existing disparities rather than the intervention itself. Deep knowledge of the Houston Reading Study suggests that this is the case.
In future Substack posts we will focus on the renowned scholars who have been denigrated and silenced for not supporting the findings of the NRP. We will focus on how their research and pedagogical practices have been extinguished that support children who have had adverse childhood experiences, some of whom are struggling with life-long traumas. But in this Substack post, we will focus on the fundamental errors the National Reading Panel made in their P/A and Phonics meta-analyses that have national significance for how children are taught to read in U.S. public schools.
With regard to Shanahan’s contention that those who disagree with him know very little about reading research and that we are “doing kids a disservice,” once again my response is to encourage readers to read my last post on the negative impact that “science of reading” state laws have had, and are having, on the health and well-being, as well as the reading development, of children in the U.S. – especially the more than 60% who have had adverse childhood experiences.
Basically, the entire upcoming series of posts on Substack will focus on the “how” and “why” of the massive deception that has been perpetrated on the American people about how children should be taught to read in U.S. public schools. The so-called “evidence-based reading instruction” that Shanahan confirms originates in the NRP Report is now the cornerstone of federal and state laws disenfranchising children, and through rigorous testing regimes holds them and their teachers accountable for “fidelity” to the “science of reading.”
“Fidelity” is not even reported in the Phonics section of the NRP report. Nevertheless, “fidelity” occurs 46 times in the NRP report. The first occurrence is on page 1-6, where the Panel states, “Study methods must allow judgments about how instruction fidelity was ensured.” In the P/A section of the report, of the 96 comparisons, 31 have fidelity reported as “yes” and 65 as “no.” Focusing on the P/A studies that were selected, 19½ have fidelity reported as “yes” and 32½ as “no.” There is a little humor here: one study, #38, has 2 comparisons – one reports “yes” for fidelity, and one reports “no.”
State laws and regulations explicitly use the word “fidelity” to emphasize that evidence-based reading instruction and interventions must be implemented to meet legal requirements. States including Colorado, Connecticut, Florida, North Carolina, New Mexico, Tennessee, and Wisconsin have enshrined fidelity to “evidence-based reading instruction” in state laws and policy documents, and the term has entered the frequently used lexicon of public-school teachers.
For this reason, it is important that the public is aware that fidelity to the “evidence-based reading instruction” that has been enshrined in federal and state laws is irreparably flawed and is being used to lead children in America in the wrong direction.
I have included the first set of tables from the forensic analysis of the P/A and Phonics meta-analyses in this post to provide a framework for a series of tables from the forensic analysis to support the conclusion that the NRP report has no scientific validity. In the concluding recommendations to this paper, I have called for NICHD to withdraw the NRP Report and for NICHD to recommend that retractions be added to published accounts that have used the report.
But before the Substack posts of the tables are uploaded I will address the myth that the NRP analyzed 100,000 studies. Here in this Substack post, as I have in previous Substack posts, I will establish for readers how many studies the NRP analyzed, and as we dig deeper, I’m encouraging everyone to keep in mind that the nation has been sold a bill of goods. In every sector the public has been taken advantage of through the promulgation of the NRP 100,000 study lie.
Here’s the backstory. On January 23, 2001, two days after his inauguration, President Bush sent to Congress a “blueprint,” that he states in the foreword to the document, “represents part of my agenda for education reform.”
In blue ink and in a bold font, the header at the beginning of the blueprint states, “Transforming the Federal Role in Education So That No Child is Left Behind.” In the subsection “Improving Literacy by Putting Reading First,” the blueprint contains the following statement:
The findings of years of scientific research on reading are now available, and application of this research to the classroom is now possible for all schools in America. The National Reading Panel issued a report in April 2000 after reviewing 100,000 studies on how students learn to read.
“You know, people are going to say, well, that sounds good,” Bush said in 2004 about his education initiatives. “How do you know it works? And, as you know, I’m a how-do-you-know it works kind of guy,” Bush continues by answering his own question. “This is based on science is what I’m telling you.”
The blueprint Bush sent to Congress in January 2001, stated, “The Reading First initiative builds upon these findings by investing in scientifically-based reading instruction.” It is a great deception. The National Reading Panel phonics meta-analysis was not completed until February 2000, and the National Reading Panel report was submitted to Congress just two months later in April 2000. There are several accounts that the phonics meta-analysis was so late it was not done by a Panel member or a Panel subgroup, and that there was not time to edit it before it was submitted to Congress.
The lie that the National Reading Panel reviewed 100,000 studies was further memorialized in the Congressional Record on April 13, 2000, when the National Reading Panel completed its work and submitted its report, “Teaching Children to Read,” to Congress. Members of the National Reading Panel have subsequently denounced the 100,000 studies lie, while at the same time reviewing and endorsing government publications that include it.
Here are the actual numbers. The NRP identified 1,962 P/A studies and 1,373 Phonics studies. Key words and phrases in the abstracts reduced the number of P/A studies to 52 and Phonics studies to 38. The entire P/A and Phonics meta-analyses were reduced to 90 studies.
It gets worse. Between 2005 and 2017 the Institute of Education Sciences – What Works Clearinghouse (IES-WWC) reviewed and rejected 14 P/A studies and 11 phonics studies, primarily for non-equivalent treatment and control groups of children, but also for other failings. This means there were only 38 viable P/A studies and 27 viable phonics studies for the NRP’s meta-analyses that were used to reshape reading instruction in public schools.
Based on these findings, the decision was made to review the remaining NRP studies which had not been reviewed by the IES-WWC. This analysis identified 12 additional P/A studies and 16 additional phonics studies with non-equivalent groups. If these studies were also eliminated, all that would be left are 26 viable P/A studies and 11 viable phonics studies for the meta-analyses with which to shift the nation. These findings raise concerns about all the effect sizes in which these studies were included.
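For readers following the arithmetic, the viable-study counts above can be checked in a few lines of Python. The figures are simply those already stated in this post:

```python
# Study counts as stated above: studies the NRP selected, studies later
# rejected by the IES-WWC (2005-2017), and studies flagged by the forensic analysis.
pa_selected, phonics_selected = 52, 38
pa_wwc_rejected, phonics_wwc_rejected = 14, 11
pa_forensic_flagged, phonics_forensic_flagged = 12, 16

pa_viable = pa_selected - pa_wwc_rejected                 # 38
phonics_viable = phonics_selected - phonics_wwc_rejected  # 27

print(pa_viable - pa_forensic_flagged)            # 26 viable P/A studies
print(phonics_viable - phonics_forensic_flagged)  # 11 viable phonics studies
```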
In the P/A results, out of 163 mean effect sizes reported in Appendix C, Tables 2, 3 and 4, only 42 were homogenous (27%), and 121 were heterogenous (73%).
In the Phonics results, out of 66 mean effect sizes reported in Appendix C, Table 3, 31 reported effect sizes were homogenous (47%), and 35 were heterogenous (53%).
The negative impact of the finding that many of the P/A and Phonics effect sizes were confounded by the heterogeneity of the NRP-selected studies is immense. The findings of the NRP report, which, as Shanahan stated, are the cornerstone of federal and state reading laws, are crumbling. The idea that the P/A and Phonics meta-analyses provide the “scientifically based reading research” on which federal and state laws are based is eroding. Stone by stone, the forensic analysis is dismantling the myth.
Given the enormity of the consequences of the NRP P/A and Phonics meta-analyses not being generalizable, it is important that we unpack the meaning of “heterogeneity” and make sure we have a clear understanding of what is meant by “effect size.” I am backing into this analysis because it is easier to figure out the mathematical characteristics of phenomena if we have a real-life reason to do so. Here the real-life reason is that the public has been deceived about “evidence-based reading instruction” by the almost universal legitimization of the flawed and false findings of the NRP report, which are accepted by almost all sectors of U.S. society.
How the NRP’s Use of Effect Sizes in the P/A and Phonics Meta-Analyses Has Been Used to Deceive the Public Can Be Understood by Unpacking the Panel’s Use of Effect Sizes
In an upcoming post I have written that numbers, like words, tell stories that are just as compelling as their alphabetic counterparts. Even if you “hate math” and have never taken statistics, my remit is to make the stories the numbers tell accessible. The forensic analysis has found conceptual, mathematical, and statistical flaws, as well as basic errors in numeracy, and I am going to do my best to make the NRP P/A and Phonics meta-analyses less opaque so you can consider the veracity of their findings.
So, let’s get started. The key to understanding how the NRP P/A and Phonics meta-analyses were used to shape the thinking of the nation about how children should be taught to read in U.S. public schools is figuring out how the Panel used effect sizes in their meta-analyses, especially when the studies they selected to analyze did not use them.
My partner has collaborated with me on the quantitative analysis. His research is in the physical sciences, and we have undertaken many similar projects: Fukushima, Bhopal, the BP Deepwater Horizon oil spill, and most recently the carbon dioxide (CO2) pipeline rupture near Satartia on February 22, 2020, which is the basis of the monograph I wrote entitled The Carbon Clock, which was reviewed by scientists associated with the IPCC. The project most relevant to the mathematical and statistical analysis of the NRP P/A and Phonics meta-analyses is the forensic analysis of the statistical evidence produced by the Houston Reading Study. That statistical analysis is included in Beginning to Read and the Spin Doctors of Science, which is now open access in digital form on my website.
There Is “High Heterogeneity” in Tables 2, 3, and 4 in the P/A Results, and More than 50% Heterogeneity in the Phonics Table 3 Results.
Backing up, in meta-analyses it is crucial to assess heterogeneity, because it provides important information about the validity of the effect sizes which have been used as evidence in federal and state laws about how children should be taught to read in public schools. There are three key points to keep in mind:
First, heterogeneity describes the presence of variation or diversity within a dataset or across a series of studies. As stated above, it is important to remember that in meta-analyses it is crucial to assess heterogeneity because it provides important information about how the calculated effect size can vary.
Second, heterogeneity indicates that the data is not uniform, which can affect the validity of statistical assumptions. For instance, in a meta-analysis, a significant amount of heterogeneity means the studies are more different than expected from chance alone.
Third, heterogeneity impacts effect size calculations by reducing the meaningfulness of an average. If heterogeneity is significant, it means the true effect size varies across studies due to differences in the populations, interventions, or methodologies. This is the case with the NRP’s meta-analyses.
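For readers who want to see what “assessing heterogeneity” looks like in practice, here is a minimal Python sketch of the two standard measures, Cochran’s Q and the I² statistic. The effect sizes and standard errors are invented for illustration; they are not NRP data.

```python
# Toy inverse-variance meta-analysis of five hypothetical studies.
# Effect sizes (d) and standard errors are invented -- NOT NRP data.
effects = [0.15, 0.60, 0.25, 0.90, 0.40]
std_errs = [0.10, 0.12, 0.15, 0.11, 0.20]

weights = [1 / se ** 2 for se in std_errs]  # inverse-variance weights
pooled = sum(w * d for w, d in zip(weights, effects)) / sum(weights)

# Cochran's Q: weighted squared deviations of each study from the pooled estimate.
Q = sum(w * (d - pooled) ** 2 for w, d in zip(weights, effects))
df = len(effects) - 1

# I^2: the share of total variation attributable to real between-study
# differences rather than chance. Values above ~50% are conventionally "high."
I2 = max(0.0, (Q - df) / Q) * 100

print(f"pooled d = {pooled:.2f}, Q = {Q:.1f} on {df} df, I^2 = {I2:.0f}%")
```

When I² is high, as in this toy example, the studies disagree more than sampling error can explain, and the single pooled d describes no study in particular – the same problem the forensic analysis documents in the NRP tables.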
The forensic analysis found that there were deep flaws in the Panel’s assumptions about how effect sizes can be calculated. There were also many mathematical errors, missing data, and multiple miscalculations, leading to the inevitable conclusion that the findings are not supported by the data.
In the upcoming posts you will be able to observe for yourselves these aspects of the NRP’s meta-analyses. Part of my remit is to place you as close to the data as possible.
We can sum-up the shared information of heterogeneity by stating that averaging disparate results with high heterogeneity across multiple studies complicates interpretation, potentially producing a misleading single average effect size that poorly represents any specific context. It is important to keep this understanding of the problems with heterogeneity in mind when considering the use of the “findings” of the NRP P/A meta-analysis as the “evidence-base” on which federal and state laws have been passed.
Categorically, within meta-analysis high heterogeneity among study results complicates the generation of a single, meaningful average. The problem of heterogeneity becomes actualized in the NRP report when the Panel provides evidence that only 8 of the reported P/A effect sizes were homogenous (16%), and 42 were heterogenous (84%). The NRP states:
Studies varied in many respects as indicated in Table 1 (Appendix B). The Panel examined whether these moderator variables enhanced or limited the effectiveness of P/A training for teaching P/A and for facilitating transfer to reading and spelling. It is important to recognize the limitations of this type of analysis and the tentative nature of any conclusions that are drawn.
Findings involving the impact of moderator variables on effect sizes cannot support strong claims about causality. Moderator findings are no more than correlational. The biggest source of uncertainty is whether there is a hidden variable that is confounded with the variable in focus and is the true cause of the difference; thus, the conclusions drawn should be regarded as tentative and suggestive rather than the final word (pp. 2-19 -- 2-20, italics added).
It would be reasonable to state “there is nothing to see here.” There is too much uncertainty, and there are hidden variables. The analysis is too arbitrary for any conclusive statements to be made about the population of students who participated in the original studies, and absolutely no conclusions can be made about teaching the millions of children in America’s public schools to read.
In the Phonics results, Table 3, 31 reported effect sizes were “homogenous” (47%), and 35 were heterogenous (53%). Once again, quoting the report:
Studies in the database varied in several respects that were coded and analyzed as moderator variables. Of interest was whether these moderator variables enhanced or limited the effectiveness of systematic phonics instruction on growth in reading.
It is important to recognize the limitations of this type of analysis and the tentative nature of any conclusions that are drawn. Findings involving the impact of moderator variables on effect sizes cannot support strong claims about moderators being the cause of the difference. Moderator findings are no more than correlational. The biggest source of uncertainty is whether there is a hidden variable that is confounded with the moderator and is the true cause of the difference (p. 2-113, italics added).
Despite these shared reservations clearly expressed in both the P/A and Phonics sections of the report, the Panel goes on to describe the individual “mean effect sizes” of the wide range of “moderator variables,” i.e. “study characteristics,” in definitive terms, as though all the individual effect sizes reported are actually reliable findings, despite the wide range of other study characteristics that are “averaged out” in calculating “mean effect sizes.”
The effect sizes in the NRP P/A and Phonics meta-analyses should not have been generalized, but that is what has happened. Claims of the effectiveness of P/A and phonics training are described as “settled science” in federal and state laws, by the publishers of reading programs, and by the media. Ignominiously, the claims also continue to be legitimized by reading researchers – mostly by eminent scholars who have contracts with reading program publishers.
The statement “The biggest source of uncertainty is whether there is a hidden variable that is confounded with the variable in focus and is the true cause of the difference” overlooks – or just fails to mention – that many of the variables which are a source of uncertainty are already part of the NRP database and are included in the Appendix F and Appendix G data sets.
Ignoring The Moderator Variables Of The NRP Effect Sizes Is A National Problem
We are ready for a deep dive into effect sizes. In statistics, an effect size is a value measuring the strength of the relationship between two variables in a population.
The kicker is that there are strict conditions under which effect sizes can and cannot be legitimately used. In the next ten posts I will show definitively that the data used in the NRP P/A and Phonics meta-analyses did not meet the criteria for calculating effect sizes, and therefore the meta-analyses have no scientific legitimacy.
Remember, the averaging of multiple disparate results (“effect sizes”) from multiple studies with a wide variety of characteristics results in a series of “mean effect sizes” which are misleading.
As we move forward, pay attention to the size of the effect size. Generally, 0.5 indicates a medium effect size: a visible but not overwhelmingly large difference between two groups, or a moderate strength of relationship between variables. Here is Cohen’s rule of thumb:
Small (0.2): A very small, barely noticeable difference.
Medium (0.5): A visible and moderate difference.
Large (0.8): A very large difference, difficult to miss.
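To make these numbers concrete, here is a minimal sketch of how Cohen’s d is computed from two groups’ posttest scores. The scores are hypothetical – they are not data from any NRP study.

```python
import statistics

def cohens_d(group1, group2):
    """Standardized mean difference using the pooled standard deviation."""
    n1, n2 = len(group1), len(group2)
    m1, m2 = statistics.mean(group1), statistics.mean(group2)
    v1, v2 = statistics.variance(group1), statistics.variance(group2)
    pooled_sd = (((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)) ** 0.5
    return (m1 - m2) / pooled_sd

# Hypothetical posttest scores -- invented for illustration.
treatment = [52, 55, 58, 61, 63, 66, 70, 71]
control = [48, 50, 53, 55, 58, 60, 62, 64]

d = cohens_d(treatment, control)
print(f"d = {d:.2f}")  # about 0.91: "large" by Cohen's rule of thumb
```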
Of course, these effect sizes are only indicators of the strength of relationships if the variables are homogeneous. Here are some of the primary challenges:
Misleading Average: The calculated average effect size may not apply to any specific population or intervention scenario, as the true effect likely varies widely.
The calculated effect sizes reported by the NRP are derived from comparisons of relatively small numbers of children, and the results cannot be extrapolated to even a similar-size group of children elsewhere with different characteristics, let alone to larger groups or to all the children in the U.S. attending public schools.
Reduced Generalizability: High variability limits the ability to generalize the findings to a broader population or to different settings.
In any specific NRP “mean effect size” reported (Tables 2, 3, 4 for P/A; Table 3 for Phonics), there is a “high variability” in all of the other characteristics of the reported data. In upcoming Substack posts I will present analysis tables from the Phonics data that provide a snapshot of the “high variability” in “grade/age,” “reading ability,” and “length of training.”
Masked Subgroup Effects: In the NRP report, multiple single-average findings obscure important differences, such as an intervention being highly effective for one group but ineffective or harmful for another.
Building on the explanation of “reduced generalizability,” in the upcoming analysis tables from the Phonics data for “grade/age” and “reading ability” you will be able to see an example of this phenomenon for yourselves. The NRP calculated an effect size for “K + 1st vs. 2nd to 6th,” which shows a “single average” of 0.55 for the narrow K + 1st sample versus 0.27 for the widely variable mixed ages/grades and reading abilities of 2nd to 6th.
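A minimal sketch of the masking problem, with invented numbers (not NRP data): an intervention that helps one subgroup and harms another can still average out to a “small positive effect.”

```python
# Invented subgroup effect sizes -- NOT NRP data.
effect_younger = 0.60   # intervention looks clearly beneficial here
effect_older = -0.40    # and looks harmful here

pooled = (effect_younger + effect_older) / 2
print(f"pooled = {pooled:.2f}")  # 0.10 -- a "small positive effect" true of neither subgroup
```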
The NRP did not undertake additional analyses, and the presence of high heterogeneity indicates a lack of a single, consistent “true” effect across all studies.
Summary of the Findings of the Forensic Analysis of the NRP P/A and Phonics Meta-Analyses
In upcoming Substack posts I will systematically take you through the forensic analysis of every P/A and Phonics category. Here is a brief overview. The first forensic analysis finding has already been stated, but after the review of heterogeneity and effect sizes above it is worth repeating.
The Heterogeneity of the NRP Effect Size Results: The averaging of multiple disparate results (“effect sizes”) from multiple studies with a wide variety of characteristics results in a series of “mean effect sizes” which are misleading and cannot be generalized to other populations.
The NRP Mean Effect Sizes: Most of the “mean effect sizes” reported in the P/A and especially the Phonics tables of results are heavily weighted by scores on tests of isolated word skills, which comprise 78% of the 169 individual phonics effect sizes reported, distorting all the report’s results for mean effect sizes.
The NRP Selected Studies that Focus on Word Identification Skills that Can Be Taught and Tested: The report is weighted towards pseudoword and word skills and not comprehension and reading. Overall, there were 66 mean effect sizes (from 38 studies) that were calculated from 169 individual effect sizes:
1. 47 for word identification/miscellaneous words,
2. 26 for decoding words,
3. 28 for decoding pseudowords,
4. 30 for spelling individual words,
5. 26 for comprehension,
6. 12 for oral reading.
The 47 + 26 + 28 + 30 = 131 word-skill effect sizes represent 78% of the individual effect sizes, and the 26 + 12 = 38 reading-skill effect sizes represent only 22% of the total individual effect sizes reported by the NRP. The takeaway – the findings of the NRP are distorted by the weighting of isolated word skills.
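The weighting arithmetic can be verified directly from the counts listed above:

```python
# Individual effect-size counts from the NRP Phonics results, as listed above.
word_skill_counts = [47, 26, 28, 30]  # word ID, decoding words, pseudowords, spelling
reading_skill_counts = [26, 12]       # comprehension, oral reading

total = sum(word_skill_counts) + sum(reading_skill_counts)  # 169
word_share = sum(word_skill_counts) / total

print(f"{sum(word_skill_counts)} of {total} = {word_share:.0%} word skills")  # 131 of 169 = 78%
```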
Reliable Data Results from Some Studies are Misrepresented in the NRP Report: There is evidence that some study results were arbitrarily changed for no apparent valid reason producing misleading effect sizes. An example of this manipulation of study results is provided by the studies on Reading Recovery® by Iverson and Tunmer in the P/A meta-analysis, and Tunmer and Hoover in the Phonics meta-analysis. Large sections of the two research articles are identical as is the reported data, leading to the conclusion that a single study was reported in two different journals, impacting the NRP effect sizes in both the P/A and Phonics meta-analyses. Still more concerning, in the case of the Tunmer and Hoover study in the Phonics meta-analysis, the NRP changed the effect sizes and misrepresented the findings, attributing the positive results to phonics when they should have been attributed to Reading Recovery.® A Substack post that focuses on this case study will be published in the coming weeks.
Before We Dive Into The Data Let’s Revisit Some Of The Important Points To Keep In Mind
I am aware that I am expecting a lot from readers. The key is to read the numbers in the text in the same way you read words, keeping in mind that you do not have to memorize them. What is important is that you get the overall message – there were not 100,000 studies, just 52 P/A and 38 Phonics studies, and a significant number of those studies were later disqualified by the IES-WWC. From your point of view, it is important to get the gist.
There are vast differences in individual study characteristics in the data provided in the NRP report for the 52 P/A studies (96 study comparisons) and for the 38 Phonics studies (79 study comparisons), which are the basis for the two sets of meta-analysis results.
1. The WWC disqualified 13 of the 57 phonemic awareness experimental studies, and 11 of the 38 phonics studies, confirming that many of the studies the Panel selected did not meet the criteria the Panel itself had established. Only 71 of the studies included in the NRP’s two meta-analyses met their criteria.
2. Not all the 96 P/A and 79 Phonics study comparisons that were available were used in calculating the reported results. Some comparisons seem arbitrary and others inexplicable.
3. There are also major differences in the ways in which the meta-analysis results were reported in the P/A and Phonics categories.
Now for the mechanics of the NRP P/A and Phonics meta-analyses. Here are the differences between the organization, the structure, and the execution of the two meta-analyses.
For the P/A studies, up to 3 effect sizes were reported for each of the 96 study comparisons in the Appendix F data.
1. Phonemic Awareness (72 effect sizes)
2. Reading (96 effect sizes)
3. Spelling (50 effect sizes)
The results for these 3 effect sizes were kept separate in the reported Appendix F data. They were also reported separately in all the meta-analysis results, which are presented in 3 separate tables – Tables 2, 3, and 4 for P/A, Reading, and Spelling in Appendix C.
In contrast, for the Phonics study comparisons, up to 7 effect sizes were reported in the Appendix G data for 79 comparisons:
1. Word Identification (59 effect sizes)
2. Word Decoding (30 effect sizes)
3. Pseudoword Decoding (39 effect sizes)
4. Spelling (37 effect sizes)
5. Comprehension (35 effect sizes)
6. Oral Reading (16 effect sizes)
7. General Reading (19 effect sizes)
Then, for each study comparison, a “mean effect size” was calculated across whichever of the 7 measures had been assessed in that study comparison.
The NRP states that “this yielded an overall outcome measure for each comparison” (p. 2-110), despite this “overall outcome” being the meaningless average of up to 7 quite different measures.
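A toy sketch shows why such an “overall outcome” is uninformative. The effect sizes below are invented for one hypothetical study comparison; they are not NRP data:

```python
# Invented effect sizes for the seven Phonics outcome measures in one
# hypothetical study comparison -- NOT NRP data.
outcomes = {
    "word identification": 0.9,
    "word decoding": 0.8,
    "pseudoword decoding": 1.1,
    "spelling": 0.6,
    "comprehension": 0.1,
    "oral reading": 0.0,
    "general reading": 0.2,
}

overall = sum(outcomes.values()) / len(outcomes)
print(f"'overall outcome' = {overall:.2f}")
# 0.53 sits between the strong word-skill effects and the near-zero
# reading effects, describing neither.
```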
Again, in contrast to the P/A studies, it is important to note that the Phonics meta-analysis results were presented in a single table – Table 3. Out of 15 separate sets of these meta-analysis results, only 3 used the first 6 of the 7 separate “mean effect sizes” from the Appendix G data listed above.
The other 12 sets of results in Table 3 used the “overall outcome” mean effect sizes described above, rendering these 12 sets of results of little significance, despite the NRP fixation on “significance” in all its results.
A Forensic Case Study of the NRP Comparison of the Effectiveness of Phonics Instruction for Kindergarten and First Grade Students Compared with the Effectiveness for Second Through Sixth Grade Students
We’ve reached the lodestone of the whole enterprise – by that I mean the findings of the NRP P/A and Phonics meta-analyses that became a political magnet – an idea so convincing that people in every sector of U.S. society have been taken in. Here is the quote from the NRP report:
Phonics instruction taught early proved much more effective than phonics instruction introduced after first grade. Mean effect sizes were kindergarten d = 0.56; first grade d = 0.54; 2nd through 6th grades d = 0.27. The conclusion drawn is that phonics instruction produces the biggest impact on growth in reading when it begins in kindergarten or 1st grade before children have learned to read independently. These results indicate clearly that systematic phonics instruction in kindergarten and 1st grade is highly beneficial and that children at these developmental levels are quite capable of learning phonemic and phonics concepts. To be effective, systematic phonics instruction introduced in kindergarten must be appropriately designed for learners and must begin with foundational knowledge involving letters and phonemic awareness (p. 2-93).
In previous Substack posts I have focused on how the Right has used the idea, derived from the findings of the NRP meta-analyses, that to be effective phonics must be systematically taught. Basically, I have shown how the idea that phonics must be explicitly taught became a lodestone, an idea powerful enough to alter the nation’s point of view.
It is important to note that I have also presented evidence-based accounts of how phonemic awareness and the development of sound-symbol relationships begin early, in the meaningful contexts of young children’s everyday lives. I have also drawn attention to the fact that more than 60% of children in the U.S. have had adverse childhood experiences, and that for many children communicating through their writing is vital to their healing. Even when children are pre-alphabetic and even when they are just beginning to learn the relationships between sounds and symbols, children write to communicate how they are making sense of their world.
What follows is a forensic case study of the NRP comparison of the effectiveness of phonics instruction for kindergarten and first grade students compared with the effectiveness for second through sixth grade students.
The NRP compared the averages for the two groups of students and found that the average for the kindergarten and first grade group was higher than the average for the second through sixth grade group. Here is the NRP’s description of the analysis:
To analyze the impact of age and grade combined, two groups of children were distinguished: the younger children in kindergarten and 1st grade; and the older students in 2nd through 6th grades. The latter group included the mixed age/grade comparisons involving reading disabled (RD) children and low achieving readers. The outcome variable was the effect sizes on the immediate posttest given either at the end of training or at the end of the first year of the program, whichever came first (p. 2-114).
The last sentence in the quote seems straightforward but it is not. In the next Substack on the forensic analysis, I will include a table on the “Time of Training.” Here it is important that you are aware that time of training ranged from 5 hours over 6 weeks to 54 sessions over 18 weeks, with the outer limit being 4 years.
It’s time to introduce the first of the forensic tables. Table One focuses on Grade-Age and ‘Reading Ability’ for K-1st vs 2nd-6th Grades for 62 comparisons in the Phonics Meta-Analysis. Parenthetically, the NRP used only 62 comparisons in this analysis, not the 66 comparisons often cited.
Note that the K-1st grade comparisons identified the children in the studies as “normal” and “at risk” readers. In the 2nd-6th mixed grade comparisons the majority of the children in the studies are identified in the Appendix E data as “Low Achievement” or “Reading Disabled.”
Before any discussion of Table Two, which focuses on the numbers of children in the phonics study comparisons presented above, it is important that you know that only 697 (37%) of the children in the mixed 2nd to 6th grade studies were identified as “normal readers,” while 1,165 were identified as “low achievers” or “reading disabled.”
By juxtaposing the effect sizes in the 62 comparisons with the number of children in the 62 comparisons the NRP conclusions fall apart. Here’s the NRP:
Phonics instruction taught early proved much more effective than phonics instruction introduced after first grade. Mean effect sizes were kindergarten d = 0.56; first grade d = 0.54; 2nd through 6th grades d = 0.27. The conclusion drawn is that phonics instruction produces the biggest impact on growth in reading when it begins in kindergarten or 1st grade before children have learned to read independently.
The NRP compared normal and at-risk readers in kindergarten and first grade with 2nd-6th grade children who were mostly identified as low achieving and reading disabled. In addition, there are 1,250 more children in the kindergarten and first grade studies than in the second through sixth grade mixed studies. Incredible as it might seem, this is the “scientific evidence” on which federal and state laws have been based.
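The composition problem can be illustrated with a short Python sketch. The sample sizes and effect sizes below are invented for illustration, not taken from the NRP tables; the point is that when two grade bands contain different mixes of readers, comparing their raw mean effect sizes confounds grade with reading-ability composition:

```python
# Hypothetical sketch (invented numbers): suppose phonics helps
# "normal" readers more than struggling readers, identically at
# every grade, and the two grade bands differ only in composition.

def weighted_mean(groups):
    """groups: list of (n_children, mean_d) tuples."""
    total_n = sum(n for n, _ in groups)
    return sum(n * d for n, d in groups) / total_n

k1   = [(1500, 0.55),  # mostly normal / at-risk readers
        (300,  0.25)]  # a few struggling readers
g2_6 = [(400,  0.55),  # a minority of normal readers
        (1200, 0.25)]  # mostly low-achieving / reading-disabled

print(f"K-1 mean d = {weighted_mean(k1):.2f}")
print(f"2-6 mean d = {weighted_mean(g2_6):.2f}")
# The K-1 average comes out well above the 2-6 average even though,
# by construction, grade itself has no effect at all here.
```

In this invented example the gap between the two grade-band averages is produced entirely by who is in each group, which is exactly the kind of confound the NRP comparison does not rule out.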
Let’s take the forensic analysis further. There are two more tables I would like you to consider, with the same set-up. Table Three focuses on the Phonics “Grade-Age” and “Reading Ability” for K-1st vs 2nd-6th Grades for the 62 comparisons in the Phonics meta-analysis, and Table Four focuses on the numbers of children in the phonics study comparisons.
I am quoting the NRP yet again:
These results indicate clearly that systematic phonics instruction in kindergarten and 1st grade is highly beneficial and that children at these developmental levels are quite capable of learning phonemic and phonics concepts. To be effective, systematic phonics instruction introduced in kindergarten must be appropriately designed for learners and must begin with foundational knowledge involving letters and phonemic awareness (p. 2-93).
On what planet is this a scientific fact? Certainly not the one we inhabit. Of course, phonics is important, but the NRP does not provide scientific evidence that phonics should be explicitly taught through exercises that reduce reading to sound-symbol relationships devoid of meaning.
The presence of high heterogeneity in the NRP P/A and Phonics meta-analyses, and the reliance on misleading averages that increase uncertainty and mask subgroup effects, undermines the legitimacy of the NRP report and the federal and state laws that are based upon it.
Categorically, the NRP P/A and Phonics meta-analyses have no scientific validity, and yet in the Phonics meta-analysis the NRP reports a mean effect size for kindergarten of d = 0.56, for first grade d = 0.54, and for 2nd through 6th grades d = 0.27. An overall effect size of d = 0.41 for “reading,” derived by “averaging” 65 of the 66 effect sizes, is supposed to be the “single effect across all studies.” It’s a false premise, but these numbers are frequently quoted. The misinformation is ubiquitous, the gaslighting persists, and policy makers, the media, and the public continue to be taken in.
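For readers who want to see what heterogeneity looks like numerically, here is a minimal Python sketch of Cochran's Q and the I² statistic, computed on invented effect sizes and variances rather than the NRP's data. When I² is very high, the studies do not share one underlying effect, and a single pooled average is not a meaningful summary:

```python
# Minimal sketch (invented effect sizes and sampling variances) of
# Cochran's Q and I^2, the standard statistics for quantifying
# heterogeneity in a fixed-effect meta-analysis.

def q_and_i2(effects, variances):
    weights = [1 / v for v in variances]          # inverse-variance weights
    pooled = sum(w * d for w, d in zip(weights, effects)) / sum(weights)
    q = sum(w * (d - pooled) ** 2 for w, d in zip(weights, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return pooled, q, i2

# Widely scattered study results with modest sampling variances:
effects   = [1.4, 0.9, 0.6, 0.3, 0.1, -0.2]
variances = [0.02, 0.03, 0.02, 0.03, 0.02, 0.03]

pooled, q, i2 = q_and_i2(effects, variances)
print(f"pooled d = {pooled:.2f}, Q = {q:.1f}, I^2 = {i2:.0f}%")
# An I^2 far above 75% signals that the studies do not share one
# underlying effect, so the pooled average is not a meaningful summary.
```

Conventionally, I² values above roughly 75% are read as considerable heterogeneity; in that situation a pooled d is an average over effects that genuinely differ, which is the core of the statistical objection raised here.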
On the Phonics.org website, under the heading, “The Original National Reading Panel: A Foundation That Endures,” is the following statement:
The National Reading Panel (NRP), convened by Congress in 1997 and reporting in 2000, remains one of the most significant contributions to reading research in education history. This comprehensive meta-analysis examined over 100,000 reading studies and concluded that the most effective reading instruction includes a combination of methods: phonemic awareness, systematic phonics instruction, guided oral reading for fluency, vocabulary development, and reading comprehension strategies.
The Panel’s findings were unequivocal about phonics instruction. After analyzing 38 high-quality studies involving 66 treatment-control comparisons, researchers found that systematic phonics instruction enhances children’s success in learning to read significantly more than instruction that teaches little or no phonics. The effect size was moderate (d = 0.41), with larger effects when instruction began early (d = 0.55 in kindergarten versus d = 0.27 after first grade).
These findings have shaped literacy policy across the United States for over two decades, influencing everything from state reading legislation to classroom practices.
There is so much misinformation in this Phonics.org statement. The nation cannot go on believing the false findings of the NRP report. The work you did reading this Substack post, unravelling the NRP’s unorthodox use of effect sizes and gaining an understanding of the problems of heterogeneity, is an act of resistance to the lies and false information that the public has been fed and still is being fed. But knowing is not enough. Rejecting the propaganda about how children should be taught to read leaves a gaping hole. Who can you believe? What can you believe? How do you make sure what you know is based on science and not political propaganda that is masquerading as science?
My recommendation is that you turn to the 2025 European Commission joint NESET-EENEE Analytic Report, Effective practices for literacy teaching. The report is based on a detailed literature review of the most recent European and international research on effective approaches to the teaching of literacy.
Ignore the unfortunate insertion of the NRP report in the European Commission’s analytic report and focus on the key findings:
1. There is general agreement across Europe that initial literacy instruction should use a balanced approach
2. Reading for meaning and understanding should not be taught separately from instruction about grapheme–phoneme relationships
3. Learning to read and to write should be parallel and interactive activities
4. It is critically important that children are encouraged to enjoy reading
5. Teachers need to read stories with their students and to offer a positive model of reading as an activity in which everyone can participate and be successful (p. 28).
Referencing Wyse and Hacking (2024), the European Commission report emphasizes the “demonstrably greater gain in a number of aspects of literacy development for children who were taught reading and writing together” (p.28).
Some readers will know that the concept of Family Literacy originates in my doctoral research, and so it was affirming to read that the European Commission states that “in many countries, Family Literacy Programmes and Book-start initiatives aimed at supporting families and children have shown positive effects on children’s later literacy performance.”
The report also states that “in a pan-European analysis, book ownership, as part of a broader literacy environment, was also found to be associated with language and literacy development and later reading attainment.”
We will end here with a final note. In a webinar on effective teaching strategies for developing literacy that took place on Thursday, November 27th, Colin Harrison, the lead author of the European Commission’s analytic report, summarized the findings. The contrast between the European stance on children’s early literacy development and the U.S. stance could not have been starker.
Harrison stated that “literacy development is greater in primary schools with a rich literacy environment, including libraries, multicultural resources, and parent groups.”
He stated, “All teachers know that language development is important, but a key emphasis in the report is the vital role of talk in relation to reading.” Harrison stated that it is through talk “both before school, and at school that vocabulary is developed.”
“Once children get to school, talk, as well as reading, improves comprehension,” Harrison continued. “Expressive activities – songs, acting out, storytelling – are all important.” He added, “Literacy development is greater in primary and secondary schools with a rich literacy environment.”
“At both primary and secondary level, and in Vocational Education and Adult Education,” Harrison continued, “interaction with printed texts leads to enhanced learning in every subject, and reciprocal teaching, using reading in small groups, is recommended for dealing with complex texts.”
Concluding Comments and Recommendations
I will continue to deconstruct the last 25 years of the aberrant reshaping of reading instruction in U.S. public schools, while at the same time I will hold on to and share with you the possibilities of creating classrooms in which the health and well-being of children is as important to us as ensuring that they become resilient and resourceful readers and writers.
The forensic analysis of the NRP report will provide readers with an opportunity to gain broad understandings from the microscopic descriptions of the many mathematical errors, missing data, and miscalculations, leading to the inevitable conclusion that the NRP’s findings are not supported by the Panel’s own reported data.
The NRP P/A and Phonics meta-analyses have no scientific validity, and the Panel itself expressed reservations about the findings in both the P/A and Phonics sections of the original report. The Panel warns of the “tentative nature of any conclusions drawn” and advises that the “findings involving the effects of moderator variables on effect sizes cannot support strong claims about causality” and should not be widely disseminated.
The forensic analysis supports the Panel’s reservations, which received no attention from NICHD or from Bush and his advisors when he presented his Blueprint for NCLB and Reading First to Congress, or when he inserted the 100,000 study lie into the Congressional Record. But it is not too late. NICHD should give serious consideration to withdrawing the report. NICHD should also consider advising all official bodies, especially the 40-plus U.S. states that used the NRP report as the basis for “evidence-based reading instruction” and “science of reading” laws.
NICHD should also consider retracting statements made on the Internet that provide the public with false information about the NRP report. Retrieved on December 6, 2025, here are some of the NICHD endorsements posted at multiple URLs on the NICHD website:
The NRP report is a consensus document based on the best judgments of a diverse group of experts in reading research and reading instruction.
This Report is organized into sections to provide an overview of the major findings and determinations achieved by the NRP in the areas of alphabetics.
The work of the NRP builds on existing knowledge about what types of skills children need to acquire to become independent readers.
All the studies report positive results, suggesting that it is possible to use computer technology for reading instruction.
The last statement is demonstrably false. In the NRP report, the data in the Phonics section do not report “Computer” as the type of trainer in any of the 38 studies. In the P/A section, “Computer” as the type of trainer is reported for only 7 of the 52 studies, and the NRP states:
it is apparent that computers produced a moderately strong effect size on the acquisition of PA (d = 0.66) although it was significantly less than the effect size for other forms of instruction (d = 0.89). The phonemic awareness that children learned from computers transferred and improved their reading performance on the immediate posttest (d = 0.33), but computers did not improve reading as much as other forms of PA instruction (d = 0.55) (p. 2-23)
The fabrication of what we know is insidious and occurring at all levels.
The last word goes to Shanahan, who has made a career of promoting the NRP report. Posted on APA PsycNet is an article Shanahan published in 2004 that sums up the position he has maintained in the 25 years since the NRP report was published:
The National Reading Panel report (National Institute of Child Health and Human Development [NICHD], 2000) was a remarkable development in the application of research to practice in reading education. …
Despite the storm of complaints, there appears to be no reason not to apply the panel’s findings to classroom instruction and many reasons for making sure that these findings guide the instruction we give our children.
It’s the official story, the one that Ehri et al. (July 2001, September 2001), Foorman et al. (2000), Fletcher et al. (2020), Lyon (2002), Moats (2007, 2020), Stanovich and Stanovich (2003), Tierney and Pearson (2024), and other reading researchers and cognitive scientists have long maintained.
The forensic analysis of the original data analyzed by the National Reading Panel in their Phonemic Awareness and Phonics meta-analyses does not support the position of these renowned scholars or the position of many others in multiple sectors of U. S. society.
In the next Substack post I will focus on the origins of the 100,000 study myth that is so intricately linked with the findings of the NRP report. The story of how the 100,000 study myth became embedded in the American belief system about phonics and how children should be taught to read is filled with insights into how the public and private sectors and the media, supported by their academic collaborators, have reshaped reading instruction in U.S. public schools.
A Note about Teaching in Dangerous Times on Substack
For those of you who have made it through this post, thank you. My hope is that I can continue to post most of my articles for everyone to read. However, I am aware that Substack is an incredible platform for scholars like me to reach far more people than we otherwise would. So, a request: if you have the means and can help support the site, I would be grateful. If just a few readers subscribe, it will help pay for the incredible service Substack provides. I appreciate your support.
References
Directorate-General for Education, Youth, Sport and Culture: Network of Experts on the Social Dimension of Education and Training (NESET) and the European Expert Network on Economics of Education (EENEE). “Effective practices for literacy teaching.” EENEE-NESET report April 9, 2025, Publications Office of the European Union, Luxembourg. Authors: Harrison, C., Brooks, G., Pearson, P. D., Sulkunen, S., Valtin, R. Downloaded from: https://education-socioeconomic-experts.ec.europa.eu/publications/analytical-reports/effective-practices-literacy-teaching_en or https://data.europa.eu/doi/10.2766/485436
Ehri, L. C., Nunes, S. R., and Shanahan, T. (2001, July). Phonemic awareness instruction helps children learn to read: evidence from the National Reading Panel’s meta-analysis. Reading Research Quarterly, 36(3):250-287. Downloaded from: https://doi.org/10.1598/RRQ.36.3.2
Ehri, L. C., Nunes, S. R., Stahl, S. A., and Willows, D. M. (2001, September 1). Systematic phonics instruction helps students learn to read: Evidence from the National Reading Panel’s meta-analysis. Review of Educational Research, Fall 2001, Vol. 71, No. 3, pp. 393–447. Downloaded from: https://journals.sagepub.com/toc/rera/71/3
Fletcher, J. M., Savage, R., and Vaughn, S. (2020, November 5). A commentary on Bowers (2020) and the role of phonics instruction in reading. Educational Psychology Review, Vol. 33, pp. 1249-1274. Downloaded from: https://link.springer.com/article/10.1007/s10648-020-09580-8
Foorman, B. R., Fletcher, J. M., Francis, D. J., and Schatschneider, C. (2000, August-September). Response: Misrepresentation of Research by Other Researchers. Educational Researcher, American Educational Research Association, Vol. 29, No.6, pp. 27-37. Downloaded from: https://www.jstor.org/stable/1176806
Lyon, G. R. (2002). Testimonies to Congress 1997-2002 _ ERIC Number ED475205. https://eric.ed.gov/?id=ED475205
Moats, L. (2007). Whole-Language high jinks - How to tell when “scientifically-based reading instruction” isn’t. (Foreword by Finn, C. E. Jr., and Davis, M. A. Jr.). Thomas B. Fordham Institute. Downloaded from: https://files.eric.ed.gov/fulltext/ED498005.pdf
Moats, L. (2020), Teaching Reading is Rocket Science 2020: What Expert Teachers of Reading Should Know and Be Able to Do. American Federation of Teachers. Downloaded from: https://www.aft.org/sites/default/files/moats.pdf
National Institutes of Health (NIH), National Institute of Child Health and Human Development (NICHD). (2000, April). Report of the National Reading Panel: Teaching Children to Read. Retrieved from: https://www.nichd.nih.gov/publications/pubs/nrp/report And https://www.nichd.nih.gov/sites/default/files/publications/pubs/nrp/Documents/report.pdf
Phonics.org. (2025, September 22). The 2025 National Reading Panel Update: What’s Changed in Phonics Research? Downloaded from: https://www.phonics.org/the-2025-national-reading-panel-update-whats-changed-in-phonics-research
Shanahan, T. (2003, April). Research-based reading instruction: Myths about the National Reading Panel report. The Reading Teacher, 56(7), 646-655. Downloaded from: https://www.jstor.org/stable/20205261
Shanahan, T. (2004). Critiques of the National Reading Panel Report: Their implications for research, policy, and practice. In P. McCardle & V. Chhabra (Eds.), The voice of evidence in reading research (pp. 235–265). Baltimore, MD: Paul H. Brookes Publishing Co. Abstract downloaded from: https://psycnet.apa.org/record/2005-06977-011
Shanahan, T. (2005, January). The National Reading Panel Report: Practical advice for teachers. Learning Point Associates. Downloaded from: https://files.eric.ed.gov/fulltext/ED489535.pdf or https://www.researchgate.net/publication/234692266
Shanahan, T. (2017, August 13). Can I still rely on the National Reading Panel Report? Downloaded from: https://www.shanahanonliteracy.com/blog/can-i-still-rely-on-the-national-reading-panel-report
Shanahan, T. (2017, August 22). Can I still rely on the National Reading Panel Report? Shanahan on Literacy: Blogs About Reading. Downloaded from: https://www.readingrockets.org/blogs/shanahan-on-literacy/can-i-still-rely-national-reading-panel-report
Stanovich, P. J. and Stanovich, K. E. (2003, May). Using research and reason in education: How teachers can use scientifically based research to make curricular and instructional decisions. The Partnership for Reading (NIL, NICHD, U.S. DoE, U.S. DHHS). Downloaded from: https://lincs.ed.gov/publications/pdf/Stanovich_Color.pdf
Taylor, D. (1998). Beginning to read and the spin doctors of science. Urbana, IL: National Council of Teachers of English.
Tierney, R. J. and Pearson, P. D. (2024, April 20). Fact-checking the science of reading: Opening up the conversation. Literacy Research Commons, Foundation for Learning and Literacy. Downloaded from: https://literacyresearchcommons.org/wp-content/uploads/2024/04/Fact-checking-the-SoR.pdf
The White House Archives. (2001, January 23). Transforming the Federal Role in Education So That No Child is Left Behind - Improving Literacy by Putting Reading First. Downloaded from: https://georgewbush-whitehouse.archives.gov/news/reports/no-child-left-behind.html#3 Transmitted by a “Letter from the President to the Speaker of the House of Representatives and the President of the Senate”, January 23, 2001, downloaded from: https://georgewbush-whitehouse.archives.gov/news/releases/2001/01/text/20010123.html
United States Senate Subcommittee On Labor, Health And Human Services, And Education, And Related Agencies, Committee On Appropriations. (2000, April 13). Report of the National Reading Panel. Downloaded from: https://www.govinfo.gov/content/pkg/CHRG-106shrg66481/html/CHRG-106shrg66481.htm




