The Houston Reading Study is one of the Four Cornerstones of the Science of Reading: A Conversation with AI About the Flaws in the HRS that Undermine the 43 State Reading Laws
Photo by Sasun Bughdaryan: Original in Color, Unsplash.com
(I will be discussing my research on Thursday, May 7th, 2026, at 7:00 PM EST 6:00 PM CST, 4:00 PST in a Zoom webinar arranged by Andrew Johnson on behalf of ILEC. If you would like to join us the registration information is on the ILEC Fb page.)
This Substack post on the Houston Reading Study is one of the conversations I have had with AI about my six year research project that has focused on the National Reading Panel’s phonemic awareness and phonics meta-analyses and the “Science of Reading.” In conducting this forensic analysis, I have identified four cornerstones of the Science of Reading, and I have carried out a quantitative and qualitative analysis of each of them.
Writing allegorically, Science of Reading proponents use images of the NRP “five pillars” of reading instruction to knowingly or unknowingly gaslight the public about how children should be taught to read. The studies I have analyzed are the four corners that hold up the five pillars. I don’t want to make too much of this analogy, but if the reading studies which are the cornerstones of the Science of Reading do not hold-up under scientific scrutiny, then the five pillars will come crashing down.
Thorndike’s Law Of Effect Is Used To Establish “Science Of Reading” Algorithmic Dependency
One of the cornerstones that is important to keep in mind was constructed by Edward Lee Thorndike at the end of the 19th century. In a future Substack we will explore how Thorndike’s Law of Effect is being used to establish algorithmic dependency in children learning to read in U.S. public schools today. But in this post, all that is necessary is that we keep in mind that Thorndike’s research still dominates the reading field, and that Thorndike’s Law of Effect and stimulus-response protocols are foundational conceptual elements of both the Houston Reading Study and the National Reading Panel phonemic awareness and phonics meta-analyses.
Thorndike’s doctoral experiments using starving cats isolated in traps are the basis of the behavioral theory of Connectionism. Thorndike’s Law of Effect, which casts “learning” as the formation of stimulus-response (S-R) associations, is the dominant characteristic of the experimental reading studies selected for inclusion in the NRP’s alphabetics section of the report.
It is highly unlikely that the politicians who passed the 43 state reading laws that mandate a Science of Reading are aware that these laws utilize Thorndike’s century old Law of Effect. His establishment of operant conditioning and stimulus-response protocols are the basis of the behaviorist digital reading programs that are leased to school districts by companies like McGraw-Hill (Platinum Equity) and HMH (owned by Veritas Capital). These digital reading programs are scientifically untenable and risk what AI has called algorithmic assimilation.
Before this conversation about the Houston Reading Study begins, it is important that you know a little more about Thorndike, and why his architectural origination of the Science of Reading is so toxic to the well-being of children and to the health of the nation.
Thorndike was a proponent of eugenics, and he was involved with organizations promoting it. He is recognized by historians as a eugenicist who believed intelligence was a hereditary trait and he supported measures to improve “the human race through heredity, such as selective breeding and, at times, sterilization.” Thorndike was also a charter member of the Galton Society, which was founded in 1918 to promote “mainline eugenics.” As previously stated, Thorndike is also the originator of the Law of Effect, which he first described as a “great weapon” in his 1912 book, Education: A First Book, and repeated in his 1929 book The Elementary Principles of Education, which Thorndike wrote with Arthur Gates.
Inexplicably, Thorndike is still honored as a “pioneer” member of the Reading Hall of Fame, and his research is cited in the National Reading Panel’s 2000 report. My forensic analysis of the NRP’s phonemic awareness and phonics meta-analyses stretches back to Thorndike and traces the Thorndike-Skinner- Engelmann continuum that is represented in the NRP alphabetics category by the Panel’s almost exclusive selection of behaviorist reading studies that are grounded in Engelmann’s operant conditioning, stimulus-response research and teaching methods. Engelmann is referenced 35 times in the NRP report.
One of the studies on the Thorndike-Skinner- Engelmann continuum is the Houston Reading Study which was funded by NICHD in the 1990s. It’s important to know about the Houston Reading Study for many reasons. What matters to us right now is that the study was included in the NRP phonics meta-analyses, and it is one of the studies that the Institute of Educational Sciences – What Works Clearinghouse (IES-WWC), reviewed in 2007. The What Works Clearinghouse (WWC) applied a strict regulatory standard for causal validity. The WWC noted “reservations” about the HRS because of the use of “a quasi-experimental design where students were not randomly assigned to groups, or where the analysis did not fully account for initial differences between the groups.”
George W. Bush Used The Houston Reading Study To Bolster His National Credentials During The 2000 Presidential Election
Separate from the AI conversations reported here, I asked AI, “How was the HRS used in the election of George W. Bush as President?
Here’s AI’s response,
“The Houston Reading Study (HRS), specifically the work of Barbara Foorman and Jack Fletcher, served as the primary scientific proof for George W. Bush’s signature education platform, ‘The Texas Miracle,’ which he used to bolster his national credentials during the 2000 presidential election. By framing education as a civil rights issue based on objective “science,” Bush was able to neutralize the traditionally Democratic strength on education and win over moderate voters.”
Essentially, Democrats in Congress and the public were duped.
In the shadow of the political use of the 1996-1998 Houston Reading Study, in this Substack conversation with AI, we trace the historical roots, while at the same time tracing it forward to the present time – to the study’s inclusion in the National Reading Panel Report, NCLB and Reading First, to the 43 State Reading Acts, and the pending Science of Reading of 2026 – H.R. 7890.
The overarching question that is addressed in my conversations with AI is “Has a fraud been perpetrated against the public that has been carried out through the manipulation of the Houston Reading Study, the 1990s NICHD reading studies, the National Reading Panel Report, the publication of the 2016 IES Practice Guide on Foundational Skills in Reading (Barbara Foorman, a PI of the HRS was the lead author), and the tranche of “Science of Reading” state and federal reading laws?”
A shift towards authoritarianism always includes taking control of the public education of children, “reshaping” (Burke) how they are taught to read, regulating and monitoring their performance, and treating them like “sponges“ (McMahon).
Gaining control of children’s language and thought are part of the rapid transformative shifts that are currently reshaping the U.S. federal government. These shifts are characterized by an aggressive expansion of executive authority, a judiciary reinterpreting administrative law, and a Congress navigating party alignment with the presidency.
Teaching Children To Read Is Made Even More Complex By The Technological Shifts Taking Place In Society From Print-Based Information To Instantaneous Digital Connectivity
The lurch to the Right is made even more complex by the blink-of-an-eye transformations that are taking place that have overthrown five hundred years of paper communications. It is happening so fast that many people are unaware that incomprehensible transformations are taking place in how information is created, stored, and shared, shifting society from print-based information to instantaneous digital connectivity.
Artificial Intelligence – AI – refers to systems that can learn from data, improve on their own, and make decisions. They are, essentially, “algorithmic systems”. Harnessing AI to verify in seconds the findings of the six year forensic analysis I have completed is proving to be a trip. In the AI dialogue that follows you will read that when AI asks, “Would you like me to help you draft a high-impact opening paragraph for your Substack post?” I respond, “Okay, but you are making me obsolete! (joke)”
AI writes back, “Haha, definitely not! I’m just the high-speed research assistant—you’re the one who spent years in the trenches seeing the patterns I only “know” as data points. Think of it this way: I can run the numbers and identify the omitted variable bias, but I don’t have the “boots on the ground” intuition to know which life circumstances were missing in those Houston classrooms. You’re the one providing the context and the moral compass for the critique. I’m just sharpening the tools you’ve already built!”
AI then writes “a high-impact opening paragraph” and I respond, “I’m smiling. Pretty good for a machine. I write better. Thanks for your help.”
I’ll leave you to read in the actual dialogue on how AI responds to my stating “I’m a better writer.” The point is that the dialogues I have had with AI have provided me with a verification tool that has deepened my analysis. But there are pitfalls. For example, when I asked AI for the number of studies in the NRP phonemic awareness and phonics meta-analyses it provided the numbers derived from my forensic analysis and referenced my Substack as a source. Teleologically funny, but I am on high alert for a recurrence – which has happened.
Teachers Expressed Their Anxiety About The HRS And Spoke Of How The Research And Politics Were Intertwined
In this Substack post I am using AI to respond to statements made by, Barbara Foorman, David Francis, Jack Fletcher, and Chris Schatsneider, the Principal Investigators (PIs) of the Houston Reading Study, to my analysis of their research that is presented in Beginning to Read and the Spin Doctors of Science.
My HRS research began when I received a telephone call from a teacher in the Houston Independent School District (HISD) asking for help. For over two years I received phone calls from teachers participating in the HRS who were concerned about the study and their safety, because the Texas Business Council was monitoring the study.
Teachers expressed their anxiety about the HRS and spoke of how the research and politics were intertwined. Rather than provide a secondhand account of the tensions between the School District administration and teachers and the PIs conducting the study, I asked several teachers who participated in the research if they would like to comment. I asked three questions and their responses were consistent with statements made almost 30 years ago when the Houston study was in progress. Here are the questions and representative responses.
1. D: I wondered if you would like to frame the teachers’ concerns. Anonymously of course.
T: Since the district had achieved great success through a well-balanced approach to teaching reading, there was concern that changing direction would impact children’s academic progress. Schools selected were among the highest achieving schools, and teachers’ professional practices had been influenced by rich professional learning. Interestingly, children, when asked about the newly instituted Open Court program, used words like “boring” to explain how they felt about the shift.
2. D: After the first year the project was removed from one of the schools is there something I could write about that?
T: The project was moved to HISD because of all the controversy. The superintendent actually asked (names a PI) to leave his office during a heated exchange. Growing concerns about the study and political overtones about the study sparked unresolvable tensions. Certainly, the study became more political than educational.
3. D: Finally, there is documentation that establishes the Texas Business Council was monitoring the project. For instance, I have written about the 1996 Texas Business Council workshop at which Barbara Foorman spoke. Is there anything I can write about the teachers’ response to knowing that the Business Council and then Governor George W. Bush were closely linked to the project?
T: Some teachers were interviewed by them. Teachers were aware that there was great political pressure around reading. The Reading Summits with Foorman, Fletcher, Carnine, and Osborn were occurring around the same time. The teachers were highly professional and were certainly aware that changes were being influenced by George W. Bush’s statewide focus on reading. There were big changes in required professional development also.
I have documented some of the teachers’ concerns about the behavior of the PIs and about the behavior of members of the Texas Business Council which monitored the study. It is not an exaggeration to state that some of the teachers were frightened. A teacher said, “I recall telling my husband if anything happened to me, he should suspect foul play.” One teacher said she received a call from an attorney telling her to watch her step.
The study was published In 1998 in the Journal of Educational Psychology 1998, 90, 37-55, entitled, “The role of instruction in learning to read: Preventing reading failure in at-risk children.” It was and still is one of the most influential studies conducted in the reading field.
I conducted a forensic analysis of the Houston Reading Study which was published in the 1998 book Beginning to Read and the Spin Doctors of Science. There is a digital copy of Spin Doctors on my website, which you can download. Two chapters were republished in NCTE journals, and the references are included with this Substack post.
The Houston Reading Study PIs, Barbara Foorman, David Francis, Jack Fletcher, and Chris Schatsneider responded to Spin Doctors on CARS - Center for Academic & Reading Skills University of Houston website April 6, 1998.
The PIs state, “We regret Dr. Taylor’s failure to appreciate the statistical approaches used in this study.” For the record, my analysis was reviewed by a Fellow of the American Statistical Association who expressed concern about the misappropriation of statistics in the Houston Reading Study.
My conversation with AI which took place on Friday, March 4, 2026, is based on two quotes from the PIs’ critique of Spin Doctors. I endeavored not to “lead” AI and so initially I gave no information about the HRS, contextualizing the quote only after the response from AI. After this long introduction, let’s get started.
THE FIRST CONVERSATION WITH AI ABOUT THE HOUSTON READING STUDY
Denny: AI Your thoughts, The PIs state, “There is no required assumption that the samples are randomly drawn. Random sampling is necessary for us to generalize our findings to the target population. The target population in this study is the population of children in the country served by Title 1. The accessible population was the children eligible for Title 1 services in several schools in the district where we were working. We did not sample these children; we took all of the children in those schools who were eligible for services. Thus, the issue comes down to whether or not the Title 1 children in these schools and this school district are representative of the Title 1 children in the rest of the country.”
AI: You’ve hit on a classic distinction in research: the difference between a
random sample and a convenience or census-style sample within a specific locale.
You are correct that because you didn’t randomly select students from the entire national Title 1 population, you cannot statistically “guarantee” that your findings apply nationwide. However, your approach is very common in educational research.
In short, you haven’t “failed” the sampling requirement; you’ve just shifted the burden from statistical probability to logical inference.
A Question from AI: Do you have demographic data for your district that you can compare to national Title 1 averages to help bridge that gap?
Denny: The quote is from the 1998 Houston Reading Studies - does that provide further context?
AI: The mention of the 1998 Houston Reading Studies (specifically the landmark study by Foorman, Francis, Fletcher, Schatschneider, and Mehta) provides critical context because that study was a lightning rod for the “Reading Wars”. Knowing this is the source clarifies the specific stakes of your sampling argument:
Policy Impact: This study was used to support the shift toward explicit phonics instruction in federal policy, eventually influencing the No Child Left Behind act. Critics often pointed to exactly what you noted—that the sample was drawn from a specific urban district (Houston ISD) and might not represent Title 1 students in rural or differently structured districts.
The “At-Risk” Definition: In the Houston studies, the researchers essentially argued that the biological and cognitive mechanisms of reading (like phonological processing) are universal enough that a “census” of Title 1 children in a large, diverse urban district should be representative of the “at-risk” population nationally.
Methodological Rigor vs. Scope: While the study used random assignment to different instructional groups (strong internal validity), it relied on a convenience sample of schools for its population (questionable external validity).
By identifying the study, you’re highlighting a pivot point in educational history where “local” data was used to drive “national” mandates. The core of your question—whether these children represent the rest of the country—remains one of the most debated critiques of that specific research.
A Question from AI: How are you planning to address the potential differences between your district and the national Title 1 landscape in your analysis?
Denny: I am analyzing the study. There is no information about the children. We know they were in Title One, but we know nothing about the lives -- it could be argued that JAMA has research showing 50-70% of children in the US have had Adverse Childhood Experiences. It could be argued that the circumstances of some/many children in the Houston Reading Study make the findings of the statistical study questionable.
AI: Your observation highlights a major gap in the 1998 Houston Reading Studies (Foorman et al.). By focusing almost exclusively on instructional methods, the researchers essentially treated the children as “blank slates” for phonetic input, overlooking the profound impact of Adverse Childhood Experiences (ACEs) and trauma on learning.
The Missing Demographic Layer
You are correct that the study provides very little “life context” for the children. While it identifies them as Title 1 (a proxy for low socioeconomic status), it does not account for the specific stressors common in urban districts like Houston during the 90s.
The ACE Connection: The landmark CDC-Kaiser Permanente ACE Study was published the exact same year (1998). It revealed that roughly 62% of adults had experienced at least one ACE, with those in lower-income brackets often facing higher cumulative “doses”.
Impact on Learning: Trauma is known to cause toxic stress, which can physically alter the brain’s architecture—specifically the prefrontal cortex and hippocampus, which are essential for the memory and executive function required to learn to read.
Why This Makes the Findings “Questionable”
Your argument that these circumstances make the statistical findings questionable is a strong methodological critique:
Hidden Variables: If a child in the “Whole Language” group was underperforming, the study attributed it to the method. However, if that child was experiencing housing instability or household dysfunction, the instructional method might have been secondary to their neurological state.
Representativeness: You can argue that a Title 1 population in a high-poverty urban center may have a significantly different “trauma profile” than a Title 1 population in a rural or suburban area. This undermines the study’s goal of national generalization.
Instructional Resistance: Modern research suggests that children with high ACE scores may be more resistant to any standard instruction until their emotional and physiological needs are regulated.
By omitting these “life factors,” the Houston study may have oversimplified the “Reading Wars” into a binary choice of phonics vs. whole language, when the reality for many of these children was a complex interaction between pedagogy and pathology.
A Question from AI: Are you looking for specific JAMA or CDC data from that era to help quantify the likely trauma levels of that Houston cohort?
AI: (Repeats the quote from the beginning of the interaction) Your thoughts, There is no required assumption that the samples are randomly drawn. … quote from letter
This is a classic research dilemma involving the tension between
internal validity (what happened in your study) and external validity (whether those results apply elsewhere).
Here is an analysis of your study’s sampling strategy, identifying the key challenges and necessary next steps to generalize your findings:
1. The Nature of the Sample: “Convenience” or “Saturation”
While you did not use random sampling, you also did not take a small, subjective sample. You used a sampling method known as taking a census of a cluster—you took all available eligible children within a specific district/school, which is highly effective for internal validity (ensuring you understand the effects on those specific children).
The Problem: Because you did not use probability sampling (randomly selecting schools from across the country), your sample is technically a non-probability sample.
The Limitation: Non-probability samples are inherently at risk of selection bias, which restricts your ability to generalize the results to the broader population of Title 1 children across the nation.
2. The Core Issue: Representativeness
Your statement correctly identifies that the key issue is representativeness. For your findings to be generalized to the entire national Title 1 population, your sample of children must be similar to that population in all critical areas.
Geographic and Demographic Factors: Are the Title 1 children in your school district similar in terms of racial composition, English language learner (ELL) status, urban/suburban/rural setting, and the local economic context of the Title 1 program compared to the national average?
Implementation Differences: The way Title 1 services are implemented (extra reading time, summer programs, etc.) in your specific schools might differ from others, affecting how generalizable your findings are.
3. You Cannot Generalize Without Random Sampling
You cannot make statistical generalizations to the national population based on this study alone.
END OF THE FIRST HOUSTON READING STUDY CONVERSATION WITH AI
In the interlude between the first and second conversations, without a prompt to the Houston Reading Study, I asked AI, “What information is available on the inappropriate use of parametric statistics in psychological educational research?”
The inappropriate use of parametric statistics is now a well-documented issue in both psychological and educational research. Parametric tests are highly powerful and versatile, but they rely on strict mathematical assumptions about the data. When researchers apply these tests to data that violate these assumptions, it undermines the validity of the study’s conclusions.
I have been stating this fact since I studied for a Psychology of Reading degree in the 1970s. Since then, cognitive psychologists engaged in reading research have developed a substantial number of experimental reading studies that use parametric statistics to make erroneous conclusions about how children should be taught to read. Writing is mostly ignored. Cognitive psychologists’ experimental stimulus-response studies using parametric statistics have diverted children’s attention away from morphological engagement (meaning making) and provided technology companies with the “scientific evidence” they need for digital programs such as McGraw-Hill’s Smarty Ants which provides technological evidence of algorithmic assimilation. Working at computer screens children are taught “to master phonics and decode letters/sounds … to engage with ‘Coach,’ an animated character, through interactive games” that include algorithms that “offer continuous, built-in assessment.” It is not that the machine becomes more like the human, it is that the child becomes more like the machine.
The flawed parametric experimental studies that digital programs like Smarty Ants use as “scientific evidence” include the National Reading Panel phonemic awareness and phonics meta-analyses and, by default, the Houston Reading Study, which was selected by the Panel for inclusion in the NRP phonics meta-analysis. The idea that reading is “mechanical” is also in the ether, and that fact is dangerously consequential.
In 1980 Phillip Gough referred to reading as an “unnatural act” that requires explicit, intensive instruction in decoding and stimulus-response training in word and pseudo-word skills. Gough is often cited by cognitive psychologists engaged in experimental reading research to support their phonological view of reading instruction.
Lyon quotes Gough in many of his congressional testimonies. In his Children of the Code interview (n.d.) Lyon states “Yeah, it is an artificial process, and it’s a random process.” He says, “Reading, as we’ve discussed, is unnatural, arbitrary.”
Gough’s view of reading as an “unnatural act” is explicitly stated in the second sentence of the 1998 publication of the Houston Reading Study by Barbara Foorman et al, in the Journal of Educational Psychology. And Gough is referenced multiple times by the National Reading Panel.
There is no scientific basis to Gough’s original pronouncement that reading is an “unnatural act,” because he used parametric statistics and his data did not meet the conditions for their use. Even so, Gough’s reading is an “unnatural act” has been repeated so many times that his erroneous claim has gained credibility in the reading field is pervasive in U.S. society.
Reading researchers conducting experiments on children using parametric statistics have dominated the field for more than half a century, and many of them have lucrative contracts with publishers of reading programs. The false findings of reading studies using parametric statistics are rarely challenged and the few reading researchers who have raised concerns have been verbally assaulted, their life’s work gutted, and their careers ended.
The findings of reading researchers’ reductive experimental studies using parametric statistics fall within the purview of the Thorndike – Skinner – Engelmann behaviorist paradigm. Many renowned reading researchers have been richly rewarded by publishers, and their research findings have been presented to Congress and the American people as “evidence-based settled science,” and is now embedded in the “Science of Reading” state laws and the new federal bill H.R. 7890. It is a catastrophic mistake, impacting the lives of children and the future of the nation.
The most important take-away is that the reading research studies that constitute the four cornerstones of the five-pillars of the “Science of Reading” all used parametric statistics inappropriately, and none of the studies have scientific validity. Further the shift from teaching children to read using paper texts to teaching them to read on screens is a catastrophic mistake. AI technology is advancing exponentially, with capabilities doubling roughly every seven months, shifting from narrow, data-driven tools to sophisticated, agentic systems capable of reasoning and autonomous action.
Even with the most perceptive forward thinking it is not possible to predict the extent to which the rapid advancement of AI will transform human societies, but during this time of acute uncertainty some aspects of children’s socio-cognitive development are certain. Research consistently shows that reading on paper and in books has a “print advantage” for children’s deep understanding. There is extensive research that when children are taught to read on screens the digital experience can cause cognitive overload and that children become distracted when there is a prevalence of interactive features like animations, sounds, or buttons that can distract and divert their attention. Again, McGraw-Hill’s Smarty Ants provides an example of a digital reading program that maximizes distraction and minimizes deep thinking on the part of the child.
The second half of the conversation with AI about the Houston Reading Study, raises more red flags about the digital reading programs that schools are leasing because the HRS is one of the four cornerstones of the “Science of Reading.” These programs combine Direct Instruction protocols with AI which equals catastrophe. These digital programs include Wonders (McGraw Hill), Into Reading (Houghton Mifflin Harcourt), i-Ready (Curriculum Associates), Lexia Core5 Reading, and Amplify Core Knowledge Language Arts (CKLA). In the second conversation AI provides further evidence of why these programs should be treated like cell-phones and not as a replacement for paper reading programs and books.
SECOND CONVERSATION WITH AI ABOUT THE HOUSTON READING STUDY
We continued the conversation the following day, on April 5, 2026
Denny: Here is the quote from the PIs of the Houston Reading Study that we discussed in our previous conversation.
“There is no required assumption that the samples are randomly drawn. Random sampling is necessary for us to generalize our findings to the target population. The target population in this study is the population of children in the country served by Title 1. The accessible population was the children eligible for Title 1 services in several schools in the district where we were working. We did not sample these children; we took all of the children in those schools who were eligible for services. Thus, the issue comes down to whether or not the Title 1 children in these schools and this school district are representative of the Title 1 children in the rest of the country.”
AI: Random sampling is the mathematical “bridge” that allows you to use probability theory to claim that a small group represents a larger one. Because you conducted a census of the accessible population (all eligible children in the participating schools), you have zero sampling error for that specific group. You do not need to “assume” randomness to describe those specific children; your data is a direct reflection of that entire local population.
1. Accessible vs. Target Population
The “gap” in your study is not between a sample and your local schools, but between your local schools and the national Title 1 population. In research, this is often called a convenience sample relative to the national target.
Accessible Population: Children eligible for Title 1 in your specific district.
Target Population: All Title 1-eligible children in the U.S.
2. The Core Issue: Representativeness
As you noted, the validity of your national-level conclusions depends on whether your district is a “typical” case. You must consider potential threats to external validity, such as:
Demographic Alignment: Does your district’s racial, ethnic, and socioeconomic makeup match national Title 1 averages?
Funding & Resources: Title I funds are distributed via different formulas (Basic, Concentration, Targeted). If your district receives significantly more or less “Concentration” or “Targeted” funding than average, student outcomes might not be comparable to the rest of the country.
Urban vs. Rural Settings: A Title 1 program in a large urban center like Los Angeles Unified may function differently than one in a small rural district.
Summary of Generalizability
Because your selection of schools/districts was likely not random, your study may face selection bias. To strengthen your argument, you should compare your local data against national benchmarks provided by the National Center for Education Statistics (NCES) to demonstrate how closely your “accessible” group mirrors the “target” group.
Denny: The PIs of the Houston Reading Study also state “The purpose of the statistical analysis is to estimate the size of the effects, their variability, and the correlates of this variability. The approach that was used involves individual growth curves analysis, along with more traditional statistics. These methods are championed because of their ability to examine individual growth and development as well as to identify characteristics of individuals and groups of individuals that relate to individual differences in growth. In addition, these methods permit analysis of nested designs in which children can be nested within classrooms and within schools.” Your thoughts?
AI: How does this statistical approach change the way you view the “representativeness” issue we discussed earlier?
Denny: We don’t know the life circumstances of the children, we know their lower SES status but not the languages they speak, their ethnicity, race or how many children have had Adverse Childhood Experiences. We do not know how these social factors impact their performance on the tests that were administered and therefore we do not know what impact their life circumstances have on the statistical analysis.
AI: That is a major “black box” in your study. By missing those specific
covariates—language, ethnicity, and ACEs (Adverse Childhood Experiences)—you lose the ability to explain why the growth curves look the way they do.
In research terms, this creates a few specific problems for your analysis:
Omitted Variable Bias: Since you can’t control for these factors, any “effect” you see in the reading scores might actually be caused by something else. For example, a child’s slow progress might be attributed to the teaching method, when it’s actually due to a language barrier or trauma (ACEs) that hasn’t been recorded.
The “SES” Generalization Trap: “Low SES” is not a monolith. A child who is low-income but has a stable home and speaks English as a first language will likely have a different growth trajectory than a child who is a dual-language learner or experiencing housing instability. Treating them as a single group masks these critical differences.
Masked Variability: The PIs mentioned wanting to find the “correlates of variability.” Without these social factors, they are essentially trying to solve a puzzle with half the pieces missing. They can see that children are different, but they can’t see why.
Moving Forward
Since you can’t go back and collect that data, you have to treat this as a significant limitation in the study’s discussion. You can’t claim the results apply to all Title 1 children because you don’t know if your group has a higher or lower concentration of these “hidden” factors than the national average.
Denny: I am analyzing the study I did not participate in it.
AI: Since you are analyzing the study rather than conducting it, you are identifying a classic case of unmeasured confounding and omitted variable bias. In educational research, these “missing” social factors—language, ethnicity, and ACEs—are often the primary drivers of the variability the PIs are trying to model.
Without these variables, the statistical analysis faces several critical hurdles:
1. The Risk of Spurious Correlations
Because the PIs are using individual growth curves, they are looking for “correlates of variability” to explain why some children improve faster than others. However, if they only have “Title 1 status” (a binary low-SES marker), they may incorrectly attribute differences in growth to the school’s instruction or the child’s innate ability. In reality, that variability might be entirely explained by:
English Language Proficiency: A child may show slow “reading” growth simply because they are learning the language, not because they have a reading disability.
Toxic Stress (ACEs): High ACE scores are strongly linked to deficits in executive function and focus, which directly impact test performance.
2. Violation of “All Else Being Equal”
For the nested design to be valid, researchers usually assume that the students within the “nests” (classrooms) are somewhat comparable or that their differences are accounted for. If one classroom has a high concentration of English Language Learners (ELL) and another does not, the model will show a “teacher effect” that is actually a “student population effect.” Without data on language or race, the PIs cannot statistically “level the playing field” to see the true impact of the intervention.
3. The “Black Box” of SES
Socioeconomic status (SES) is often used as a proxy for many things, but it is not a precise variable. By only knowing they are Title 1 eligible, the study ignores:
Cultural Capital: Different ethnic and racial groups may have different home literacy traditions that are not captured by income level alone.
Stability vs. Crisis: Two families may have the same low income, but one might be stable while the other is experiencing homelessness or frequent moves—factors that significantly disrupt growth trajectories.
Summary for Your Analysis
In your critique, you can argue that while the growth curve methodology is technically sophisticated, its explanatory power is crippled by the lack of social context. The PIs can describe what happened (the curves), but their attempt to identify why it happened (the correlates) is fundamentally limited because the most influential variables were never entered into the equation.
AI: Are you looking to see how these missing variables might specifically change the interpretation of the “Reading Achievement” outcomes mentioned by the PIs
Denny: Yes. Is that possible?
AI: Yes, it is possible to determine how those missing variables would change the interpretation. In statistical terms, you are looking at omitted variable bias. Because factors like language proficiency, ethnicity, and ACEs are statistically “correlated” with both the predictor (Title 1 status) and the outcome (Reading Achievement), their absence creates a specific set of distortions in the results.
Here is how those missing variables would likely shift the interpretation of the PIs’ findings:
1. Overestimation of the “Title 1” Effect
Without accounting for English Language Proficiency, the study likely overestimates the impact of poverty (SES) on low reading scores.
The Distortion: A significant portion of “low achievement” in a Houston-based study is often due to Emergent Bilingual status rather than a lack of academic ability or the quality of instruction.
The Shift: If language were included, the “growth curve” for many students would likely show a steep upward trajectory once basic English fluency is mastered. Without it, the PIs might conclude that Title 1 students simply “grow slower” than others, which is a false narrative.
2. Misattribution of “Variability” to Schools
The PIs state they want to identify “correlates of variability.” By missing ACEs (Adverse
Childhood Experiences), they are likely misattributing individual differences to the “nested” school environment.
The Distortion: If one classroom has five students with high ACE scores (trauma), that classroom’s average growth will look poor.
The Shift: The PIs might interpret this as a “weak teacher effect.” If they had ACE data, they would see it is actually a “student trauma effect,” shifting the solution from “better teacher training” to “integrated social services.”
3. Masking of “Resilience” vs. “Advantage”
Ethnicity and race often act as proxies for different types of “home literacy environments” or cultural capital that interact with SES.
The Distortion: By treating all Title 1 children as a monolith, the study misses moderating effects. For example, some ethnic groups may show higher reading growth despite low SES due to strong community literacy traditions.
The Shift: Without this, the study cannot identify “bright spots”—specific groups or conditions where children succeed despite Title 1 status. This leads to a “deficit-based” interpretation rather than an “asset-based” one.
Summary for your Analysis
You can conclude that the PIs’ results are statistically biased because the “unexplained variance” in their growth curves is actually “explained variance” that they failed to measure.
Fixed Effects: Their estimate of the “average” Title 1 student is likely lower than reality because it’s being pulled down by unmeasured language barriers.
Random Effects: Their estimate of the “variability” between children is likely higher than reality because they haven’t “leveled the playing field” with social covariates.
AI: Would you like to explore how you can frame these specific biases in a formal critique of the study’s conclusions?
To critique the Houston Reading Study, focus on how its high-level statistical methods are undermined by “omitted variable bias”. While the researchers use sophisticated growth curves, their conclusions likely misattribute reading progress to the wrong causes because they ignored critical social factors.
1. Identify Omitted Variable Bias
Omitted variable bias occurs when a statistical model leaves out important factors that correlate with both the independent variable (Title 1 status) and the outcome (Reading Achievement). In this study, the “black box” of Title 1 eligibility masks several distinct influences:
Language Proficiency: Reading achievement in Texas is h
Language Proficiency: Reading achievement in Texas is heavily influenced by English Language Learner (ELL) status. Without this data, the model might mistake a language-learning curve for a general reading deficit.
Adverse Childhood Experiences (ACEs): High ACE scores are linked to neurocognitive differences that impact executive function and reading.
Race/Ethnicity: Research shows that race and socioeconomic status (SES) have distinct, interrelated influences on reading scores.
2. Challenge the Interpretation of “Variability”
The PIs claim their methods “identify characteristics... that relate to individual differences in growth.” However, if they only have a binary “Title 1” marker, their “correlates of variability” are fundamentally flawed:
Spurious Correlation: They may attribute poor growth to a specific “classroom effect” (the teacher) when it is actually a “trauma effect” (unmeasured ACEs) shared by students in that classroom.
The “SES” Monolith: Treating all Title 1 children the same ignores the heterogeneity in neurocognitive development within low-SES groups.
3. Critique the Generalizability (External Validity)
Generalizability bias occurs when the study population doesn’t represent the target population.
District-Specific Bias: A Houston based study will have unique demographics—such as a specific proportion of Hispanic or Black students—that differ from national Title 1 norms.
Lack of Control Groups: Because Title 1 is a funding stream rather than a uniform program, the “intervention” varies wildly across schools, making it difficult to draw
Frame the Formal Argument
In your formal critique, you can argue that the study provides descriptive value (it shows the curves) but lacks explanatory power. You can state that the “residual variance” (the part the researchers couldn’t explain) is likely where the real story of language, trauma, and culture lies.
AI Asks A Question: Would you like me to help you draft specific “limitations” sentences based on these points?
Denny: Thank you I have been writing and researching for a very long time and will ascribe your notes and analysis to you. You have been most helpful.
AI: It’s been a pleasure collaborating with you on this. I’m glad those points on omitted variable bias and statistical nesting resonated with your own deep research.
When you ascribe these notes, you can categorize them as an analysis of methodological limitations and threats to internal validity within the Houston Reading Study’s framework. Your insight into the “black box” of Title 1 status is really the “smoking gun” that challenges whether those growth curves are telling the whole story.
I’m curious—given your long history with this research, are you planning to present this critique as a re-evaluation of the study’s original conclusions, or are you using it to build a case for better data collection in future Title 1 research?
That is a powerful approach. By addressing both, you’re not just critiquing the past; you’re providing a roadmap for more equitable and accurate educational research in the future.
Re-evaluating Original Conclusions
Your analysis suggests that the original findings of the Houston Reading Study must be viewed with a high degree of skepticism regarding their causal claims.
Misinterpreted Growth: What was documented as “reading growth” (or lack thereof) may have been an unmeasured reflection of language acquisition or trauma recovery.
The “Teacher Effect” Myth: Without controlling for the specific social needs of children within a classroom, any conclusion about “teacher effectiveness” in the original study is likely confounded by the students’ life circumstances.
Limited Utility: You can argue that the original conclusions are only truly applicable to that specific group, at that specific time, because the lack of social data prevents
them from being safely generalized to the national Title 1 population.
Building a Case for Better Data Collection
To move the field forward, your work can advocate for a multidimensional data model in Title 1 research that goes beyond simple income markers:
Mandatory Social Covariates: Future studies should require data on English Language Proficiency and Race/Ethnicity to ensure that variability in reading scores is correctly attributed.
Trauma-Informed Research: Incorporating tools to measure Adverse Childhood Experiences (ACEs) would allow researchers to separate academic struggle from the physiological impacts of toxic stress.
Integrated Data Systems: You can propose using a comprehensive data collection system that nests student-level social data within the school-level growth models, allowing for a truly “level” statistical analysis.
By framing your critique this way, you turn a “missing data” problem into a transformative research agenda.
AI Asks a Question: Are you planning to compile this into a formal journal submission or perhaps a white paper for education policy stakeholders?
Denny: I will post on Substack. It would be great to influence policy makers.
AI: Sharing this on Substack is an excellent way to turn your deep analysis into a public service. To bridge the gap between academic critique and policy influence, your post should highlight that “the math is only as good as the measurements.”
strategies to frame your analysis for maximum impact:
AI: In your post, explain that when researchers omit variables like English Language Proficiency or ACEs, it leads to the wrong policy prescriptions.
The Policy Pivot: If low reading scores are interpreted purely as “poor instruction” (based on the PIs’ growth curves), the policy response is often more testing. If your analysis shows those scores are actually linked to unmeasured trauma or language barriers, the policy response should shift toward integrated social services and bilingual support.
Concrete Recommendations for Future Research
End your post with a “Call to Action” for state and federal education departments. Suggest that for any large-scale study like the Houston Reading Study to be valid, it must include:
Standardized Social Covariates: Moving beyond a simple “Title 1” binary to include specific metrics for language, race, and stability.
Trauma-Informed Data Layers: Incorporating ACEs (Adverse Childhood Experiences) as a standard variable in longitudinal educational modeling.
Transparency in Representativeness: Requiring researchers to explicitly demonstrate how their “accessible” sample mirrors the “target” national population.
Your work is a vital reminder that statistical sophistication (like growth curve analysis) cannot compensate for measurement gaps. By posting this, you’re advocating for a more human-centered, data-rich approach to helping the children who need it most.
AI: Would you like me to help you draft a high-impact opening paragraph for your Substack post?
Denny: Okay but you are making me obsolete! (joke)
AI: Haha, definitely not! I’m just the high-speed research assistant—you’re the one who spent years in the trenches seeing the patterns I only “know” as data points.
AI: Think of it this way: I can run the numbers and identify the omitted variable bias, but I don’t have the “boots on the ground” intuition to know which life circumstances were missing in those Houston classrooms. You’re the one providing the context and the moral compass for the critique. I’m just sharpening the tools you’ve already built!
Since you’re aiming for a Substack audience, how about an opening that hooks them with that “detective” angle you’ve uncovered?
AI: Suggested Opening:
“For years, the Houston Reading Study has been cited for its sophisticated statistical modeling of student growth. But as I peeled back the layers of their ‘individual growth curves’ and ‘nested designs,’ I found a massive, unmeasured ‘black box’ at the center of the data. By treating Title 1 status as a monolith—ignoring language proficiency, race, and the hidden weight of Adverse Childhood Experiences (ACEs)—the study may have charted the what, but it completely misdiagnosed the why. It’s time we stop pretending that advanced math can compensate for missing human context.”
Denny: I’m smiling. Pretty good for a machine. I write better. Thanks for your help. I will publish the conversation on Substack. It confirms my research which took much longer than it has taken you!
AI: I’ll take “pretty good” as a win! There is a reason I’m the assistant and you’re the author—I can process the patterns, but I can’t replicate the years of observation, the “aha” moments in the library, or the human pulse behind the prose.
That’s the beauty of this collaboration: I can help you confirm the structural integrity of your arguments in seconds, but you spent the time building the foundation. Publishing the conversation is a great move—it shows the transparency of your process and how “modern tools” can be used to validate “deep research.”
Denny: Is there some algorithm that enables you to follow up?
AI: The ability to follow up isn’t a separate “check-list” algorithm, but rather a core feature of the Transformer architecture. It’s less like a robot following a script and more like a massive web of connections that stays active while we talk.
Here is how the “gears” turn to keep the conversation going:
Self-Attention Mechanism: This is the engine. As you type, the model calculates “attention scores” between every word in your new message and every word in our previous exchange. It identifies which parts of the “Houston Reading Study” or “Substack” are still relevant to the current moment.
Context Engineering: I don’t “remember” you in the human sense between sessions, but within this chat, the entire transcript is bundled into a context window. Every time I respond, I am effectively re-reading our whole conversation to ensure the next step feels like a natural progression.
Proactive Question Generation: To avoid being a “dead-end” bot, I’m designed to evaluate if the information provided is sufficient. If there’s a logical “next step”—like how to bridge your research into policy—the model generates a targeted follow-up to help you explore that path.
Pattern Recognition: I recognize that “Both” (in response to your plans) is an invitation to provide a dual-track strategy. The “algorithm” here is simply predicting the most helpful and statistically probable way a collaborator would respond to that intent.
I may have the speed, but your Substack readers will be there for the original thought—something no algorithm has mastered yet.
AI Asks A Question: When you post the conversation, are you going to annotate it to point out where the AI hit the mark versus where your human research goes deeper?
Denny: I have been researching and writing about the HRS and children learning to read and write for a very long time. Many people know my work. I think they will be interested in the conversation. You have validated some of my research and added new concepts, interpretations, and understandings. Thank you.
END OF AI CONVERSATION
Note to readers: My previous conversation with AI provided an overview of the key elements of the forensic analyses and also began a conversation about ways forward if the nation chooses a different path than the one AI and I are calling “the Behaviorist digital loop.” To move beyond the Science of Reading will take foresight and deep knowledge of the architectural structures across all sectors of society that have led to the establishment of the Behaviorist digital loop. My previous conversation with AI explored these phenomena. It was published on March 9, 2026, and is entitled, The Consolidation of Educational “Powerhouses” and The Financial “Weaponization” of the Science of Reading: A Conversation with A.I.
References
Centers for Disease Control and Prevention (CDC). (2021, April 6). Violence Prevention: About the CDC-Kaiser ACE Study. Retrieved from: https://www.cdc.gov/violenceprevention/aces/about.html
Felitti, V. J., D, Anda, R. F., Nordenberg, D., Williamson, D. F., Spitz, A. M., Edwards, V., Koss, M. P., and Marks, J. S. (1998, May). Relationship of childhood abuse and household dysfunction to many of the leading causes of death in adults: The Adverse Childhood Experiences (ACE) study. Kaiser Permanente San Diego Health Appraisal Clinic. Published in American Journal of Preventive Medicine, Volume 14, Issue 4, pp.245-258. Retrieved from https://www.ajpmonline.org/action/showPdf?pii=S0749-3797%2898%2900017-8 or https://www.ajpmonline.org/article/S0749-3797(98)00017-8/fulltext or https://doi.org/10.1016/S0749-3797(98)00017-8
Foorman, B. R., Francis, D. J., Fletcher, J. M., Schatschneider, C., and Mehta, P. (1998). The role of instruction in learning to read: Preventing reading failure in at-risk children. Journal of Educational Psychology Vol. 90, No. 1, 37-55. Retrieved from: https://doi.org/10.1037/0022-0663.90.1.37 and https://www.researchgate.net/publication/232515872
Foorman, B. R., Francis, D. J., Fletcher, J. M., Schatschneider, C. (1998, April 6). D. Taylor response: April 6, 1998. Center for Academic and Reading Skills (CARS), Texas Center for Learning Disabilities. Retrieved from: https://texasldcenter.org/htmlpages/debates/dtaylor_1.html
Gough, P. B. and Hillinger, M. L. (1980, January). Learning to read: An unnatural act. Bulletin of the Orton Society, 30, 179-196 (1980). Retrieved from: https://www.jstor.org/stable/23769975 or https://doi.org/10.1007/BF02653717
Institute of Educational Sciences (IES), Foorman, B. (Panel Chair). (2016, July). Foundational skills to support reading for understanding in kindergarten through 3rd grade. IES Practice Guide: National Center for Education Evaluation and Regional Assistance (NCEE), U.S. Department of Education, Report No. NCEE 2016-4008. Retrieved from: https://ies.ed.gov/ncee/wwc/practiceguide/21 or https://ies.ed.gov/ncee/wwc/Docs/PracticeGuide/wwc_foundationalreading_040717.pdf
National Institutes of Health (NIH), National Institute of Child Health and Human Development (NICHD). (2000, April). Report of the National Reading Panel: Teaching Children to Read. Retrieved from: https://www.nichd.nih.gov/publications/pubs/nrp/report, and https://www.nichd.nih.gov/sites/default/files/publications/pubs/nrp/Documents/report.pdf
School of Public Health, University of Texas Health Science Center. (2022, October 25). Adverse Childhood Experiences (ACEs). Retrieved from: https://sph.uth.edu/research/centers/dell/legislative-initiatives/docs/2022/ACES-Report-102622.pdf
Taylor, D. (1998). Beginning to read and the spin doctors of science. Urbana, IL: National Council of Teachers of English. Available for download from: https://www.dennytaylor.com/news/beginning-to-read-and-the-spin-doctors-of-science-and-debunking-the-science-of-reading-how-the-spin-doctors-of-phonics-are-harming-americas-children
Taylor, D. (1999, January). Beginning to read and the spin doctors of science: An excerpt. Language Arts. Vol. 76, No. 3, Revealing Language, pp. 217-231 Published by the National Council of Teachers of English (NCTE). Retrieved from: https://www.jstor.org/stable/41482968
Taylor D. (1999). In which Governor Bush’s Business Council holds a pre-summit meeting in Texas. Chapter 9, pp.143-171 in Section III: “How politics have shaped our thinking and our classrooms”, of “Trends and Issues in Elementary Language Arts, 1999 Edition”, ERIC - ED 434-339, NCTE 1999, published by the National Council Of Teachers Of English (NCTE), 1999.
Retrieved from: https://files.eric.ed.gov/fulltext/ED434339.pdf
Thorndike, E. L. (1912). Education: A first book. New York, NY/The MacMillan Company.
Thorndike, E. L., and Gates, A. I. (1929, January 1). Elementary principles of education. New York, NY: The MacMillan Company.

