Evaluating the Evaluator

A Metacognitive Critique of EdReports' K–2 Literacy Tool

When Thermometers Control the Weather

Historical Context: 250 Years of Assessment-Driven Curriculum

THE CONSTANT: CONTENT FIRST
Yeoman Era: Jefferson, Mann → Test civic knowledge & morality
Industrial Era: Dewey, Harris → Test academic content & vocational preparation
Corporate Era: Tyler, Standards → Test measurable objectives & efficiency
EdReports (Today): Common Core → Test "science of reading" compliance

The Pattern: Society changes → New theorists emerge → Assessment follows → Curriculum narrows to match assessment

Change comes from society, not individuals. Theorists are products of their time.

The Fundamental Flaw: Thermometers Control the Weather

The Paradox

Assessments are supposed to measure learning.

Instead, they shape learning.

We've built a system where the tool meant to observe has become the thing that controls what happens.

🌡️ → 🌦️

When a measure becomes a target, it ceases to be a good measure (Goodhart's Law).

The Result: Teachers teach to the test. Curriculum narrows to what's assessed. Students learn that success = performing on evaluations, not understanding.

The Research Question: Why Evaluate EdReports?

Why analyze an evaluation tool instead of a traditional curriculum unit?

Because EdReports is curriculum

  • It doesn't just review literacy programs—it determines which ones get adopted by thousands of schools
  • Districts use EdReports ratings as the gatekeeper for what millions of K–2 students will read and how they'll be taught
  • Evaluation tools encode assumptions about what counts as "quality curriculum"—making those assumptions worth examining
  • By analyzing EdReports, we can see how 250 years of content-first assumptions are still active inside a modern, "science-of-reading" tool

This analysis evaluates the evaluator: What is EdReports actually measuring? Does it assess curriculum quality or enforce compliance?

The Argument: EdReports Enforces 250 Years of Old Assumptions

EdReports frames itself as modern and "research-based":

"Gateway 1: Alignment to Research-Based Practices and Standards for Foundational Skills Instruction... Do materials emphasize explicit, systematic instruction of research-based and/or evidence-based phonemic awareness? Do materials emphasize explicit, systematic instruction of research-based and/or evidence-based phonics?"

— EdReports K–2 ELA Review Criteria v2.1, p. 2

But beneath this modern language lies compound gatekeeping:

"Materials must 'Meet Expectations' in BOTH Gateway 1 and Gateway 2 to be reviewed in Gateway 3... Materials being reviewed must score above zero points in each indicator. Otherwise, the materials automatically do not proceed to Gateway 3."

— EdReports K–2 ELA Review Criteria v2.1, p. 10

EdReports doesn't measure curriculum quality—it enforces content-first compliance and calls it "science."

The Problem: EdReports Measures Compliance, Not Learning

What We Think Evaluation Measures

  • Deep understanding
  • Critical thinking
  • Student growth

What EdReports Actually Measures

  • Content compliance
  • Phonics-first gatekeeping
  • Corporate Era logic

The Core Issue: Traditional assessment assumes cognition is static—"the same results each time in the same setting" (Sullivan, 2011). But learning is context-sensitive, identity-shaped, and constantly evolving.

EdReports doesn't just review curriculum—it acts as a gatekeeper that enforces a specific philosophy of literacy and learning.

How Do We Test This Claim?

If EdReports measures content compliance instead of curriculum quality,
how do we prove it?

A Two-Part Analysis

Part 1: Test EdReports as an evaluation tool (Wiggins' validity framework)
Part 2: Treat EdReports AS curriculum and evaluate what opportunities it creates

Let's examine the methodology for both parts...

The Hypothesis

I hypothesize that EdReports will fail both as a curriculum evaluation tool and as a curriculum itself.

Part 1: As Evaluation Tool

EdReports will fail as an evaluation tool

Testing validity, reliability, and actual usage patterns

Part 2: As Curriculum

EdReports will narrow curriculum opportunities

Analyzing what learning opportunities it creates or restricts

If correct, both tests should reveal the same pattern: EdReports enforces content-first compliance and calls it "quality."

Part 1: Does EdReports Succeed as an Evaluation Tool?

Drawing on Wiggins (1998), we examine three dimensions to test EdReports as an evaluation tool:

Validity

Does EdReports measure what it claims to measure?

Can curricula succeed/fail on EdReports for reasons unrelated to actual curriculum quality?

Reliability

Does EdReports assume cognition is static?

Does it account for context-sensitive, identity-shaped learning (Sullivan, 2011)?

Usage

How is EdReports actually used?

Does it function as a gatekeeper that narrows curriculum options and enforces compliance?

Part 1 Results: Validity - Does EdReports Measure What It Claims?

Wiggins' Validity Test: Can curricula succeed/fail for reasons unrelated to actual curriculum quality?

What EdReports Claims to Measure

  • "Research-based" curriculum quality
  • Foundational skills instruction
  • Standards alignment

What EdReports Actually Measures

  • Adherence to one specific phonics-first approach
  • Presence of scripted teacher guidance
  • Compliance with narrow literacy definition

The Validity Problem

Yes, curricula can fail for the wrong reasons: A high-quality curriculum designed for Deaf readers would score zero on EdReports—not because it's ineffective, but because it doesn't use phonics-based decoding. EdReports conflates one specific approach with curriculum quality itself.

Part 1 Results: Reliability - Does EdReports Assume Static Cognition?

Sullivan (2011): Traditional assessment assumes cognition is static—"the same results each time in the same setting." But learning is context-sensitive, identity-shaped, and constantly evolving.

What EdReports Assumes:

"Materials... provide reasonable pacing where phonics skills are taught one at a time... [with a] clear evidence-based explanation for the expected hierarchy of phonemic awareness competence."

— EdReports K–2 ELA Review Criteria v2.1, p. 7

Static Cognition Model

  • One universal sequence for all learners
  • Linear skill progression
  • Context-independent learning
  • Same pathway = same results

Reality: Dynamic Cognition

  • Multiple valid pathways to literacy
  • Identity-shaped learning
  • Context-sensitive development
  • Different pathways = equally valid outcomes

Result: EdReports is unreliable because it treats learning as a fixed, universal process rather than a context-dependent, identity-shaped experience.

Part 1 Results: Usage - How Is EdReports Actually Used?

Wiggins' Usage Test: How do schools and districts actually use this evaluation tool in practice?

EdReports Functions As a Gatekeeper:

Compound Gatekeeping Structure

"Materials must 'Meet Expectations' in BOTH Gateway 1 and Gateway 2 to be reviewed in Gateway 3" — Any curriculum that doesn't pass both phonics gates never gets evaluated for usability or quality

Binary Pass/Fail Creates Monoculture

Districts use EdReports ratings as adoption criteria, narrowing curriculum options to only those that align with one specific approach

No Recognition of Context

The same evaluation criteria apply regardless of student population, community values, or local literacy needs

The Usage Problem

EdReports is used as a compliance enforcement mechanism, not a quality measurement tool. It doesn't help districts choose the best curriculum for their students—it narrows options to those that comply with one ideological position on literacy instruction.

The Literacy Problem: When Evaluation Tools Fail

When EdReports fails as an evaluation tool, what happens to literacy education?

The Consequences

  • Districts adopt curricula based on flawed measurements
  • Teachers constrain instruction to match compliance criteria
  • Students experience narrowed literacy opportunities
  • The cycle reinforces content-first assumptions across generations

The tool meant to improve curriculum quality instead restricts what counts as "quality."

EdReports IS Curriculum

EdReports doesn't just evaluate curriculum—it functions as curriculum by determining what teachers teach and what students learn.

How do we test this claim?

If EdReports is curriculum, we need to evaluate it as curriculum—not just as a tool.

This requires a framework that measures learning opportunities, not content compliance.

Enter the 3D Compass: A framework for measuring the learning opportunities a curriculum creates across three axes and eight octants.

The 3D Compass Framework: Balanced Opportunities Across All Axes

The 3D compass has 3 axes and 8 octants. Both poles of every axis are equally important. Quality curriculum provides opportunities in all 8 octants—not just one narrow corner.

Independence ↔ Collaboration

Does it reward student agency or require teacher scripts? Opportunities for self-driven work vs. co-construction

Practical ↔ Theoretical

Does it provide both established foundations (what society agrees is "true") AND opportunities to explore alternatives? Balance between practical learning and theoretical exploration (fringe theories, lost voices, niche interests)

Structured ↔ Flexible

Does it provide both clear structure (rubrics, assigned goals) AND opportunities for self-direction (creating own goals, exploring without rubrics)? Balance between prescribed pathways and flexible exploration
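To make the geometry concrete, here is a minimal sketch in Python (the variable names are my own, not part of the framework) that enumerates the 8 octants as one pole chosen from each of the 3 axes:

```python
from itertools import product

# The three axes of the compass, each with two equally weighted poles.
AXES = [
    ("Independence", "Collaboration"),
    ("Practical", "Theoretical"),
    ("Structured", "Flexible"),
]

# An octant picks one pole from each axis: 2 x 2 x 2 = 8 octants.
octants = list(product(*AXES))

for octant in octants:
    print(" / ".join(octant))  # e.g. "Independence / Practical / Structured"
```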

Part 2 Methodology: Scoring EdReports with a 6D Opportunity Compass

I scored all 54 EdReports indicators across 6 dimensions to measure what opportunities EdReports creates or restricts

The 6 Dimensions (0-5 scale):

Independence: Student agency & self-generated questions
Collaboration: Co-construction of knowledge with peers
Practical: Established, evidence-based foundations
Theoretical: Exploring alternative perspectives
Structured: Systematic sequences & explicit pathways
Flexible: Multiple pathways & student direction

The Scoring Process:

  1. Read each indicator from Gateways 1, 2, and 3
  2. Score 0-5 on each dimension based on what opportunities it creates
  3. Calculate averages across all 54 indicators
  4. Map results on 6D radar chart

Example: "Materials provide systematic and explicit instruction..." scores high on Structured (prescribed pathway), zero on Independence (no student agency), and zero on Flexible (one-size-fits-all)

This reveals whether EdReports creates balanced opportunities—or enforces a single, narrow vision of learning
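As a sketch of how this scoring could run in practice (Python; the two indicator rows and their scores are illustrative stand-ins, not actual EdReports data):

```python
DIMENSIONS = ["Independence", "Collaboration", "Practical",
              "Theoretical", "Structured", "Flexible"]

# Hypothetical scores for two of the 54 indicators. Each indicator
# gets a 0-5 score on every dimension, judged by what opportunities
# its language creates for students.
indicator_scores = [
    # e.g. "Materials provide systematic and explicit instruction..."
    {"Independence": 0, "Collaboration": 0, "Practical": 3,
     "Theoretical": 0, "Structured": 5, "Flexible": 0},
    # e.g. "Materials provide clear protocols and teacher guidance..."
    {"Independence": 0, "Collaboration": 1, "Practical": 2,
     "Theoretical": 0, "Structured": 4, "Flexible": 1},
]

# Average each dimension across all scored indicators.
averages = {
    dim: sum(scores[dim] for scores in indicator_scores) / len(indicator_scores)
    for dim in DIMENSIONS
}
print(averages)
```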

Part 2 Results: EdReports Creates an Extremely Lopsided Opportunity Space

Mapping all 54 EdReports indicators across the 6 dimensions reveals extreme imbalance:

  • Independence: 0.05 ↔ Collaboration: 0.17
  • Practical: 2.44 ↔ Theoretical: 0.05
  • Structured: 4.31 ↔ Flexible: 0.25

What This Shows

  • Maxed out: Structured (4.31/5.0)
  • Moderate: Practical (2.44/5.0)
  • Near-zero: Flexible (0.25), Collaboration (0.17)
  • Essentially zero: Independence (0.05), Theoretical (0.05)
  • The shape is barely visible—collapsed to one corner

What Balanced Would Look Like

A quality curriculum would show moderate scores (2-3) across all six dimensions—creating a roughly hexagonal shape. Instead, EdReports is maxed out in structure, moderate in practical grounding, and near-zero in the remaining four dimensions.
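A minimal matplotlib sketch for reproducing the radar chart from the six averages above (the plotting choices are mine, not part of the original analysis):

```python
import math
import matplotlib.pyplot as plt

# The six dimension averages reported above.
scores = {
    "Independence": 0.05, "Collaboration": 0.17,
    "Practical": 2.44, "Theoretical": 0.05,
    "Structured": 4.31, "Flexible": 0.25,
}

labels = list(scores)
values = list(scores.values())

# One spoke per dimension; repeat the first point to close the polygon.
angles = [2 * math.pi * i / len(labels) for i in range(len(labels))]
angles.append(angles[0])
values.append(values[0])

ax = plt.subplot(polar=True)
ax.plot(angles, values)
ax.fill(angles, values, alpha=0.25)
ax.set_xticks(angles[:-1])
ax.set_xticklabels(labels)
ax.set_ylim(0, 5)  # the 0-5 scoring scale
plt.show()
```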

Deep Dive: Independence ↔ Collaboration (Both Near Zero)

Independence: 0.05 ↔ Collaboration: 0.17

What EdReports Actually Says:

"Materials include systematic and explicit instruction... with repeated teacher modeling... Students practice phonics skills..."

— EdReports K–2 ELA Review Criteria v2.1, Gateway 1

"Materials provide clear protocols and teacher guidance that frequently allow students to engage in listening and speaking..."

— EdReports K–2 ELA Review Criteria v2.1, Gateway 2

Why Independence = 0.05

  • No criteria for student-generated questions
  • No rewards for self-driven inquiry
  • Teacher modeling dominates—student agency absent
  • Students "practice" and "respond," not explore

Why Collaboration = 0.17

  • No criteria for co-construction of knowledge
  • "Protocols" enforce teacher-led discussion
  • Peer dialogue not valued—individual correctness is
  • Minimal opportunities for collaborative meaning-making

Result: EdReports rewards curricula that minimize both student agency and collaborative learning—students follow scripts rather than create knowledge

Deep Dive: Practical ↔ Theoretical (Practical High, Theoretical Zero)

Practical: 2.44 ↔ Theoretical: 0.05

What EdReports Actually Says:

"Scope and sequence clearly delineate... with a clear evidence-based explanation for the expected hierarchy of phonemic awareness..."

— EdReports K–2 ELA Review Criteria v2.1, p. 7

"Materials include a clear, research-based core instructional pathway..."

— EdReports K–2 ELA Review Criteria v2.1, Gateway 2

Why Practical = 2.44

  • Heavy emphasis on "evidence-based" practices
  • Established sequences presented as universal
  • "Research-based" = what authorities have determined
  • Settled knowledge treated as truth

Why Theoretical = 0.05

  • Zero criteria for exploring alternative theories
  • No space to question foundational assumptions
  • Multiple perspectives on literacy: not recognized
  • One pathway presented as "science"

Result: EdReports treats one approach to literacy as settled truth—no exploration of alternatives, no questioning of assumptions

Deep Dive: Structured ↔ Flexible (Structured Maxed, Flexible Near Zero)

Structured: 4.31 ↔ Flexible: 0.25

What EdReports Actually Says:

"Materials provide reasonable pacing where phonics skills are taught one at a time and allot time where phonics skills are practiced to automaticity..."

— EdReports K–2 ELA Review Criteria v2.1, p. 7

"Materials include decodable texts with phonics aligned to the program's scope and sequence..."

— EdReports K–2 ELA Review Criteria v2.1, p. 7

Why Structured = 4.31

  • Explicit sequencing required for everything
  • Skills "taught one at a time"—lockstep pacing
  • "Systematic" = uniformity across all learners
  • One prescribed pathway to literacy

Why Flexible = 0.25

  • Deaf readers: Achieve literacy without phonics—EdReports excludes them
  • Multimodal pathways: Not acknowledged
  • Student-directed pacing: Forbidden
  • Alternative routes to literacy: Structurally impossible

Result: EdReports enforces a single, rigid pathway—erasing diverse learners and alternative routes to literacy

Part 2 Synthesis: What These Scores Reveal About EdReports

EdReports doesn't create a balanced opportunity space—it collapses curriculum into a single corner

What EdReports Rewards:

  • Maximum structure (4.31/5.0)—prescribed pathways, lockstep pacing
  • Established knowledge (2.44/5.0)—"evidence-based" practices as universal truth
  • Teacher-led scripts—explicit instruction with repeated modeling

What EdReports Excludes:

  • Student agency (0.05/5.0)—no self-generated questions or inquiry
  • Collaborative learning (0.17/5.0)—no co-construction of knowledge
  • Alternative perspectives (0.05/5.0)—no exploration of different theories
  • Flexible pathways (0.25/5.0)—Deaf readers and multimodal literacy excluded

The Real-World Consequence:

When districts use EdReports as a gatekeeper, they adopt curricula that maximize compliance and minimize opportunities for agency, collaboration, exploration, and diverse pathways. This isn't about quality—it's about control.

The Pattern Confirmed:

EdReports enforces the same content-first, compliance-driven logic that has persisted for 250 years.
It calls this "science"—but it's actually corporate-era gatekeeping dressed in modern language.

Now: What's the Alternative Framework?

We've shown EdReports fails as an evaluation tool.
But what should we measure instead of content compliance?

Measure Opportunities, Not Compliance

Instead of asking "Does this curriculum cover the right content?"
We should ask "What opportunities does this create for metacognitive development?"

Here's the framework that makes this possible...

The 6 Metacognitive Nodes: What Curriculum Should Develop

These are interconnected nodes. Any node can trigger any other—no hierarchy, no sequence. Like a 6D radar graph with butterfly effects.

1. Endospection

Looking inward to map your own cognitive architecture. Not "Who am I?" but "Who do I think I am, and why?" Unlearning imposed narratives. Building internal stability.

2. Diffusion

Pure exploration without agenda. The "most freeing area of metacognition." Following tangents, embracing the butterfly effect. A small curiosity can blossom into massive, unexpected journeys.

3. Vectoring

Tailoring curiosity with magnitude and direction. Turning wandering wonder into targeted inquiry. Asking "Where do I find what I need?" Making deliberate choices about scope and sourcing.

4. Refraction

The reality check. How does your identity bend the information you receive? How does new information force "truth maintenance" updates to your internal reality? Critical awareness of bias.

5. Exospection

Mapping external minds. Understanding stakeholder biases, contexts, realities. "What are they actually asking for?" Fitting others' realities into your own to ensure communication is received.

6. Synthesis

Creating entirely new ideas. Putting it all together across independence, collaboration, and application. Not summarizing—constructing something that didn't exist before.

Why Not Bloom's Taxonomy?

Bloom treats "remembering" as bottom, "creating" as top. But Diffusion can spark Refraction, which sends you back to Vectoring, which reshapes Endospection. No hierarchy. No sequence.

Curriculum as Opportunities

Teachers design assignments that create opportunities for students to engage these 6 nodes. Not "master content," but "experience these metacognitive processes."

The Orrery in Motion

Each node influences every other—no beginning, no end. This is learning as a living ecosystem.
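One way to model the orrery is as a complete directed graph, sketched below in Python (an illustration of the "no hierarchy, no sequence" claim, not an implementation from the source):

```python
NODES = ["Endospection", "Diffusion", "Vectoring",
         "Refraction", "Exospection", "Synthesis"]

# A complete directed graph: any node can trigger any other.
# No root, no leaves, no prescribed sequence.
triggers = {node: [other for other in NODES if other != node]
            for node in NODES}

# Every node can reach every other node in one step.
assert all(len(targets) == len(NODES) - 1 for targets in triggers.values())
print(triggers["Diffusion"])
# ['Endospection', 'Vectoring', 'Refraction', 'Exospection', 'Synthesis']
```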

The Alternative: Opportunity-Based Curriculum

What if curriculum design started with opportunities, not content?

Content-First (Current Model)

  • Define what must be covered
  • Sequence content linearly
  • Test for content mastery
  • Narrow to measurable objectives
  • Compliance = Quality

Opportunity-First (Alternative)

  • Define learning experiences available
  • Create interconnected nodes
  • Assess access to diverse pathways
  • Expand across all 8 octants
  • Opportunities = Quality

The shift: Instead of asking "What content must students master?", we ask "What opportunities will students have access to?" Curriculum becomes a map of possible experiences, not a checklist of required content.

Assessments Should Measure Opportunities, Not Compliance

How Do We Evaluate Curriculum Quality?

Not by measuring content coverage, but by analyzing the range of learning opportunities students can access.

Questions to Ask:

  • Does the curriculum offer opportunities for both independence and collaboration?
  • Can students engage in both practical application and theoretical exploration?
  • Are there pathways for both structured guidance and flexible discovery?
  • Do students move through interconnected nodes, not rigid sequences?

This is what the 3D Compass measures: Not "Does this curriculum teach phonics correctly?" but "What learning opportunities does this curriculum create or restrict?"

The Big Takeaway: Opportunities, Not Compliance

EdReports gives us content-focused evaluation.
We need metacognitive opportunity mapping.

What EdReports Does

  • Enforces 250 years of content-first assumptions
  • Measures compliance with one narrow literacy definition
  • Creates a curriculum monoculture
  • Acts as a thermometer controlling the weather

What We Need Instead

  • Evaluation that maps metacognitive opportunities
  • Balance across all three axes (independence ↔ collaboration, practical ↔ theoretical, structured ↔ flexible) and the 8 octants they define
  • Recognition of diverse literacy pathways
  • Curriculum as opportunities, not destinations

When we change how we evaluate curriculum,
we change what counts as learning—
and we change who gets to learn.

Recreating Evaluation

The purpose of evaluation is not to enforce one destination
or to measure compliance with content.

It is to expand the opportunities children have
to learn, think, question, and become themselves.

Metacognition is not the end of learning.
It is the only beginning we can trust.

Theoretical Justification: Methodological Anchors

What justifies treating EdReports itself as the curriculum unit under evaluation?

Melrose (1998)

Progressive Evaluation: Evaluation should adapt to context and theoretical commitments, not follow one rigid checklist.

This allows for designing criteria based on specific theoretical frameworks.

Norris (1998)

Ideological Text Analysis: Curriculum materials encode assumptions about whose knowledge counts.

EdReports can be treated as an ideological text that reveals assumptions about literacy and learning pathways.

Wiggins (1998)

Validity Tests: Can students succeed/fail for reasons unrelated to what's being assessed?

Wiggins' validity questions apply to EdReports: Does it measure curriculum quality, or something else entirely?

Together: These scholars justify Part 1 (testing EdReports using Wiggins' validity framework) and Part 2 (treating EdReports as curriculum and evaluating it using the 3D Compass to reveal its ideological assumptions).

Data Sources & Analysis Process

Documents Reviewed

  • EdReports K–2 ELA Core Content Review Criteria (v2.1)
  • EdReports K–2 ELA Evidence Guide (v2.1)

Theoretical Framework

  • Wiggins (1998) - Validity tests
  • Sullivan (2011) - Reliability & cognition
  • Melrose (1998) - Progressive evaluation
  • Norris (1998) - Ideological text analysis

Analysis Process

For each EdReports gateway and indicator, the analysis examined:

  • What assumptions about literacy, learners, "normal" development, gender, and disability are encoded?
  • Which side of each axis does this push curriculum toward?
  • What opportunities does this create or foreclose?

Examples:

  • Highlighting language that enforces strict decoding paths
  • Tagging indicators that narrowly define "family roles"
  • Noting where collaboration or interpretive freedom is explicitly encouraged vs. absent
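A minimal sketch of the coding schema this process implies (Python; the field names and the sample record are hypothetical illustrations, not the analysis's actual instrument):

```python
from dataclasses import dataclass, field

@dataclass
class CodedIndicator:
    """One EdReports indicator, annotated during the analysis."""
    gateway: int                  # 1, 2, or 3
    text: str                     # the indicator's language
    assumptions: list[str] = field(default_factory=list)
    axis_push: dict[str, str] = field(default_factory=dict)
    opportunities_foreclosed: list[str] = field(default_factory=list)

example = CodedIndicator(
    gateway=1,
    text="Materials provide reasonable pacing where phonics skills "
         "are taught one at a time...",
    assumptions=["one universal skill sequence", "static cognition"],
    axis_push={"Structured vs Flexible": "Structured"},
    opportunities_foreclosed=["student-directed pacing",
                              "non-phonics literacy pathways"],
)
print(example.axis_push)
```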

Actionable Feedback: How to Fix EdReports

1. Add an "Opportunity & Agency" Gateway

Criteria around student independence, collaboration, and interpretive agency—not just content coverage.

2. Revise Representation Indicators

Replace "women's roles" language with criteria about critical, non-stereotyped portrayals and identity complexity.

3. Recognize Multiple Literacy Pathways

Add criteria acknowledging Deaf readers, multilingual literacies, and multimodal meaning-making.