The Testing Effect: How Retrieval Practice Enhances Long-Term Learning
A comprehensive review of research on retrieval practice, including experimental evidence showing why testing yourself is more effective than re-studying, with practical applications for students.
The testing effect, also known as retrieval practice or the retrieval-enhanced learning effect, is one of the most robust and well-replicated findings in cognitive psychology. It shows that retrieving information from memory produces better long-term retention than spending the same time on additional study.
Historical Foundation
Early Research
Gates (1917) conducted one of the earliest systematic investigations of the testing effect in his doctoral dissertation. He varied the share of study time that students spent on recitation (self-testing) rather than re-reading and found that retention peaked at about 60% recitation:
- 20% recitation time: 35% retention after 4 hours
- 40% recitation time: 37% retention
- 60% recitation time: 42% retention
- 80% recitation time: 37% retention
Contemporary Synthesis
Roediger and Karpicke (2006) revitalized interest in the testing effect with their influential review, documenting over 100 years of research supporting retrieval practice.
Landmark Studies
The Critical Study: Karpicke and Roediger (2008)
This landmark study in Science compared four learning conditions:
- Study-Study-Study-Study (SSSS)
- Study-Study-Study-Test (SSST)
- Study-Test-Test-Test (STTT)
- Study-Test-Study-Test (STST)
Results after one week:
- SSSS: 36% retention
- SSST: 36% retention
- STTT: 80% retention
- STST: 78% retention
Critical Finding: A single study session followed by three retrieval practice sessions produced 122% better retention than four study sessions.
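The 122% figure is a relative improvement, not a difference in percentage points. A minimal Python sketch of the arithmetic, using the SSSS and STTT retention figures listed above (the function name `relative_improvement` is illustrative and does not come from the study):

```python
def relative_improvement(baseline: float, treatment: float) -> float:
    """Return the percentage gain of treatment over baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (treatment - baseline) / baseline * 100


# Retention figures quoted above: SSSS = 36%, STTT = 80%.
gain = relative_improvement(36, 80)
print(f"{gain:.0f}% relative improvement")  # 44 points on a base of 36 -> 122%
```

In absolute terms the gap is 44 percentage points; dividing by the 36% baseline gives the relative figure.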
The Power of Desirable Difficulties
Bjork (1975) introduced the concept of "desirable difficulties": learning conditions that make practice more challenging but enhance long-term retention.
Effortful generation (retrieval practice) is one such difficulty, with opposite short-term and long-term effects:
- Short-term: Feels harder, produces more errors
- Long-term: Produces superior retention and transfer
Mechanisms Underlying the Testing Effect
Elaborative Retrieval
Carpenter (2009) demonstrated that retrieval practice enhances learning through:
- Elaboration: Activating related concepts during retrieval
- Organization: Strengthening relationships between concepts
- Consolidation: Stabilizing memory traces
Evidence: Tested concepts showed 34% more semantic connections than studied concepts.
Transfer-Appropriate Processing
Morris et al. (1977) showed that memory performance depends on the match between encoding and retrieval processes.
Implication: Because exams require retrieval, practicing retrieval during study optimally prepares for test performance.
Reconsolidation
Nader and Hardt (2009) found that retrieving a memory puts it in a temporarily changeable state, allowing for:
- Strengthening of accurate information
- Updating with new knowledge
- Error correction through feedback
Moderators of the Testing Effect
1. Retrieval Success
Karpicke and Roediger (2007) found that the size of the testing effect depends on whether retrieval succeeds and whether feedback follows:
- Successful retrieval: Strong testing effect
- Unsuccessful retrieval + feedback: Moderate effect
- Unsuccessful retrieval without feedback: No effect
Recommendation: Aim for a 70-85% success rate during practice.
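One way to act on this target is to check each session's success rate against the band and adjust question difficulty accordingly. The sketch below is an assumption about how that might look: the function name, the default thresholds (taken from the 70-85% range above), and the three advice labels are illustrative, not part of the cited research.

```python
def difficulty_advice(correct: int, attempted: int,
                      low: float = 0.70, high: float = 0.85) -> str:
    """Suggest how to adjust question difficulty for the next session."""
    if attempted <= 0:
        raise ValueError("attempted must be positive")
    rate = correct / attempted
    if rate < low:
        return "easier"   # too many failed retrievals: add cues or review first
    if rate > high:
        return "harder"   # retrieval is too easy: remove cues or widen spacing
    return "keep"         # inside the target band


print(difficulty_advice(12, 20))  # 60% correct -> easier
print(difficulty_advice(16, 20))  # 80% correct -> keep
print(difficulty_advice(19, 20))  # 95% correct -> harder
```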
2. Retention Interval
The benefit of testing increases with longer retention intervals:
- Immediate test: 10% advantage over studying
- 1 week delay: 40% advantage
- 1 month delay: 50% advantage
(Roediger & Karpicke, 2006)
3. Type of Test
McDaniel et al. (2007) compared different test formats:
Most effective:
- Short-answer tests: Effect size d = 0.90
- Essay questions: Effect size d = 0.85
Moderately effective:
- Multiple-choice: Effect size d = 0.50
Explanation: Generative retrieval (producing answers) is more beneficial than recognition.
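The effect sizes above are Cohen's d values: the difference between two group means expressed in units of their pooled standard deviation. The following sketch shows the standard formula using only the Python standard library. The scores are made-up illustration data, not results from McDaniel et al.

```python
from statistics import mean, variance


def cohens_d(group_a: list[float], group_b: list[float]) -> float:
    """Standardized mean difference using the pooled sample variance."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * variance(group_a) +
                  (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / pooled_var ** 0.5


# Hypothetical final-test scores (illustration only).
short_answer = [78, 85, 90, 72, 88]
restudy = [70, 75, 82, 68, 80]
print(round(cohens_d(short_answer, restudy), 2))
```

By convention, d around 0.5 is read as a medium effect and d around 0.8 or higher as a large one, which is why the short-answer and essay formats are grouped as "most effective."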
4. Feedback Timing
Butler et al. (2008) examined when to provide feedback:
Delayed feedback (24 hours):
- 72% final test performance
- Greater lasting benefit
Immediate feedback:
- 68% final test performance
- Benefits dissipate faster
Practical Applications
1. Flashcards Done Right
Research-based recommendations (Kornell & Bjork, 2008):
Do:
- ✓ Write questions that require generation (not recognition)
- ✓ Include contextual cues
- ✓ Test yourself before looking at answer
- ✓ Use spaced practice (distribute over time)
Don't:
- ✗ Flip cards immediately without attempting retrieval
- ✗ Remove "mastered" cards from rotation
- ✗ Mass practice in single session
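The do's and don'ts above can be sketched as a small review loop that forces an attempt before the answer is checked and never drops a card from the deck. Everything here is a hypothetical illustration: the `Card` class, the `review` function, the exact-match grading, and the choice to put the most-missed cards first are assumptions, not a procedure from Kornell and Bjork.

```python
from dataclasses import dataclass


@dataclass
class Card:
    question: str
    answer: str
    misses: int = 0


def review(deck, attempt):
    """Run one pass: require an attempt before the answer is checked.

    `attempt` is any callable that takes the question and returns the
    learner's answer (for interactive use, pass the built-in `input`).
    """
    for card in deck:
        response = attempt(card.question)          # retrieve first
        if response.strip().lower() != card.answer.lower():
            card.misses += 1                       # feedback: count the miss
    # Keep every card in rotation; most-missed cards come first next time.
    return sorted(deck, key=lambda c: c.misses, reverse=True)


deck = [Card("Capital of France?", "Paris"), Card("7 x 8?", "56")]
next_order = review(deck, lambda question: "Paris")  # simulated learner
print([card.question for card in next_order])
```

Spacing is left to the caller: running `review` on separate days, rather than repeatedly in one sitting, is what distributes the practice.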
2. Study Schedule Optimization
Based on Karpicke and Roediger (2008):
Instead of:
- Read → Read → Read → Read (4 hours)
Do:
- Read → Test → Test → Test (same 4 hours)
- Improvement observed on the study's one-week test: 122% (80% vs. 36% retention)
3. Progressive Difficulty
Pyc and Rawson (2009) demonstrated benefits of gradually increasing retrieval difficulty:
Protocol:
- Initial learning: Read with full context
- Easy retrieval: Fill-in-the-blank with hints
- Moderate retrieval: Short-answer questions
- Difficult retrieval: Essay questions
Result: 65% better performance than single-difficulty practice.
4. Pre-Testing
Richland et al. (2009) found that being tested on material before studying it enhances subsequent learning:
- Pre-test + study: 78% retention
- Study only: 62% retention
- Benefit: 26% relative improvement
Mechanism: Pre-testing activates relevant prior knowledge and creates "knowledge gaps" that focus attention.
Common Misconceptions
Misconception 1: "Testing Measures Learning, Doesn't Create It"
Reality: Roediger and Karpicke (2006) clearly demonstrated that retrieval is a learning event, not just assessment.
Evidence: Testing produces better long-term retention than additional study even when controlling for total time.
Misconception 2: "Cramming Works If You Just Need to Pass"
Reality: While massed practice produces better immediate performance, distributed retrieval practice produces vastly superior long-term retention.
Kornell (2009) findings:
- Massed practice: 67% (immediately), 21% (1 week)
- Distributed practice: 54% (immediately), 56% (1 week)
Misconception 3: "Multiple-Choice Tests Don't Help Learning"
Partial truth: While short-answer tests are more effective, multiple-choice can still benefit learning if:
- Questions require application, not just recognition
- Distractors are plausible
- Feedback is provided (Little & Bjork, 2015)
Integration with Other Evidence-Based Practices
Retrieval + Spacing
Karpicke and Bauernschmidt (2011) showed that combining retrieval practice with spaced practice produces synergistic benefits:
- Spacing alone: 45% retention
- Retrieval alone: 67% retention
- Spacing + Retrieval: 89% retention
Retrieval + Interleaving
Rohrer and Taylor (2007) demonstrated that interleaved retrieval practice (mixing topics) enhances discrimination and transfer:
- Blocked retrieval: 63% on transfer problems
- Interleaved retrieval: 79% on transfer problems
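The difference between blocked and interleaved practice is simply the order in which problems are attempted. The sketch below builds both orderings from the same problem set. The topic names and problem labels are placeholders, and the round-robin ordering is one simple deterministic choice (an assumption for illustration); random shuffling across topics is another option.

```python
from itertools import chain, zip_longest


def blocked(topics: dict[str, list[str]]) -> list[str]:
    """All problems from one topic, then all from the next."""
    return list(chain.from_iterable(topics.values()))


def interleaved(topics: dict[str, list[str]]) -> list[str]:
    """Round-robin across topics so consecutive problems differ in type."""
    rounds = zip_longest(*topics.values())
    return [p for group in rounds for p in group if p is not None]


topics = {
    "area": ["A1", "A2", "A3"],
    "volume": ["V1", "V2", "V3"],
    "perimeter": ["P1", "P2"],
}
print(blocked(topics))      # A1 A2 A3 V1 V2 V3 P1 P2
print(interleaved(topics))  # A1 V1 P1 A2 V2 P2 A3 V3
```

In the interleaved order the learner has to identify the problem type on every item, which is the discrimination practice that blocked ordering skips.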
Retrieval + Elaboration
Elaborative interrogation ("why" questions) combined with retrieval enhances both retention and understanding (Pressley et al., 1987):
Protocol:
- Retrieve information
- Explain why it's true
- Connect to prior knowledge
Result: 40% better understanding than retrieval alone.
Metacognitive Challenges
The Fluency Illusion
Bjork et al. (2013) identified a critical metacognitive error: students mistake fluency (ease of processing) for learning.
Problem:
- Re-reading feels easy → students judge they've learned
- Retrieval practice feels difficult → students avoid it
Solution: Educate students that difficulty during practice predicts better learning.
Judgment of Learning Accuracy
Koriat and Bjork (2005) found that students are poor judges of their own learning:
- Prediction accuracy: r = 0.27 (very weak)
- After retrieval practice: r = 0.61 (moderate)
Implication: Use retrieval practice to calibrate your understanding.
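The r values above are correlations between how well students predicted they would do and how well they actually did. A student can run the same check on their own study data. The sketch below computes a Pearson correlation with the standard library only. The function name and the sample numbers are hypothetical.

```python
from math import sqrt


def pearson_r(predicted: list[float], actual: list[float]) -> float:
    """Pearson correlation between predicted and actual scores."""
    n = len(predicted)
    if n != len(actual) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    mean_p = sum(predicted) / n
    mean_a = sum(actual) / n
    cov = sum((p - mean_p) * (a - mean_a) for p, a in zip(predicted, actual))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_a = sum((a - mean_a) ** 2 for a in actual)
    if var_p == 0 or var_a == 0:
        raise ValueError("scores must vary for a correlation to be defined")
    return cov / sqrt(var_p * var_a)


# Hypothetical data: per-topic confidence (0-100) vs. quiz score (0-100).
predicted = [90, 80, 85, 70, 95]
actual = [60, 75, 50, 65, 70]
print(round(pearson_r(predicted, actual), 2))
```

A value near 1 means confidence tracks performance well. A value near 0 means confidence says little about what has actually been learned.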
Limitations and Boundary Conditions
Complex Conceptual Understanding
Karpicke and Blunt (2011) compared retrieval practice to concept mapping for complex materials:
Factual recall:
- Retrieval practice: Superior
Conceptual understanding:
- Initially equivalent
- Retrieval practice eventually superior with repeated practice
Conclusion: Retrieval practice works for complex understanding, but may require multiple cycles.
Creative Problem-Solving
Jensen et al. (2014) found weaker effects for open-ended creative tasks:
- Well-defined problems: Strong testing effect (d = 0.85)
- Ill-defined problems: Moderate effect (d = 0.42)
Recommendation: Supplement retrieval practice with deliberate practice for creative tasks.
Practical Implementation Guide
For Individual Study:
Week 1: Initial Learning
- Day 1: Learn material
- Day 2: Retrieval practice (aim for 75% success)
- Day 4: Retrieval practice (harder questions)
- Day 7: Retrieval practice (application problems)
Week 2-4: Distributed Practice
- Week 2: Retrieval practice session
- Week 3: Mixed retrieval (this topic + related topics)
- Week 4: Comprehensive retrieval
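The four-week plan above can be turned into calendar dates with a short script. This is a sketch under stated assumptions: Day 1 is the start date, and the Week 2-4 sessions are placed on the last day of each week (offsets 13, 20, and 27), which the plan itself does not specify.

```python
from datetime import date, timedelta

# (days after start, task). Offsets for weeks 2-4 are assumed, see above.
PLAN = [
    (0, "Learn material"),
    (1, "Retrieval practice (aim for 75% success)"),
    (3, "Retrieval practice (harder questions)"),
    (6, "Retrieval practice (application problems)"),
    (13, "Retrieval practice session"),
    (20, "Mixed retrieval (this topic + related topics)"),
    (27, "Comprehensive retrieval"),
]


def build_schedule(start: date) -> list[tuple[date, str]]:
    """Map each step of the plan to a calendar date."""
    return [(start + timedelta(days=offset), task) for offset, task in PLAN]


for day, task in build_schedule(date(2025, 1, 6)):
    print(day.isoformat(), task)
```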
For Groups:
- Teach-back protocol: Take turns explaining concepts without notes
- Q&A generation: Create quiz questions for each other
- Collaborative testing: Test together, discuss answers
For Different Subjects:
Mathematics:
- Practice problems from memory
- Explain solution strategies verbally
- Identify problem types without solving
Sciences:
- Diagram processes from memory
- Explain mechanisms without notes
- Predict experimental outcomes
Humanities:
- Summarize readings from memory
- Argue positions without references
- Synthesize across texts
Conclusion
The testing effect is among the most powerful and reliable phenomena in learning science. Key takeaways:
- Retrieval is a learning event, not just assessment
- Testing produces better long-term retention than additional studying (roughly 40-120% relative improvement in the studies cited here)
- Difficulty during practice predicts better long-term learning
- Combination with spacing and interleaving produces optimal results
- Most students under-utilize this technique due to metacognitive errors
Students who systematically implement retrieval practice can expect substantial improvements in long-term retention and exam performance.
References
- Bjork, E. L., Little, J. L., & Storm, B. C. (2014). Multiple-choice testing as a desirable difficulty in the classroom. Journal of Applied Research in Memory and Cognition, 3(3), 165-170.
- Bjork, R. A. (1975). Retrieval as a memory modifier: An interpretation of negative recency and related phenomena. In R. L. Solso (Ed.), Information Processing and Cognition (pp. 123-144). Erlbaum.
- Butler, A. C., Karpicke, J. D., & Roediger III, H. L. (2008). Correcting a metacognitive error: Feedback increases retention of low-confidence correct responses. Journal of Experimental Psychology: Learning, Memory, and Cognition, 34(4), 918-928.
- Carpenter, S. K. (2009). Cue strength as a moderator of the testing effect: The benefits of elaborative retrieval. Journal of Experimental Psychology: Learning, Memory, and Cognition, 35(6), 1563-1569.
- Gates, A. I. (1917). Recitation as a factor in memorizing. Archives of Psychology, 6(40), 1-104.
- Jensen, J. L., McDaniel, M. A., Woodard, S. M., & Kummer, T. A. (2014). Teaching to the test… or testing to teach: Exams requiring higher order thinking skills encourage greater conceptual understanding. Educational Psychology Review, 26(2), 307-329.
- Karpicke, J. D., & Bauernschmidt, A. (2011). Spaced retrieval: Absolute spacing enhances learning regardless of relative spacing. Journal of Experimental Psychology: Learning, Memory, and Cognition, 37(5), 1250-1257.
- Karpicke, J. D., & Blunt, J. R. (2011). Retrieval practice produces more learning than elaborative studying with concept mapping. Science, 331(6018), 772-775.
- Karpicke, J. D., & Roediger III, H. L. (2007). Expanding retrieval practice promotes short-term retention, but equally spaced retrieval enhances long-term retention. Journal of Experimental Psychology: Learning, Memory, and Cognition, 33(4), 704-719.
- Karpicke, J. D., & Roediger III, H. L. (2008). The critical importance of retrieval for learning. Science, 319(5865), 966-968.
- Koriat, A., & Bjork, R. A. (2005). Illusions of competence in monitoring one's knowledge during study. Journal of Experimental Psychology: Learning, Memory, and Cognition, 31(2), 187-194.
- Kornell, N. (2009). Optimising learning using flashcards: Spacing is more effective than cramming. Applied Cognitive Psychology, 23(9), 1297-1317.
- Kornell, N., & Bjork, R. A. (2008). Learning concepts and categories: Is spacing the "enemy of induction"? Psychological Science, 19(6), 585-592.
- Little, J. L., & Bjork, E. L. (2015). Optimizing multiple-choice tests as tools for learning. Memory & Cognition, 43(1), 14-26.
- McDaniel, M. A., Roediger III, H. L., & McDermott, K. B. (2007). Generalizing test-enhanced learning from the laboratory to the classroom. Psychonomic Bulletin & Review, 14(2), 200-206.
- Morris, C. D., Bransford, J. D., & Franks, J. J. (1977). Levels of processing versus transfer appropriate processing. Journal of Verbal Learning and Verbal Behavior, 16(5), 519-533.
- Nader, K., & Hardt, O. (2009). A single standard for memory: The case for reconsolidation. Nature Reviews Neuroscience, 10(3), 224-234.
- Pressley, M., McDaniel, M. A., Turnure, J. E., Wood, E., & Ahmad, M. (1987). Generation and precision of elaboration: Effects on intentional and incidental learning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 13(2), 291-300.
- Pyc, M. A., & Rawson, K. A. (2009). Testing the retrieval effort hypothesis: Does greater difficulty correctly recalling information lead to higher levels of memory? Journal of Memory and Language, 60(4), 437-447.
- Richland, L. E., Kornell, N., & Kao, L. S. (2009). The pretesting effect: Do unsuccessful retrieval attempts enhance learning? Journal of Experimental Psychology: Applied, 15(3), 243-257.
- Roediger III, H. L., & Karpicke, J. D. (2006). Test-enhanced learning: Taking memory tests improves long-term retention. Psychological Science, 17(3), 249-255.
- Rohrer, D., & Taylor, K. (2007). The shuffling of mathematics problems improves learning. Instructional Science, 35(6), 481-498.