How Prepamigo Scores Your Speaking
Trillion-Parameter LLM + Audio Analysis: A Dual-Engine Approach to Professional Speaking Assessment
One of the biggest challenges in CELPIP speaking preparation is the lack of professional, timely feedback. Traditional practice often leaves you guessing about your true performance level. Prepamigo's speaking scoring system was built to solve this problem—we use a trillion-parameter large language model combined with a specialized audio analysis model to deliver examiner-quality scoring and feedback.
Why We Chose This Architecture
Our Philosophy: Accurate Scores, Not Inflated Ones
Many AI scoring tools use lightweight models that generate results in seconds—but often produce inflated scores that don't reflect your true ability. When you practice with unrealistic feedback, you develop a false sense of readiness. We chose a different path: accurate assessment that genuinely helps you identify weaknesses and improve before the real exam.
Trillion-Parameter LLM
We use a trillion-parameter large language model for content evaluation—analyzing your logic, grammar, vocabulary, and task completion with deep semantic understanding.
Why It Matters:
- Deep semantic understanding catches subtle errors smaller models miss
- Nuanced evaluation of complex sentence structures and ideas
- Realistic scoring aligned with actual CELPIP examiner standards
Audio Analysis Model
A dedicated speech evaluation engine analyzes the acoustic properties of your recording—pronunciation accuracy, fluency, rhythm, and speaking pace.
Why It Matters:
- Text analysis alone cannot assess how you actually sound
- Detects pronunciation issues that text-only scoring misses entirely
- Essential for accurate Listenability scoring
💡 Why Both Engines Are Necessary
A single model cannot accurately evaluate both "what you said" and "how you said it." The LLM excels at semantic understanding and language analysis; the audio model specializes in acoustic features. Only by combining both can we provide a truly comprehensive and accurate assessment—the kind of holistic evaluation a real CELPIP examiner would give.
Four Scoring Dimensions
Our scoring dimensions are strictly based on the official CELPIP Score Comparison Chart. This ensures your practice scores accurately reflect how real CELPIP examiners evaluate responses, giving you meaningful feedback you can trust.
Each response is evaluated across four core dimensions, each scored on a 0-12 scale:
Content & Coherence
Evaluates whether your response addresses the prompt, presents complete information, and develops ideas logically. High scores require clear viewpoints, well-organized structure, and coherent flow.
Vocabulary
Assesses the range, accuracy, and appropriateness of your word choices. Can you use advanced vocabulary and idiomatic expressions? Are your word choices precise and natural?
Listenability
A comprehensive evaluation of grammar accuracy, pronunciation, intonation, and fluency. This is where our audio analysis model plays a critical role—assessing how clear and natural you sound.
Task Fulfillment
Checks whether you've completely addressed the task requirements and used an appropriate tone and style. For example, giving advice to a friend should sound different from making a formal complaint.
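To make the scale concrete, here is a minimal sketch of how a four-dimension report could be represented in code. Only the four dimension names and the 0-12 scale come from the description above; the class name, field names, and the unweighted average are illustrative assumptions, not Prepamigo's internal schema.

```python
from dataclasses import dataclass

# Hypothetical representation of a single speaking score report.
# Only the dimension names and the 0-12 scale are taken from the text above;
# everything else (names, averaging) is an illustrative assumption.
@dataclass
class SpeakingScore:
    content_coherence: int   # Content & Coherence, 0-12
    vocabulary: int          # Vocabulary, 0-12
    listenability: int       # Listenability, 0-12 (informed by audio analysis)
    task_fulfillment: int    # Task Fulfillment, 0-12

    def overall(self) -> float:
        """Simple unweighted average; the real weighting is not published."""
        dims = (self.content_coherence, self.vocabulary,
                self.listenability, self.task_fulfillment)
        return sum(dims) / len(dims)
```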
The Scoring Process
Speech-to-Text Transcription
First, we use advanced speech recognition to convert your recording into text, capturing every word you said.
Parallel Dual-Engine Evaluation
The trillion-parameter LLM analyzes the transcript for content quality, grammar, and vocabulary. The audio analysis model simultaneously processes the original audio for pronunciation, fluency, and pace.
Intelligent Score Fusion
Results from both engines are then combined: the audio analysis directly informs the "Listenability" dimension score, and all four dimensions are weighted to calculate the final score.
Score & Detailed Feedback
You receive a standardized 0-12 score along with comprehensive feedback: grammar corrections, vocabulary suggestions, pronunciation tips, and personalized improvement recommendations.
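The four steps above can be sketched as a small pipeline. Everything in this example is a placeholder: the function names, the stubbed engine outputs, and the equal-weight averaging are assumptions made for illustration, since the actual services and weighting are not published. It only shows the shape of the flow: transcribe, evaluate in parallel, fuse, return feedback.

```python
import asyncio

# Illustrative pipeline only. All function bodies below are placeholder stubs;
# the real engines (trillion-parameter LLM, audio analysis model) are services
# whose APIs and weights are not described in this article.

async def transcribe(audio: bytes) -> str:
    return "stub transcript"                            # step 1: speech-to-text

async def llm_evaluate(transcript: str, prompt: str) -> dict:
    # stand-in for the LLM's content / vocabulary / grammar / task judgment
    return {"content": 8, "vocabulary": 8, "grammar": 8, "task": 8,
            "feedback": "stub feedback"}

async def analyze_audio(audio: bytes) -> dict:
    # stand-in for pronunciation, fluency, and pace analysis
    return {"pronunciation": 8, "fluency": 7, "pace": 9}

def fuse_listenability(grammar: int, acoustic: dict) -> float:
    # placeholder fusion: average grammar accuracy with the acoustic measures
    return (grammar + sum(acoustic.values()) / len(acoustic)) / 2

async def score_response(audio: bytes, prompt: str) -> dict:
    transcript = await transcribe(audio)                # step 1

    llm_result, audio_result = await asyncio.gather(    # step 2: parallel evaluation
        llm_evaluate(transcript, prompt),
        analyze_audio(audio),
    )

    dimensions = {                                      # step 3: score fusion
        "content_coherence": llm_result["content"],
        "vocabulary": llm_result["vocabulary"],
        "listenability": fuse_listenability(llm_result["grammar"], audio_result),
        "task_fulfillment": llm_result["task"],
    }
    final = round(sum(dimensions.values()) / 4)         # equal weights as a placeholder

    return {"score": final, "dimensions": dimensions,   # step 4: score + feedback
            "feedback": llm_result["feedback"]}

# Example: asyncio.run(score_response(b"...", "Give advice to a friend"))
```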
Why Your Score Might Be Capped
Ever wonder why your score doesn't improve even when you feel your English is good? Through our research into actual CELPIP scoring patterns, we discovered that certain factors act as hard limits on your score—no matter how eloquent your vocabulary or how complex your grammar, these issues will cap your results:
Off-Topic
If your response is completely off-topic or irrelevant, the maximum score is 5
Incomplete Task
If the AI determines you haven't fulfilled all task requirements (e.g., the prompt asked you to mention specific points but you didn't), the maximum score is 7
Too Short
If your response is less than 2/3 of the required time, the maximum score is 6
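These caps are simple to express in code. The sketch below applies the three limits listed above to a raw 0-12 score; the function name and parameters are hypothetical, but the cap values (5, 7, 6) and the 2/3 duration rule come directly from this section.

```python
def apply_score_caps(raw_score: float, *, off_topic: bool,
                     task_incomplete: bool, duration_ratio: float) -> int:
    """Apply the hard caps described above to a raw 0-12 score.

    Illustrative sketch only; not Prepamigo's actual code.
    """
    capped = raw_score
    if off_topic:                  # completely off-topic or irrelevant response
        capped = min(capped, 5)
    if task_incomplete:            # missed required points in the prompt
        capped = min(capped, 7)
    if duration_ratio < 2 / 3:     # spoke less than 2/3 of the required time
        capped = min(capped, 6)
    return round(min(max(capped, 0), 12))

# Example: an eloquent but off-topic response still tops out at 5
print(apply_score_caps(10.4, off_topic=True, task_incomplete=False,
                       duration_ratio=1.0))  # -> 5
```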
More Than Just a Score
We believe a score is just the starting point—detailed feedback is what truly helps you improve:
✏️ Grammar Analysis
- Line-by-line grammar error identification
- Corrected expressions provided
- Explanations of why each error occurred
- Summary of strengths and areas for improvement
📖 Vocabulary Analysis
- Vocabulary level assessment (Basic/Intermediate/Advanced)
- Highlights of effective word choices
- Suggestions for more sophisticated alternatives
- Notes on collocations and idiomatic usage
🎯 Optimization Tips
- Focus areas based on your weakest dimensions
- Actionable improvement steps
- Personalized practice recommendations
✨ AI Model Response
- An improved version that preserves your original intent
- Shows what a high-scoring response looks like
- Learn by comparison for faster progress
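As a rough illustration, this feedback bundle could be modeled as a simple record with one field per category. The field names are hypothetical; only the four categories (grammar, vocabulary, optimization tips, AI model response) come from the lists above.

```python
from dataclasses import dataclass, field

# Hypothetical shape of the feedback bundle described above; field names are
# illustrative assumptions, not Prepamigo's actual data model.
@dataclass
class FeedbackReport:
    grammar_corrections: list[str] = field(default_factory=list)  # line-by-line fixes with explanations
    vocabulary_notes: list[str] = field(default_factory=list)     # level assessment, upgrades, collocations
    optimization_tips: list[str] = field(default_factory=list)    # focus areas and next practice steps
    model_response: str = ""                                       # improved version preserving your intent
```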
Why We Don't Use Lightweight Models
We tested various model configurations extensively. Here's what we found:
Lightweight Models (e.g., GPT-3.5 class)
They respond quickly, but consistently produce inflated scores:
- Often miss subtle grammar errors and awkward phrasing
- Tend to give generic, surface-level feedback
- Scores typically 1-2 levels higher than actual ability
- Students feel confident but are unprepared for the real exam
Text-Only Evaluation (No Audio Analysis)
Even with a powerful LLM, text-only scoring has blind spots:
- Cannot detect pronunciation problems
- Misses fluency issues like hesitation and stuttering
- Ignores speaking pace (too fast or too slow)
- Listenability dimension is essentially guessed
Our Approach: Trillion-Parameter LLM + Audio Analysis
The combination delivers examiner-level accuracy:
- Catches nuanced errors that smaller models overlook
- Audio analysis provides real pronunciation assessment
- Scores align closely with actual CELPIP results
- Students know their true level and can target specific weaknesses
The bottom line: A score that makes you feel good but doesn't reflect reality is worse than useless—it's harmful to your preparation. We'd rather give you an honest 7 that helps you improve than a flattering 10 that sets you up for disappointment on test day.
Summary
Prepamigo's speaking scoring system uses a "Trillion-Parameter LLM + Audio Analysis Model" dual-engine architecture. The LLM evaluates content quality based on official CELPIP standards across four dimensions; the audio model assesses your actual pronunciation and fluency.
Unlike lightweight models that produce inflated scores, our system is designed to give you an honest assessment of your true ability. The scores from both engines are fused into a single result, hard caps are applied for off-topic, incomplete, or too-short responses, and the outcome is a standardized 0-12 score plus comprehensive personalized feedback to help you improve with every practice session.