How Prepamigo Scores Your Speaking

Trillion-Parameter LLM + Audio Analysis: A Dual-Engine Approach to Professional Speaking Assessment

An in-depth look at how we deliver accurate, examiner-level feedback for your CELPIP speaking practice


One of the biggest challenges in CELPIP speaking preparation is the lack of professional, timely feedback. Traditional practice often leaves you guessing about your true performance level. Prepamigo's speaking scoring system was built to solve this problem—we use a trillion-parameter large language model combined with a specialized audio analysis model to deliver examiner-quality scoring and feedback.

Why We Chose This Architecture

Our Philosophy: Accurate Scores, Not Inflated Ones

Many AI scoring tools use lightweight models that generate results in seconds—but often produce inflated scores that don't reflect your true ability. When you practice with unrealistic feedback, you develop a false sense of readiness. We chose a different path: accurate assessment that genuinely helps you identify weaknesses and improve before the real exam.

🧠 Trillion-Parameter LLM

We use a trillion-parameter large language model for content evaluation—analyzing your logic, grammar, vocabulary, and task completion with deep semantic understanding.

Why It Matters:

  • Deep semantic understanding catches subtle errors smaller models miss
  • Nuanced evaluation of complex sentence structures and ideas
  • Realistic scoring aligned with actual CELPIP examiner standards

🎙️ Audio Analysis Model

A dedicated speech evaluation engine analyzes the acoustic properties of your recording—pronunciation accuracy, fluency, rhythm, and speaking pace.

Why It Matters:

  • Text analysis alone cannot assess how you actually sound
  • Detects pronunciation issues that text-only scoring misses entirely
  • Essential for accurate Listenability scoring

💡 Why Both Engines Are Necessary

A single model cannot accurately evaluate both "what you said" and "how you said it." The LLM excels at semantic understanding and language analysis; the audio model specializes in acoustic features. Only by combining both can we provide a truly comprehensive and accurate assessment—the kind of holistic evaluation a real CELPIP examiner would give.

Four Scoring Dimensions

Our scoring dimensions are strictly based on the official CELPIP Score Comparison Chart. This ensures your practice scores accurately reflect how real CELPIP examiners evaluate responses, giving you meaningful feedback you can trust.

Each response is evaluated across four core dimensions, each scored from 0 to 12:

📝 Content & Coherence

Evaluates whether your response addresses the prompt, presents complete information, and develops ideas logically. High scores require clear viewpoints, well-organized structure, and coherent flow.

📚 Vocabulary

Assesses the range, accuracy, and appropriateness of your word choices. Can you use advanced vocabulary and idiomatic expressions? Are your word choices precise and natural?

🎧 Listenability

A comprehensive evaluation of grammar accuracy, pronunciation, intonation, and fluency. This is where our audio analysis model plays a critical role—assessing how clear and natural you sound.

🎙️ Enhanced by Audio Analysis

🎯 Task Fulfillment

Checks whether you've completely addressed the task requirements and used an appropriate tone and style. For example, giving advice to a friend should sound different from making a formal complaint.
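
To make the four-dimension structure concrete, here is a minimal sketch of how a single response's result could be represented and rolled up into one band. The `DimensionScores` class and the equal weighting are illustrative assumptions only; the actual weights used by the scoring engine are not published here.

```python
from dataclasses import dataclass

@dataclass
class DimensionScores:
    """One response scored on the four dimensions, each from 0 to 12."""
    content_coherence: float
    vocabulary: float
    listenability: float
    task_fulfillment: float

    def overall(self) -> float:
        # Equal weighting is an illustrative assumption; the production
        # system applies its own weights before producing the final 0-12 score.
        dims = (self.content_coherence, self.vocabulary,
                self.listenability, self.task_fulfillment)
        return round(sum(dims) / len(dims), 1)

# Example: strong content and vocabulary, weaker listenability.
print(DimensionScores(9, 8, 6, 8).overall())  # 7.8
```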

The Scoring Process

1. Speech-to-Text Transcription

First, we use advanced speech recognition to convert your recording into text, ensuring every word is captured accurately.

2. Parallel Dual-Engine Evaluation

The trillion-parameter LLM analyzes the transcript for content quality, grammar, and vocabulary. The audio analysis model simultaneously processes the original audio for pronunciation, fluency, and pace.

3. Intelligent Score Fusion

Results from both engines are combined in a weighted fusion step: the audio analysis directly influences the "Listenability" dimension score, then all four dimensions are weighted to calculate the final score.

4. Score & Detailed Feedback

You receive a standardized 0-12 score along with comprehensive feedback: grammar corrections, vocabulary suggestions, pronunciation tips, and personalized improvement recommendations.
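
The four steps above can be sketched end to end as a small pipeline. This is a simplified illustration, not the production code: `transcribe`, `llm_evaluate`, and `analyze_audio` are hypothetical placeholders standing in for the real services, it reuses the `DimensionScores` sketch from the previous section, and the 40/60 text-to-audio blend for Listenability is an assumed value rather than the documented weighting.

```python
import asyncio
from dataclasses import dataclass

@dataclass
class AudioMetrics:
    """Acoustic results from the audio engine, each on a 0-12 scale."""
    pronunciation: float
    fluency: float
    pace: float

async def transcribe(audio_path: str) -> str:
    # Step 1 placeholder: the real system calls a speech-to-text service here.
    return "I believe the community centre is the better choice because..."

async def llm_evaluate(transcript: str) -> DimensionScores:
    # Step 2a placeholder: the trillion-parameter LLM scores the transcript.
    return DimensionScores(9, 8, 8, 8)

async def analyze_audio(audio_path: str) -> AudioMetrics:
    # Step 2b placeholder: the acoustic model scores the original recording.
    return AudioMetrics(pronunciation=6, fluency=7, pace=8)

async def score_response(audio_path: str) -> DimensionScores:
    transcript = await transcribe(audio_path)

    # Step 2: run both engines in parallel - the LLM on the transcript,
    # the audio model on the original recording.
    text_scores, audio = await asyncio.gather(
        llm_evaluate(transcript),
        analyze_audio(audio_path),
    )

    # Step 3: fuse the engines. The acoustic result adjusts Listenability;
    # the 40/60 split is an assumed illustration, not the published weighting.
    acoustic = (audio.pronunciation + audio.fluency + audio.pace) / 3
    text_scores.listenability = round(
        0.4 * text_scores.listenability + 0.6 * acoustic, 1)
    return text_scores

scores = asyncio.run(score_response("response.wav"))
print(scores, "overall:", scores.overall())
```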

Why Your Score Might Be Capped

Ever wonder why your score doesn't improve even when you feel your English is good? Through our research into actual CELPIP scoring patterns, we discovered that certain factors act as hard limits on your score—no matter how eloquent your vocabulary or how complex your grammar, these issues will cap your results:

⚠️ Off-Topic

If your response is completely off-topic or irrelevant, the maximum score is 5

⏸️ Incomplete Task

If the AI determines you haven't fulfilled all task requirements (e.g., the prompt asked you to mention specific points but you didn't), the maximum score is 7

⏱️ Too Short

If your response is less than 2/3 of the required time, the maximum score is 6
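
Continuing the same sketch, these hard limits can be applied as simple ceilings after the fused score is computed. The flag names below are hypothetical; they only illustrate how a cap overrides an otherwise high score.

```python
def apply_score_caps(raw_score: float, *, off_topic: bool,
                     incomplete_task: bool, too_short: bool) -> float:
    """Clamp the fused 0-12 score using the hard limits described above."""
    capped = raw_score
    if off_topic:        # response is completely off-topic or irrelevant
        capped = min(capped, 5)
    if incomplete_task:  # required points from the prompt were not covered
        capped = min(capped, 7)
    if too_short:        # spoke for less than 2/3 of the required time
        capped = min(capped, 6)
    return capped

# A fluent, well-worded answer that skipped a required point still tops out at 7.
print(apply_score_caps(9.5, off_topic=False, incomplete_task=True, too_short=False))
```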

More Than Just a Score

We believe a score is just the starting point—detailed feedback is what truly helps you improve:

✏️ Grammar Analysis

  • Line-by-line grammar error identification
  • Corrected expressions provided
  • Explanations of why each error occurred
  • Summary of strengths and areas for improvement

📖 Vocabulary Analysis

  • Vocabulary level assessment (Basic/Intermediate/Advanced)
  • Highlights of effective word choices
  • Suggestions for more sophisticated alternatives
  • Notes on collocations and idiomatic usage

🎯 Optimization Tips

  • Focus areas based on your weakest dimensions
  • Actionable improvement steps
  • Personalized practice recommendations

AI Model Response

  • An improved version that preserves your original intent
  • Shows what a high-scoring response looks like
  • Learn by comparison for faster progress

Why We Don't Use Lightweight Models

We tested various model configurations extensively. Here's what we found:

Lightweight Models (e.g., GPT-3.5 class)

Fast response times, but they consistently produce inflated scores:

  • Often miss subtle grammar errors and awkward phrasing
  • Tend to give generic, surface-level feedback
  • Scores typically 1-2 levels higher than actual ability
  • Students feel confident but are unprepared for the real exam

📝 Text-Only Evaluation (No Audio Analysis)

Even with a powerful LLM, text-only scoring has blind spots:

  • Cannot detect pronunciation problems
  • Misses fluency issues like hesitation and stuttering
  • Ignores speaking pace (too fast or too slow)
  • Listenability dimension is essentially guessed

Our Approach: Trillion-Parameter LLM + Audio Analysis

The combination delivers examiner-level accuracy:

  • Catches nuanced errors that smaller models overlook
  • Audio analysis provides real pronunciation assessment
  • Scores align closely with actual CELPIP results
  • Students know their true level and can target specific weaknesses

The bottom line: A score that makes you feel good but doesn't reflect reality is worse than useless—it's harmful to your preparation. We'd rather give you an honest 7 that helps you improve than a flattering 10 that sets you up for disappointment on test day.

Summary

Prepamigo's speaking scoring system uses a "Trillion-Parameter LLM + Audio Analysis Model" dual-engine architecture. The LLM evaluates content quality based on official CELPIP standards across four dimensions; the audio model assesses your actual pronunciation and fluency.

Unlike lightweight models that produce inflated scores, our system is designed to give you an honest assessment of your true ability. The scores from both engines are scientifically fused, with penalty rules applied for off-topic or incomplete responses, resulting in a standardized 0-12 score plus comprehensive personalized feedback to help you improve with every practice session.

Ready to experience professional-level speaking assessment?

Start Speaking Practice