EnglishCentral’s Intellispeech℠ Assessment System


EnglishCentral’s Intellispeech℠ system assesses learners’ speaking ability as a combination of 3 elements:

  • Pronunciation Score
  • Fluency Score
  • Completion

Pronunciation Score previously measured learners’ speech across 40 phonemes. In its latest version, Intellispeech℠ measures performance across all 64,000 possible triphones (a phoneme taken together with its left and right neighbors, giving 40 × 40 × 40 combinations). This change dramatically improves the accuracy of the system.
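The 64,000 figure follows directly from the 40-phoneme inventory. Here is a minimal sketch of that arithmetic; the phoneme labels are placeholders, since the actual inventory is not listed in this post.

```python
from itertools import product

# Assumption: 40 placeholder labels stand in for the real phoneme inventory.
phonemes = [f"P{i:02d}" for i in range(40)]

# A triphone is modeled here as an ordered (left, center, right) triple.
triphones = list(product(phonemes, repeat=3))

print(len(triphones))  # 40 ** 3 = 64000
```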

Fluency Score is still based on duration and pause rate.  

Completion is still based on whether the learner speaks every word in the line or drops some.
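To illustrate the idea of Completion, here is a rough sketch that aligns the recognized words with the expected line and reports the share of expected words that were spoken. The function name and the use of Python's difflib for the alignment are assumptions made for illustration; this is not a description of how Intellispeech℠ computes the value.

```python
from difflib import SequenceMatcher

def completion(expected: str, recognized: str) -> float:
    """Fraction of expected words found, in order, in the recognized text."""
    exp = expected.lower().split()
    rec = recognized.lower().split()
    if not exp:
        return 1.0
    # Align the two word sequences and count the words that match in order.
    matcher = SequenceMatcher(None, exp, rec, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(exp)

# The learner dropped the word "a": 6 of 7 expected words were spoken.
ratio = completion("I would like a cup of coffee", "I would like cup of coffee")
```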


Real Time Feedback in the Player


As the learner speaks, Intellispeech℠ provides feedback according to the following types of errors:



Line Points

Learners gain points for each line spoken. Only the final version of each line spoken counts towards the video.  So, if a line is repeated several times, only the last version counts towards the video line score.

On each line, we show the number of points earned out of the maximum number of points possible for that line.
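The "last version counts" rule can be sketched as a small lookup keyed by line, where a repeated attempt simply overwrites the earlier one. The class and method names are hypothetical, and summing the per-line points into a video total is an assumption for the sake of the example.

```python
class VideoLineScores:
    """Keeps only the most recent attempt for each line of a video."""

    def __init__(self):
        # line number -> (points earned, maximum points) for the latest attempt
        self._latest = {}

    def record_attempt(self, line, earned, maximum):
        # A repeat of the same line replaces the earlier attempt,
        # even if the earlier attempt scored higher.
        self._latest[line] = (earned, maximum)

    def totals(self):
        earned = sum(e for e, _ in self._latest.values())
        maximum = sum(m for _, m in self._latest.values())
        return earned, maximum

scores = VideoLineScores()
scores.record_attempt(1, 60, 100)
scores.record_attempt(1, 85, 100)  # second try at line 1 replaces the first
scores.record_attempt(2, 40, 50)
total = scores.totals()
```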


Video Grades

The video grade is the cumulative Intellispeech℠ measure across all lines spoken in the dialog.

Intellispeech℠ computes a percentile relative to other learners who share the same native language, and maps that percentile to a grade. For instance, if we determine that the learner’s speech for the video was better than that of 75% of the other Japanese users who spoke the video, the learner would get a “B+” as a video grade.
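As a sketch of the percentile step only, the function below computes the share of peers who scored lower than the learner. The function name and the sample scores are made up for illustration, and the mapping from percentile to letter grade is left out because it depends on the grade table for each native language.

```python
from bisect import bisect_left

def percentile_rank(score, peer_scores):
    """Percentage of peer scores that are strictly below the given score."""
    ordered = sorted(peer_scores)
    if not ordered:
        return 0.0
    below = bisect_left(ordered, score)  # count of peers scoring lower
    return 100.0 * below / len(ordered)

# Hypothetical peer scores for one video within one native-language group.
rank = percentile_rank(80, [50, 60, 70, 90])
```

In this made-up example the learner scored higher than three of four peers, so the rank is 75.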


Percentiles are mapped to grades for each native language of the learner. Here is the grade table used for Japanese learners:


Phoneme Tiles

IntelliSpeech℠ is always listening to the learner’s speech and continually assesses the learner’s performance across 40 phonemes. We have a patent pending (Patent App# 13/338,383) on the system’s ability to determine the learner’s strongest and weakest pronunciations among the 40 phonemes, thereby guiding the learner on which to practice. The Phoneme Tiles have 4 states, which correspond to how closely the learner’s pronunciation resembles native speech: green is the closest to native and red is the furthest. Grey means the system does not yet have enough data to make a determination.

These phonemes are tracked in each learner’s customized Pronunciation Center.


Because measurements of phoneme performance can be noisy for individual utterances, a learner must generally speak 10 lines before IntelliSpeech℠ is confident in setting the color of a Phoneme Tile.
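The idea of waiting for enough data before coloring a tile can be sketched as follows. The function name, the per-line score values, and the use of a simple average are assumptions; the cut-offs that turn an average into a tile color are not given in this post and are deliberately left out.

```python
from statistics import mean

MIN_LINES = 10  # from the text: about 10 spoken lines are needed

def tile_estimate(line_scores):
    """Return None (a grey tile) until enough lines exist, else the average score."""
    if len(line_scores) < MIN_LINES:
        return None  # not enough data yet, so the tile stays grey
    # Averaging over many lines smooths out noise in single utterances.
    return mean(line_scores)

too_few = tile_estimate([0.5] * 9)
enough = tile_estimate([0.5] * 10)
```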


Our Speech Database


Our research is supported by what we believe is the largest corpus of transcribed non-native speech in the world. Our speech assessment system is now based on over 250M utterances collected from learners in over 100 countries, covering many different L1 languages.


Our reference models are also trained on large amounts of data collected from native speakers speaking the authentic speech from our videos (as opposed to many other corpora, which may contain “read speech”).


Common Pronunciation Challenges

IntelliSpeech℠ has analyzed Japanese users’ English pronunciation based on over 100 million recorded utterances in the EnglishCentral database and identified the five sounds that most commonly cause problems for Japanese speakers. They are:


Based on this analysis and speech data, we have designed a pronunciation course, Top 10 Challenges for Japanese Speakers, where learners can use authentic videos to focus on the pronunciations that are most challenging for Japanese speakers.

