8

I'm trying to build a collection of English words that are difficult to pronounce.

I was wondering if there is an algorithm or theory of some kind that can be used to show how difficult a word is to pronounce.

Does this appear to you as something that can be computed?

As this seems to be a very subjective thing, let me make it more objective: let's say the hardest words to pronounce for text-to-speech technologies.

Yasser1984
  • 1
    More difficult for whom? – dweiss May 02 '12 at 21:37
  • 2
    Many words are difficult to pronounce for Lisp programmers. – Chris Taylor May 02 '12 at 21:38
  • While this would be a very interesting problem to solve, the fact that English contains a lot of sight words... there would have to be a lot of "hard coded" exceptions – afuzzyllama May 02 '12 at 21:38
  • 1
    Owing to the vast number of words in English, with the vast number of origins, I'd say it's close to impossible to compute this. Consider "rhythm" versus "Worcestershire", or even "Featherstonewaugh". The last one is pronounced "Fanshaw". Then there's "segue", which is pronounced "segway". Easy to pronounce, not obvious from the spelling. –  May 02 '12 at 21:40
  • Bribe your local speech therapist into giving you a list. I'm sure he or she would do better than any algorithm. – Sergey Kalinichenko May 02 '12 at 21:44
  • I made the question a little bit more objective, let's say the hardest words to pronounce by a computer – Yasser1984 May 02 '12 at 21:49
  • That would be entirely dependent on the rules built into the speech algorithm, if you mean hard to pronounce correctly. Or do you mean hard to pronounce in a tongue tied sort of way? – hatchet - done with SOverflow May 02 '12 at 21:51
  • If you find an algorithm to determine difficulty of pronunciation by a computer, that same algorithm will probably be applicable to correcting the computer's pronunciation, so I am not sure such an algorithm would be meaningful. – Brian May 02 '12 at 22:14
  • See https://stackoverflow.com/questions/11874274/pronounceability-algorithm/11878323 – user7660047 Apr 25 '20 at 12:06

4 Answers

3

One approach would be to build a list with two versions of each word: one with the correct spelling, and the other with the word spelled in the simplest possible phonetic spelling. Apply a distance function to the two versions (like Levenshtein distance http://en.wikipedia.org/wiki/Levenshtein_distance). The greater the distance between the two, the harder the word would be to pronounce.
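A minimal sketch of that comparison, assuming you already have a phonetic respelling for each word. The `RESPELLINGS` table and the `spelling_gap` helper are made up for illustration; only the Levenshtein recurrence itself is standard.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting single-character inserts, deletes and substitutions."""
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, char_a in enumerate(a, start=1):
        current = [i]  # turning a[:i] into the empty string takes i deletions
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


# Hypothetical hand-written respellings; a real list would come from a
# pronunciation dictionary.
RESPELLINGS = {"cat": "kat", "segue": "segway", "yacht": "yot"}


def spelling_gap(word: str) -> int:
    """Distance between a word's spelling and its phonetic respelling."""
    return levenshtein(word, RESPELLINGS[word])


# Largest gap first: these would be the "hardest" words under this measure.
ranked = sorted(RESPELLINGS, key=spelling_gap, reverse=True)
```

Raw distance favours long words, so dividing by the word length may be worth trying if you want to compare words of different sizes.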

1

Great problem! Off the top of my head, you could create a system that contains all the symbols of the phonetic alphabet, with a weight on every pair of symbols based on how difficult that combination is to pronounce. (This is highly specific to each person, so you may need several people to test the combinations and take averages, etc.) Then store a list of all the words from an English dictionary on disk, and call a script that cycles through each entry, scrapes Wikipedia for its phonetic spelling and ranks its difficulty. The ranking could take into consideration the length of the word as well as the difficulty of joining consecutive phonetic symbols; then order the list by difficulty.

That's what I would try and do :P
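The scoring step could look roughly like this. It's only a sketch: the weights, the length factor and the two ARPAbet-style transcriptions are invented for illustration, and the scraping part is left out entirely.

```python
# Hypothetical difficulty weights for pairs of consecutive phonemes.
# In practice these would be averaged from several people's ratings.
TRANSITION_WEIGHTS = {
    ("S", "TH"): 3.0,
    ("TH", "S"): 3.0,
    ("K", "S"): 1.5,
}
DEFAULT_WEIGHT = 1.0  # assumed cost of any pair without a rating
LENGTH_WEIGHT = 0.25  # assumed extra cost per phoneme


def difficulty(phonemes):
    """Sum the weights of consecutive phoneme pairs, plus a length penalty."""
    pairs = zip(phonemes, phonemes[1:])
    transition_cost = sum(TRANSITION_WEIGHTS.get(pair, DEFAULT_WEIGHT) for pair in pairs)
    return transition_cost + LENGTH_WEIGHT * len(phonemes)


# Hand-entered transcriptions, standing in for the scraped phonetic spellings.
PHONEMES = {
    "cat": ["K", "AE", "T"],
    "sixths": ["S", "IH", "K", "S", "TH", "S"],
}

# Most difficult first.
ranked = sorted(PHONEMES, key=lambda word: difficulty(PHONEMES[word]), reverse=True)
```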

rflood89
0

To a certain extent...

Speech programs, for example, use a system of phonetics to try to pronounce words.

For example, "grasp" would be split into:

Gr-A-Sp

However, for foreign words (or words that don't follow this pattern), exception lists have to be kept, e.g. for "yacht".
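As a toy illustration of "rules plus an exception list" (not how any particular speech engine works): the cluster list, the exception entry and the function name below are all made up for the example.

```python
# Hypothetical exception list: words whose sound units cannot be derived
# from the spelling rules below.
EXCEPTIONS = {"yacht": ["y", "o", "t"]}

# Assumed two-letter clusters that are treated as a single unit.
CLUSTERS = ("gr", "sp", "sh", "ch", "th")


def to_sound_units(word):
    """Split a word into rough sound units, checking the exceptions first."""
    word = word.lower()
    if word in EXCEPTIONS:
        return EXCEPTIONS[word]
    units = []
    i = 0
    while i < len(word):
        if word[i:i + 2] in CLUSTERS:  # known cluster: take both letters together
            units.append(word[i:i + 2])
            i += 2
        else:  # otherwise take one letter at a time
            units.append(word[i])
            i += 1
    return units
```

With these rules, `to_sound_units("grasp")` gives the Gr-A-Sp split from above, while "yacht" bypasses the rules completely.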

Robbie Dee
0

Suggestion

Fortunately, pronunciation as a process depends on two factors:

  1. the phones making up the word, and the location of vowels and semivowels, i.e.

/a/, /ae/, /e/, /i/, /o/, /u/, /w/, /j/...

  2. the length of the word.

The first relates to the mechanics of sound production: the velum, cheeks and tongue have to be repositioned to produce the sounds of individual phones (nasals, etc.). This makes some words more difficult to pronounce, as a lot of movement may be required. Refer to books on phonetics to find the articulation positions for each phone.

Algorithm

A weighted spanning tree, with each weight being the difficulty of pronouncing two consecutive phones, e.g. /l/ and /r/, or /sh/ and /s/.

Good luck.