11

I am looking for an existing database of English words in which each word is split into syllables. I want to use it to edit each word in a selected article according to its syllable breaks.

Does anyone know of an existing product or method that can help me do this?

Thanks!

Costique

3 Answers

9

This site has a file of 44K words with syllables, not just hyphenation.

http://www.delphiforfun.org/programs/Syllables.htm

strattonn
6

I'm not sure if this is what you're looking for, but CMU has a pronunciation dictionary that clearly shows each syllable:

http://www.speech.cs.cmu.edu/cgi-bin/cmudict

Ross
  • +1. Technically, they're not syllables, they're phonemes. But it's possible to calculate syllables from the phonemes along with lexical stress. – Dave Ray Jan 20 '09 at 01:58
  • "calculate syllables from the phonemes along with lexical stress" How would one go about doing that? – Jacob Singh May 22 '17 at 13:06
  • Looks like the syllables are labeled 0 1 2, with the primary emphasis being 1. I think you could match up e.g. 0 1 0 with the 3-syllables from a dictionary that breaks up the syllable letters to make e.g. "baNAna". Does that sound about right? – sea-rob Mar 26 '21 at 07:47
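
To make the idea in these comments concrete, here is a minimal sketch, assuming entries in the CMU dictionary's plain-text layout (a word followed by ARPAbet phonemes, where each vowel phoneme ends in a stress digit: 0 for unstressed, 1 for primary, 2 for secondary). Counting the digit-bearing phonemes gives the syllable count. The function names and the hand-typed sample entry are hypothetical, and the letter-level split `["ba", "na", "na"]` has to come from some other source, as sea-rob suggests. Comment lines and variant markers such as `WORD(1)` are not handled.

```python
def parse_entry(line):
    """Split a CMUdict-style line into (word, list of phonemes)."""
    word, *phonemes = line.split()
    return word, phonemes


def stress_pattern(phonemes):
    """Return one stress digit per vowel phoneme, i.e. one per syllable."""
    return [int(p[-1]) for p in phonemes if p[-1].isdigit()]


def syllable_count(phonemes):
    """Number of syllables = number of vowel phonemes."""
    return len(stress_pattern(phonemes))


def mark_stress(letter_syllables, phonemes):
    """Uppercase the letter chunk that carries primary stress (digit 1).

    letter_syllables is assumed to come from a separate syllabified
    word list; this function only lines it up with the stress digits.
    """
    pattern = stress_pattern(phonemes)
    if len(pattern) != len(letter_syllables):
        raise ValueError("syllable counts do not match")
    return "".join(
        chunk.upper() if digit == 1 else chunk.lower()
        for chunk, digit in zip(letter_syllables, pattern)
    )


# Sample entry typed by hand in the dictionary's format (an assumption).
word, phonemes = parse_entry("BANANA  B AH0 N AE1 N AH0")
print(word, syllable_count(phonemes), stress_pattern(phonemes))
print(mark_stress(["ba", "na", "na"], phonemes))
```

This only counts syllables and locates the stress; deciding which consonants belong to which syllable would need extra rules that the dictionary itself does not provide.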
2

Perhaps a standard word list, plus a good hyphenation algorithm would do the trick?

Ned Batchelder
  • Hyphenation has nothing to do with the number of syllables - it's simply an arbitrary decision about where to split a word in the middle. In the example you've given, the output of supercali... etc. is ['su', 'per', 'cal', 'ifrag', 'ilis', 'tic', 'ex', 'pi', 'ali', 'do', 'cious'], which contains three chunks with more than one syllable, and "project" is hyphenated simply as "project". – Lou Sep 25 '20 at 08:43
  • It's misleading to say it "has nothing to do with .. syllables". Hyphenation starts as separation between syllables, but has a few non-obvious tweaks. Some hyphenation algorithms may miss syllable breaks, and sometimes the written form doesn't map well to the spoken form. But basically, it's about syllables. And it's especially true that it might be close enough for the original purpose asked about. – Ned Batchelder Sep 25 '20 at 13:58
  • Hyphens *can* be used to demarcate syllables (as can any ASCII character) but they're usually not. Dictionaries tend to use the interpunct for this purpose, as the hyphen would cause confusion in already hyphenated words. In hyphenation algorithms such as you've shown, the purpose is to mark points in the word where it would be acceptable to split the word onto a new line when it won't fit on a computer screen. And while you're right that hyphenation does usually occur on the syllable boundary, it doesn't mark *every* syllable boundary, which makes it unfit for counting syllables. – Lou Sep 25 '20 at 14:07
  • E.g. it's not particularly instructive to say that "project" has one syllable ... give or take one or two due to variation in the hyphenation algorithm. – Lou Sep 25 '20 at 14:08
  • @Lou, are you saying that usually hyphenation is not at syllable boundaries? Perhaps instead of "has nothing to do with syllables" you meant, "need not be related to syllables"? – Ned Batchelder Sep 26 '20 at 18:49
  • Sure, I'll rephrase: hyphenation is *related* to syllables, in the sense that words are usually hyphenated at syllable boundaries. It is not the same thing as syllabification, or the process of splitting words into syllables, as it is not used for this purpose. There is crossover between these two concepts - my point is that hyphenation doesn't effectively solve OP's problem. – Lou Sep 29 '20 at 08:51
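
The gap discussed in this thread can be checked roughly in code. The sketch below takes the hyphenation chunks quoted in Lou's first comment and flags every chunk that contains more than one run of vowel letters. Counting vowel runs is a crude stand-in for syllables that I am assuming here for illustration (it ignores silent letters and diphthongs); it is not how any particular hyphenation or syllabification tool works, and the function name is hypothetical.

```python
import re

# One or more consecutive vowel letters; treating "y" as a vowel is an assumption.
VOWEL_RUN = re.compile(r"[aeiouy]+")


def vowel_runs(text):
    """Crude syllable estimate: count runs of consecutive vowel letters."""
    return len(VOWEL_RUN.findall(text.lower()))


# Hyphenation output quoted in the comment above.
chunks = ['su', 'per', 'cal', 'ifrag', 'ilis', 'tic',
          'ex', 'pi', 'ali', 'do', 'cious']

# Chunks that appear to hold more than one syllable.
multi = [c for c in chunks if vowel_runs(c) > 1]

print(len(chunks), "hyphenation chunks")
print("chunks with more than one vowel run:", multi)
print("'project':", vowel_runs("project"), "vowel runs, but a single hyphenation chunk")
```

On this input the heuristic flags the same three chunks Lou points to, which suggests hyphenation output can serve as a starting point but will undercount unless the long chunks are split further.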