2

Possible Duplicate:
Detecting syllables in a word

For kicks (and to brush up on my Python), I'm trying to create an algorithm that will randomly generate a haiku (a Japanese poem made up of three lines with 5, 7, and 5 syllables, respectively).

The problem I've run into is finding the number of syllables in a word (I'm using the en-US.dic from Ubuntu).

Currently, I have a script that tries to grab the number reported by a web site, but it is slow and isn't producing many hits. WordCalc seems more promising, but I don't know how to use Python to inject a word into its text box.

My question is two-fold:

  • Is there an algorithmic way to determine the number of syllables in a word (and thus, not need to make thousands of web requests)?
  • Can I use Python to inject words into WordCalc?
SomeKittens

2 Answers

3

For the second part: in Chrome, right-click the "Calculate Word Count" button and select "Inspect Element". You'll see that the form POSTs to /index.php with these relevant fields:

name="text"
name="optionSyllableCount"
name="optionWordCount"

(the last two are input checkboxes, which usually need a value to POST).

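import urllib.parse
import urllib.request

url = 'http://www.wordcalc.com/index.php'
# URL-encode the text field, then append the two checkbox names without values.
post_data = urllib.parse.urlencode({'text': 'virginia'})
post_data = '%s&optionSyllableCount&optionWordCount' % post_data

# Supplying a body makes urlopen send a POST; the body must be bytes.
with urllib.request.urlopen(url, post_data.encode('ascii')) as cnxn:
    response = cnxn.read()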

To parse the response you get back:

from bs4 import BeautifulSoup

soup = BeautifulSoup(response, 'html.parser')
# The results table is the sibling that follows the single "Statistics" heading.
h3_matches = [h3 for h3 in soup.find_all('h3') if h3.text == 'Statistics']
if len(h3_matches) != 1:
    raise Exception('Wrong number of <h3>Statistics</h3>')
h3_match = h3_matches[0]
table = h3_match.find_next_sibling('table')

# The cell labelled "Syllable Count" is followed by the cell holding the value.
td_matches = [td for td in table.find_all('td')
              if td.text == 'Syllable Count']
if len(td_matches) != 1:
    raise Exception('Wrong number of <td>Syllable Count</td>')
td_match = td_matches[0]

td_value = td_match.find_next_sibling('td')
syllable_count = int(td_value.text)
bossylobster
  • Great response, and timely too. I accepted the other one because it was simpler (and didn't require the internet). However, I'll probably end up implementing this one too, so I can learn how to do it next time. – SomeKittens May 02 '12 at 14:51
3

Download the Moby Hyphenated Word List. It has most English words and names hyphenated by syllable. The number of syllables would be the number of hyphen markers + number of spaces + number of actual hyphens + 1.
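
A minimal sketch of that counting rule in Python. The function names are made up for illustration, and the default marker is an assumption: my understanding is that the list marks syllable breaks with byte 0xA5 (which reads as "¥" when the file is decoded as Latin-1), but check your copy and pass a different marker if it differs.

```python
def count_syllables(entry, marker="\u00a5"):
    """Count syllables in one hyphenated entry.

    Applies the rule above: hyphen markers + spaces + real hyphens + 1.
    The default marker (U+00A5) is an assumption about the file encoding.
    """
    return entry.count(marker) + entry.count(" ") + entry.count("-") + 1


def load_syllable_counts(lines, marker="\u00a5"):
    """Build a {plain word: syllable count} dict from hyphenated entries."""
    counts = {}
    for line in lines:
        entry = line.strip()
        if not entry:
            continue  # skip blank lines
        # Drop the markers to recover the word as it is normally spelled.
        word = entry.replace(marker, "").lower()
        counts[word] = count_syllables(entry, marker)
    return counts


# Hypothetical usage; the file name and encoding are assumptions:
# with open("mhyph.txt", encoding="latin-1") as f:
#     counts = load_syllable_counts(f)
# counts.get("syllable")
```

Building the dict once up front means each lookup while generating a haiku is a plain dictionary access, with no network involved.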

Steven Rumbalski