1

I am looking to split the word into its syllables. I am trying to build a speech-to-text system but focused on transcribing medical terms.

Consider a doctor/pharmacist who instead of typing out the medicine dosage would just speak into the microphone and a digital prescription would be generated automatically. I want to avoid ML/DL based approaches since I wanted the system to work in real-time. Therefore I wanted to tackle this problem via a dictionary-based approach. I have scraped the rxlist.com to get all the possible medicine names. Currently, I am using the webspeech API (https://www.google.com/intl/en/chrome/demos/speech.html). This works well but often messes up the medicine names.

  1. Panadol twice a day for three days would become panel twice a day for three days

It works sometimes (super unstable). Also, it is important to consider that panadol is a relatively simple term. Consider Vicodin (changed to why couldn't), Abacavir Sulfate, etc.

Here is the approach I thought could perhaps work.

  1. Maintain a dictionary of all medicines.
  2. Once the detections are there (I append all the detections instead of just using the last output), compare the string distance from each medicine (could be huge, so sorting is important here) and replace the word with minimum error.
  3. If nothing matches (maintain a threshold of error in step 2), check the syllables of prediction and that of medicine name and replace the one with the lowest error.

So I now have the list, I was hoping if I could find a library/dictionary API which could give me the syllables of medicine names. Typing How to pronounce vicodin on Google gets the Learn to Pronounce panel which has: vai·kuh·dn. I would want something similar, now I could scrape it off of Google, but I don't get the results for all the medicine names.

Any help would be appreciated.

Thanks.

1 Answers1

1

You can use a library called pyphen. It's pretty easy to use. To install it run the following command in your terminal:

pip install pyphen

After this, find out the syllables in a string:

import pyphen
a = pyphen.Pyphen(lang='en')
print(a.inserted('vicodin'))

I hope you find this useful

snookso
  • 377
  • 3
  • 16