
I need to write Python code that lets the user input a sentence and outputs the transcription of that sentence. I know that I first need to create a dictionary of all the Arabic phonemes, but I don't know the steps after that. The first module is meant to normalise the Arabic input, that is, remove any unneeded components from the text the user enters and convert numbers to words, so that 1990 becomes "nineteen ninety". Thank you in advance.
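A minimal sketch of the normalisation step described above, using only the standard library. What counts as an "unneeded component" is an assumption here (tatweel, punctuation, and non-Arabic characters are dropped; diacritics are kept). The `normalise` function and its `spell_number` hook are hypothetical names, and the hook is left for the caller to supply because spelling out numbers needs a separate number-to-words routine.

```python
import re

TATWEEL = "\u0640"  # elongation character, carries no sound
# Arabic-Indic digits (U+0660..U+0669) mapped to ASCII 0-9
DIGITS = {0x0660 + i: str(i) for i in range(10)}
# Anything that is not an Arabic letter/diacritic (U+0621..U+0652),
# an ASCII digit, or whitespace is treated as unwanted (assumption).
UNWANTED = re.compile(r"[^\u0621-\u06520-9\s]")

def normalise(text, spell_number=None):
    """Clean the input; optionally replace digit runs via spell_number(int) -> str."""
    text = text.translate(DIGITS).replace(TATWEEL, "")
    text = UNWANTED.sub(" ", text)
    if spell_number is not None:
        # Hypothetical hook: the caller provides the number-to-words function.
        text = re.sub(r"\d+", lambda m: spell_number(int(m.group())), text)
    return " ".join(text.split())  # collapse repeated whitespace
```

Passing a number-to-words function as `spell_number` keeps the cleaning logic independent of how numbers are verbalised, so that part can be swapped or tested separately.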

I have already put the transcriptions in a dictionary, {"ء":"ʔ","ب":"b","ت":"t","ث":"θ","ج":"g","ح":"ħ","خ":"x","د":"d","ذ":"ð","ر":"r","ز":"z","س":"s","ش":"ʃ","ص":"sˤ","ض":"dˤ","ط":"tˤ","ظ":"ðˤ","ع":"ʕ","غ":"ɣ","ف":"f","ق":"q","ك":"k","ل":"l","م":"m","ن":"n","ه":"h"}. However, some letters in Arabic have more than one phoneme, so I also need a way to handle that.
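One possible way to handle letters with more than one phoneme is to keep them out of the plain dictionary and pick a reading from context. The sketch below is only a rough heuristic, not a complete rule set: it assumes و and ي are consonants at the start of a word or when they carry their own short-vowel diacritic, and long vowels otherwise. `BASE` is a subset of the dictionary above; `AMBIGUOUS`, `SHORT_VOWELS`, and `transcribe_word` are hypothetical names.

```python
# Subset of the question's letter-to-IPA dictionary.
BASE = {"ب": "b", "ت": "t", "د": "d", "ر": "r", "س": "s",
        "ك": "k", "ل": "l", "م": "m", "ن": "n", "ه": "h"}

# Letters with two readings: (consonant, long vowel). Assumed values.
AMBIGUOUS = {"و": ("w", "uː"), "ي": ("j", "iː")}

# Short-vowel diacritics: fatha, damma, kasra.
SHORT_VOWELS = {"\u064E": "a", "\u064F": "u", "\u0650": "i"}

def transcribe_word(word):
    out = []
    for i, ch in enumerate(word):
        if ch in AMBIGUOUS:
            consonant, long_vowel = AMBIGUOUS[ch]
            nxt = word[i + 1] if i + 1 < len(word) else ""
            # Heuristic: word-initial, or followed by its own short vowel.
            is_consonant = i == 0 or nxt in SHORT_VOWELS
            out.append(consonant if is_consonant else long_vowel)
        elif ch in SHORT_VOWELS:
            out.append(SHORT_VOWELS[ch])
        else:
            # Unknown characters are passed through unchanged.
            out.append(BASE.get(ch, ch))
    return "".join(out)
```

On unvowelled text this heuristic will often guess wrong, since the information needed to choose the reading is simply not written.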

Mekky01

1 Answer


You can't really do a true phonemic transcription programmatically: the Arabic alphabet leaves out too much information (short vowels, for example) that a human reader has to supply to get at the pronunciation.

But you can do a 1:1 transliteration. There's a package called CAMeL Tools that will convert Arabic text into Buckwalter transliteration (which used to be used a lot in Arabic NLP, and may still be). You would then just need to replace the Buckwalter symbols that differ from IPA with their IPA equivalents.
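The replacement step could look roughly like the sketch below. It takes a string that is already in Buckwalter transliteration (however it was produced) and swaps the symbols that differ from IPA. The function name `bw_to_ipa` is hypothetical, and the mapping covers only consonants: reading `j` as "dʒ" is an assumption (the question's dictionary uses "g" for ج), and Buckwalter vowel and hamza-carrier symbols are left untouched.

```python
# Buckwalter consonant symbols that differ from their IPA equivalents.
# Symbols that already match IPA (b, t, d, r, z, s, f, q, k, l, m, n, h, w, x)
# are not listed and pass through unchanged.
BW_TO_IPA = str.maketrans({
    "'": "ʔ",   # hamza
    "v": "θ",   # tha
    "j": "dʒ",  # jim (assumed reading; regional pronunciations differ)
    "H": "ħ",   # ha
    "*": "ð",   # dhal
    "$": "ʃ",   # shin
    "S": "sˤ",  # sad
    "D": "dˤ",  # dad
    "T": "tˤ",  # ta (emphatic)
    "Z": "ðˤ",  # za (emphatic)
    "E": "ʕ",   # ayn
    "g": "ɣ",   # ghayn
    "y": "j",   # ya as a consonant
})

def bw_to_ipa(buckwalter_text):
    """Replace Buckwalter consonant symbols with IPA, one character at a time."""
    return buckwalter_text.translate(BW_TO_IPA)
```

Because `str.translate` works character by character, a replacement such as "dʒ" is never re-scanned, so the later `j` to "j" or `g` to "ɣ" rules cannot corrupt earlier output.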

Check out the code in this answer: Fast transliteration for Arabic Text with Python

larapsodia