Using Python, I'm trying to write a simple function that converts Arabic number words to integers. The code I used can be found here, and I'm trying to adapt it from English to Arabic. For some reason, it doesn't work very well:

    def text2int(textnum, numwords={}):
        if not numwords:
            units = [
                "", "واحد", "اثنان", "ثلاثة", "أربعة", "خمسة", "ستة", "سبعة", "ثمانية",
                "تسعة",
                "عشرة", "أحد عشر", "اثنا عشر", "ثلاثة عشر", "أربعة عشر", "خمسة عشر",
                "ستة عشر", "سبعة عشر", "ثمانية عشر",
                "تسعة عشر"
            ]

            tens = [
                "عشرون", "ثلاثون", "أربعون", "خمسون", "ستون", "سبعون", "ثمانون",
                "تسعون"
            ]

            scales = ["مية", "الف", "مليون", "مليار", "ترليون"]

            numwords["و"] = (1, 0)
            for idx, word in enumerate(units):
                numwords[word] = (1, idx)
            for idx, word in enumerate(tens):
                numwords[word] = (1, idx * 10)
            for idx, word in enumerate(scales):
                numwords[word] = (10 ** (idx * 3 or 2), 0)

        current = result = 0
        for word in textnum.split():
            if word not in numwords:
                raise Exception("Illegal word: " + word)

            scale, increment = numwords[word]
            current = current * scale + increment
            if scale > 100:
                result += current
                current = 0

        return result + current

    print(text2int("خمسة و عشرون"))

The output I get is 5, which is wrong; it should be 25. Is there a way to fix this? Also, the scales are not working at all.

Hamad
  • It would probably be easier to solve your problem if you replaced the Arabic words with English ones (for testing) and provided some sample inputs and outputs – Hyyudu Feb 12 '20 at 07:48
  • I don't know Arabic, but you can replace your second `for` line with this one and get 25: `for idx, word in enumerate(tens): numwords[word] = (1, idx+2 * 10)` – Yanirmr Feb 12 '20 at 08:04
  • @Yanirmr Thanks! It worked for 25 only, but when I write other numbers as text, such as 63, I get 27 – Hamad Feb 12 '20 at 08:11
  • @Hyyudu I've tried doing what you suggested, and I still haven't solved it. – Hamad Feb 12 '20 at 08:13
  • So please give us a basic translation. Working through Google Translate is too much overhead for me. – Yanirmr Feb 12 '20 at 08:13
  • I'll explain the lists. The first, units, holds the numbers from 1 to 19 written as Arabic text. The tens list holds the tens written in Arabic, like ten, twenty, thirty, etc. The final list, scales, holds the number scales written in Arabic, like hundred, thousand, million, etc. So in the print on the last line I simply wrote 25 as Arabic text @Yanirmr – Hamad Feb 12 '20 at 08:17
  • @Hamad: Sorry, but that doesn't help yet. Can you replace each Arabic word in your code above with its English equivalent? Thanks. – Yanirmr Feb 12 '20 at 08:32
  • @Yanirmr Apologies for that. You can check this solution; I used its code. https://stackoverflow.com/a/493788/12315034 – Hamad Feb 12 '20 at 08:47
  • @Hamad - Thanks for the ref. I think you should edit your question. It isn't clear from the question that you are reusing code from another discussion, and knowing that is very helpful for understanding your specific problem. – Yanirmr Feb 12 '20 at 08:57
  • @Yanirmr That's true! thanks – Hamad Feb 12 '20 at 09:00
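
The 63 versus 27 result reported in the comments above comes down to operator precedence: in `idx+2 * 10` the multiplication is evaluated first, so each tens word receives `idx + 20` rather than `(idx + 2) * 10`. A minimal sketch of the difference, using only the `tens` list from the question (the two dictionary names are made up here for illustration):

```python
tens = ["عشرون", "ثلاثون", "أربعون", "خمسون", "ستون", "سبعون", "ثمانون", "تسعون"]

# Without parentheses, 2 * 10 is evaluated first, so this is idx + 20.
unparenthesized = {word: idx + 2 * 10 for idx, word in enumerate(tens)}

# With parentheses, the index is shifted first and then scaled by ten.
parenthesized = {word: (idx + 2) * 10 for idx, word in enumerate(tens)}

print(unparenthesized["عشرون"], parenthesized["عشرون"])  # 20 20
print(unparenthesized["ستون"], parenthesized["ستون"])    # 24 60
```

Twenty comes out the same either way, which is why 25 appeared to work; sixty becomes 24, and 24 + 3 gives the 27 mentioned above.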

2 Answers

Try changing your `tens` variable as follows:

    tens = ["", "",
            "عشرون", "ثلاثون", "أربعون", "خمسون", "ستون", "سبعون", "ثمانون",
            "تسعون"]

That is, add two empty strings at the start so that each word's list index matches its tens digit. Alternatively, you could change this line as follows:

    for idx, word in enumerate(tens):
        numwords[word] = (1, (idx + 2) * 10)

This is what someone suggested in the comments, except with parentheses added around `idx + 2`.
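
For reference, here is a rough sketch of the question's function with that one-line change applied. It keeps the original word lists and algorithm; the module-level names (`UNITS`, `TENS`, `SCALES`, `NUMWORDS`, `build_numwords`) and the switch from `Exception` to `ValueError` are choices made for this sketch, not part of the question's code.

```python
UNITS = [
    "", "واحد", "اثنان", "ثلاثة", "أربعة", "خمسة", "ستة", "سبعة", "ثمانية",
    "تسعة", "عشرة", "أحد عشر", "اثنا عشر", "ثلاثة عشر", "أربعة عشر",
    "خمسة عشر", "ستة عشر", "سبعة عشر", "ثمانية عشر", "تسعة عشر",
]
TENS = ["عشرون", "ثلاثون", "أربعون", "خمسون", "ستون", "سبعون", "ثمانون", "تسعون"]
SCALES = ["مية", "الف", "مليون", "مليار", "ترليون"]


def build_numwords():
    """Map each number word to a (scale, increment) pair."""
    numwords = {"و": (1, 0)}  # the connector "and" adds nothing
    for idx, word in enumerate(UNITS):
        numwords[word] = (1, idx)
    for idx, word in enumerate(TENS):
        # TENS starts at twenty, so shift the index by 2 before scaling.
        numwords[word] = (1, (idx + 2) * 10)
    for idx, word in enumerate(SCALES):
        # 10**2 for the first scale, then 10**3, 10**6, 10**9, 10**12.
        numwords[word] = (10 ** (idx * 3 or 2), 0)
    return numwords


NUMWORDS = build_numwords()


def text2int(textnum):
    current = result = 0
    for word in textnum.split():
        if word not in NUMWORDS:
            raise ValueError("Illegal word: " + word)
        scale, increment = NUMWORDS[word]
        current = current * scale + increment
        if scale > 100:
            result += current
            current = 0
    return result + current


print(text2int("خمسة و عشرون"))  # 25
print(text2int("ثلاثة و ستون"))  # 63
```

Two limitations carry over from the original algorithm. The two-word entries in `UNITS` can never match, because the input is split on whitespace. A scale word with nothing before it, such as "الف" on its own, evaluates to 0, because the running value (still 0) is multiplied by the scale; that may be part of what "the scales are not working" refers to.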

Rotem Tal
  • Thanks for your reply. I've tried doing what you suggested, but it's still not working as expected. Also, the tens are working more or less fine, but the scales are not working at all – Hamad Feb 12 '20 at 10:10
  • Using my fix, the command `print(text2int("خمسة و عشرون مليون اثنان"))` resulted in `25000002`, which is correct. Notice that you have units such as `"ثلاثة عشر"` that contain a space character. Since you split the input on spaces, you would only read the first half and then throw an exception. This requires some patch to your algorithm, but other than that this fix should work fine – Rotem Tal Feb 12 '20 at 11:39
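
One possible patch for the two-word units mentioned in the last comment is to rejoin adjacent tokens whenever the pair is itself a known entry. The sketch below is only an illustration of that idea, not code from the thread: the `tokenize` helper and the module-level names are hypothetical, and the lookup table is rebuilt here with the `(idx + 2) * 10` fix so the block runs on its own.

```python
UNITS = [
    "", "واحد", "اثنان", "ثلاثة", "أربعة", "خمسة", "ستة", "سبعة", "ثمانية",
    "تسعة", "عشرة", "أحد عشر", "اثنا عشر", "ثلاثة عشر", "أربعة عشر",
    "خمسة عشر", "ستة عشر", "سبعة عشر", "ثمانية عشر", "تسعة عشر",
]
TENS = ["عشرون", "ثلاثون", "أربعون", "خمسون", "ستون", "سبعون", "ثمانون", "تسعون"]
SCALES = ["مية", "الف", "مليون", "مليار", "ترليون"]

NUMWORDS = {"و": (1, 0)}
NUMWORDS.update({w: (1, i) for i, w in enumerate(UNITS) if w})
NUMWORDS.update({w: (1, (i + 2) * 10) for i, w in enumerate(TENS)})
NUMWORDS.update({w: (10 ** (i * 3 or 2), 0) for i, w in enumerate(SCALES)})


def tokenize(textnum):
    """Split on whitespace, then greedily rejoin known two-word entries."""
    words = textnum.split()
    tokens = []
    i = 0
    while i < len(words):
        pair = " ".join(words[i:i + 2])
        if i + 1 < len(words) and pair in NUMWORDS:
            tokens.append(pair)  # e.g. "ثلاثة" + "عشر" becomes one token
            i += 2
        else:
            tokens.append(words[i])
            i += 1
    return tokens


def text2int(textnum):
    current = result = 0
    for word in tokenize(textnum):
        if word not in NUMWORDS:
            raise ValueError("Illegal word: " + word)
        scale, increment = NUMWORDS[word]
        current = current * scale + increment
        if scale > 100:
            result += current
            current = 0
    return result + current


print(text2int("ثلاثة عشر"))     # 13
print(text2int("خمسة عشر الف"))  # 15000
```

The pair is tried before the single word, so "ثلاثة عشر" is read as 13 rather than as 3 followed by an unknown word. Spelling variants that are not in the lists still raise an error.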

Just make the following change in your code:

    for idx, word in enumerate(tens):
        numwords[word] = (1, (idx + 2) * 10)

Vaibhav Jadhav