I'm trying to implement a function that converts numbers from Western numerals to Mandarin, but I am having issues with a regular expression.

This is the logic: Numbers in Mandarin follow 3 simple rules.

There are words for each of the numbers from 0 to 10.

For numbers 11-19, the number is pronounced as "ten digit", so for example, 16 would be pronounced (using Mandarin) as "ten six".

For numbers between 20 and 99, the number is pronounced as "digit ten digit", so for example, 37 would be pronounced (using Mandarin) as "three ten seven". If the last digit is a zero, it is not included.

Everything is OK from 20 to 89, meaning those inputs do not match the pattern [11-19], but for some reason inputs from 90 on do match the pattern [11-19].

Here is the code so you can understand what I'm saying.

import re
def convert_to_mandarin(us_num):
    '''
    us_num, a string representing a US number 0 to 99
    returns the string mandarin representation of us_num
    '''
    ans = None
    numbers0to9 = re.compile('\d')
    numbers11to19 = re.compile('[11-19]')

    trans = {'0':'ling', '1':'yi', '2':'er', '3':'san', '4': 'si',
          '5':'wu', '6':'liu', '7':'qi', '8':'ba', '9':'jiu', '10': 'shi'}
    if len(us_num) == 1:
        if numbers0to9.match(us_num):
            ans = trans[us_num]
            print(us_num)
    else:
        if numbers11to19.match(us_num) and len(us_num) == 2:
            ans = trans['10'] +' ' + trans[us_num[1]]    
            print(us_num)
        elif us_num[1] == '0' and len(us_num) == 2:
            ans = trans[us_num[0]] + ' ' + trans['10']
            print(us_num)
        else:
            ans = trans[us_num[0]] + ' ' + trans['10'] +' ' + trans[us_num[1]]    
            print(us_num)
    return ans
print(convert_to_mandarin(str('3')))
print(convert_to_mandarin(str('15')))
print(convert_to_mandarin(str('71')))
print(convert_to_mandarin(str('81')))
print(convert_to_mandarin(str('91')))

So, why do the inputs from 20 to 89 correctly fail to match the numbers11to19 pattern and enter the intended branch, while inputs from 90 on match that pattern?

Thanks in advance!

    Because square brackets denote a character class: `[11-19]` here means a class containing the characters `1` and `9`. Likewise, `[20-99]` would match any of the digits `0`, `1`, `2`, `3`, `4`, `5`, `6`, `7`, `8` and `9`. – Willem Van Onsem Feb 04 '18 at 16:20

1 Answer

The smallest fix would be to use this regular expression for 11-19 instead:

numbers11to19 = re.compile('1[1-9]')

Regular expressions have no notion of numeric value. They operate on individual characters, so a character class such as [11-19] cannot express a range of multi-digit numbers: it is read as the character 1, the range 1-1, and the character 9, which is why any input starting with 1 or 9 matches. However, this case happens to allow looking at the digits individually: an input string matches if the first digit is 1 and the second digit is between 1 and 9.
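
To make the difference concrete, here is a minimal sketch that compares the two patterns and then uses the corrected one. It is not a drop-in replacement for the code in the question: the names TRANS, OLD_PATTERN and TEENS are mine, and I am assuming that 10 should come out as plain "shi" and that anything outside 0 to 99 should raise an error, neither of which the question states.

```python
import re

# Pinyin words for 0-10, copied from the question.
TRANS = {'0': 'ling', '1': 'yi', '2': 'er', '3': 'san', '4': 'si',
         '5': 'wu', '6': 'liu', '7': 'qi', '8': 'ba', '9': 'jiu', '10': 'shi'}

# Character class from the question: '1', the range '1'-'1', and '9'.
OLD_PATTERN = re.compile(r'[11-19]')
# '1' followed by one digit from 1 to 9, i.e. the strings '11' to '19'.
TEENS = re.compile(r'1[1-9]')


def convert_to_mandarin(us_num):
    """Return the pinyin words for us_num, a string from '0' to '99'."""
    if us_num in TRANS:                      # '0' to '10' have their own word (assumption for '10')
        return TRANS[us_num]
    if TEENS.fullmatch(us_num):              # 11-19: "ten digit"
        return TRANS['10'] + ' ' + TRANS[us_num[1]]
    if re.fullmatch(r'[2-9]\d', us_num):     # 20-99: "digit ten [digit]"
        words = [TRANS[us_num[0]], TRANS['10']]
        if us_num[1] != '0':                 # a trailing zero is not pronounced
            words.append(TRANS[us_num[1]])
        return ' '.join(words)
    raise ValueError('expected a number from 0 to 99, got %r' % us_num)


samples = ('15', '71', '81', '91')
# The old pattern only looks at the first character, so '91' slips through.
print([s for s in samples if OLD_PATTERN.match(s)])   # ['15', '91']
print([s for s in samples if TEENS.fullmatch(s)])     # ['15']
for n in ('3', '15', '71', '91'):
    print(n, convert_to_mandarin(n))
```

I used fullmatch here so the pattern has to cover the whole string; with match, as in the question, the pattern is only anchored at the start, which is fine there because the length is checked separately.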

lehiester