
I have a list of lists of strings, and I want to convert the numbers in it into their text equivalents, e.g. 2 to two.

This is what result looks like:

[
    ['nn', 'known', 'tsutsumi', 'father', 'yasujiro', 'sow', 'seed', 'family', 'dominion'],
    ['un', 'secretari', 'gener', 'kofi', 'annan', 'appoint', 'special', 'repres', 'iraq', 'help', 'improv', 'commun', 'iraqi', 'leader'],
    ['year', '2016']
]

Here is my code:

from num2words import num2words

result=[]
with open("./Stemmingg.txt") as filer:
    for line in filer:
        result.append(line.strip().split())

temp=[]

for item in result:
    r=num2words(item)
    temp.append(r)

However, this gives me an error which says:

TypeError: type(['nn', 'known', 'tsutsumi', 'father', 'yasujiro', 'sow', 'seed', 'family', 'dominion']) not in [long, int, float]
  • I don't see any numbers in the `result` you provided... – Will Jan 30 '16 at 04:36
  • And is your input always a list of lists, or can list items be just a simple string as well? – Will Jan 30 '16 at 04:37
  • @Will: There are numbers elsewhere in *result*; I just posted a few rows to show what it looks like. The input is always a list of lists. – minks Jan 30 '16 at 04:40
  • `result` is a list of lists. I don't know how the API works, but my guess is you want `result.extend(line.strip().split())`. – tdelaney Jan 30 '16 at 04:41
  • See the error message: `num2words` accepts only numbers (`long`, `int`, `float`). – furas Jan 30 '16 at 04:41
  • So could I add an if-else that checks whether an item is a number, calls num2words if it is, and skips it otherwise? – minks Jan 30 '16 at 04:42
  • It looks like the API raises an exception, so use a try/except block. – tdelaney Jan 30 '16 at 04:43
  • @CoderQueen ok cool, check my answer. This should work for actual `int`s/`float`s or numeric strings like `"22"`. – Will Jan 30 '16 at 04:56
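tdelaney's suggestion comes down to the difference between `list.append` and `list.extend`. A minimal sketch, using `io.StringIO` with two made-up lines as a stand-in for `Stemmingg.txt`:

```python
import io

# Made-up stand-in for the contents of Stemmingg.txt.
text = "year 2016\ntake 90 day\n"

nested = []
flat = []
for line in io.StringIO(text):
    tokens = line.strip().split()
    nested.append(tokens)  # adds the whole token list as ONE element
    flat.extend(tokens)    # adds each token as its own element

print(nested)  # [['year', '2016'], ['take', '90', 'day']]
print(flat)    # ['year', '2016', 'take', '90', 'day']
```

With `extend`, each item is a single string rather than a list, but the line boundaries are lost, and non-numeric strings still have to be skipped before calling `num2words`.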

3 Answers


You have a list of lists, not a list of strs. This would be a naive approach:

from num2words import num2words
result=[]
with open("/Users/mr/Documents/Stemmingg.txt") as filer:
    for line in filer:
        result.append(line.strip().split())

result = [[
    num2words(subitem) if isinstance(subitem, (int, float, long)) else subitem for subitem in item
] for item in result]

This is a nested list comprehension; see the nested list comprehensions section of the Python tutorial for more information about how those work.
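A nested list comprehension is shorthand for two nested loops that build a new inner list per row. The sketch below uses a hypothetical `convert` function as a stand-in for the conditional `num2words` call, so it runs without the package, and shows that both forms give the same result:

```python
rows = [["year", "2016"], ["take", "90", "day"]]

def convert(token):
    # Hypothetical stand-in for num2words: wraps digit-only tokens in brackets.
    return "<" + token + ">" if token.isdigit() else token

# Nested list comprehension: the outer loop runs over rows, the inner over tokens.
by_comprehension = [[convert(token) for token in row] for row in rows]

# The same thing written as explicit loops.
by_loops = []
for row in rows:
    new_row = []
    for token in row:
        new_row.append(convert(token))
    by_loops.append(new_row)

print(by_comprehension == by_loops)  # True
print(by_comprehension)  # [['year', '<2016>'], ['take', '<90>', 'day']]
```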

Now, this still has a problem! Every item read from the file is a string, so a token like '22' fails the isinstance() check and is never converted. So we need some additional logic, with the help of isdigit():

def digitsToWords(item):
    if isinstance(item, (int, float, long)):
        return num2words(item)

    if isinstance(item, (str, unicode)):
        if item.isdigit():
            return num2words(int(item))

        if item.replace('.', '', 1).isdigit():
            return num2words(float(item))

    return item

result = [[digitsToWords(subitem) for subitem in item] for item in result]
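The `long` and `unicode` names used above exist only in Python 2; in Python 3 they raise `NameError`. Here is a minimal Python 3 sketch of the same helper. It assumes the converter is passed in as a parameter (in real use, `num2words`) so the example runs without the package installed; `digits_to_words`, `to_words` and `stub` are illustrative names:

```python
def digits_to_words(item, to_words):
    """Convert ints, floats and numeric strings with to_words; return anything else unchanged."""
    if isinstance(item, (int, float)):
        return to_words(item)
    if isinstance(item, str):
        # isdecimal() only accepts digits that int()/float() can actually parse.
        if item.isdecimal():
            return to_words(int(item))
        # Allow exactly one decimal point, e.g. "1.08".
        if item.replace(".", "", 1).isdecimal():
            return to_words(float(item))
    return item


def stub(n):
    # Stand-in converter for this example only; pass num2words in real use.
    return "<%s>" % n


rows = [["year", "2016"], ["cost", "1.08"], ["day"]]
print([[digits_to_words(t, stub) for t in row] for row in rows])
# [['year', '<2016>'], ['cost', '<1.08>'], ['day']]
```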

If you don't want to convert floats to words, do this instead:

def digitsToWords(item):
    if isinstance(item, (int, long)):
        return num2words(item)

    if isinstance(item, (str, unicode)) and item.isdigit():
        return num2words(int(item))

    return item

result = [[digitsToWords(subitem) for subitem in item] for item in result]
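One caveat with `isdigit()` on Python 3 strings: it also returns `True` for characters such as the superscript `'²'`, which `int()` cannot parse, so `int(item)` would raise `ValueError`. `str.isdecimal()` is stricter. Neither method accepts a leading minus sign, so negative numbers are left alone either way. A small check:

```python
token = "\u00b2"  # SUPERSCRIPT TWO, i.e. '²'

print(token.isdigit())    # True
print(token.isdecimal())  # False

try:
    int(token)
    int_ok = True
except ValueError:
    int_ok = False
print(int_ok)  # False

tokens = ["2016", "90", "\u00b2", "-5", "1.08"]
kept = [t for t in tokens if t.isdecimal()]
print(kept)  # ['2016', '90']
```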

First, build a flattened result list, i.e. one with no nested lists inside it. Then check whether each item is an integer (int or long) using isdigit(), and convert it with literal_eval before passing it to num2words, since num2words expects a number, not a str.

from num2words import num2words
from ast import literal_eval

result = []
with open("/Users/mr/Documents/Stemmingg.txt", 'r') as filer:
    for line in filer:
        lst = line.strip().split()  # split every line on whitespace
        for item in lst:
            result.append(item.strip())  # build a flattened list one item at a time

temp = []
for item in result:
    if item.isdigit():  # int or long, but not float
        r = num2words(literal_eval(item))  # literal_eval converts the string to a number
        temp.append(r)
    else:
        pass
print(temp)

N.B. If you want to keep all the other words, change

This

else:
    pass

To

else:
    temp.append(item)
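A caveat with literal_eval here: on Python 3, a zero-padded token such as `'007'` passes `isdigit()` but is not a valid Python literal, so `literal_eval` raises `SyntaxError` (on Python 2, a token like `'010'` would instead be read as the octal number 8). `int()` simply parses base-10 digits. A quick Python 3 check:

```python
from ast import literal_eval

token = "007"  # zero-padded digits
print(token.isdigit())  # True
print(int(token))       # 7

try:
    literal_eval(token)
    literal_eval_ok = True
except SyntaxError:
    literal_eval_ok = False
print(literal_eval_ok)  # False on Python 3
```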
  • This isn't giving me any output. It isn't printing *temp* either. It just hangs. – minks Jan 30 '16 at 04:54
  • I still do not receive any output. Here is what the file looks like: *fall demand oil asia help cut averag price gallon regular gasolin unit state* *$1.08 $1.20 last month* *specif address dozen protest* *part tell asian turmoil affect european economi* *take 90 day* They are all different sentences. – minks Jan 30 '16 at 05:03
  • Then you need all the numbers in the file, like `$1.08`, `$1.20` and `90`, to be converted into words? – Learner Jan 30 '16 at 05:06
  • Not the floats. Just the integers. It runs now, but this is what I get when printing the first few items of *temp*: [u'three', u'zero', u'one', u'nine', u'nine', u'zero', u'one', u'zero', u'eight', u'one'] It seems to have kept only the converted text and removed every other word. – minks Jan 30 '16 at 05:08
  • OK, edited and checked; try it now. It now keeps all the other words. – Learner Jan 30 '16 at 05:36
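For the data in minks's comment (integers become words, prices like `$1.08` stay as they are, every other word is kept), a minimal Python 3 sketch that also preserves the line structure. It assumes the converter is passed in as `to_words`; `convert_integers` is an illustrative name, and `io.StringIO` stands in for the real file:

```python
import io

# Lines taken from the comment above; io.StringIO stands in for Stemmingg.txt.
sample = io.StringIO(
    "fall demand oil asia help cut averag price gallon regular gasolin unit state\n"
    "$1.08 $1.20 last month\n"
    "take 90 day\n"
)

def convert_integers(lines, to_words):
    """Return one token list per line, converting only plain integer tokens."""
    rows = []
    for line in lines:
        rows.append([to_words(int(t)) if t.isdecimal() else t for t in line.split()])
    return rows

rows = convert_integers(sample, lambda n: "<%d>" % n)
print(rows[1])  # ['$1.08', '$1.20', 'last', 'month']
print(rows[2])  # ['take', '<90>', 'day']
```

With `from num2words import num2words`, passing `num2words` as the converter would turn `90` into `'ninety'`.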

The reason for that specific error is that your list of results is actually a list of lists.

So saying something like

for item in result:
    r=num2words(item)

item will actually be

['nn', 'known', 'tsutsumi', 'father', 'yasujiro', 'sow', 'seed', 'family', 'dominion']

Your options are either to flatten it into a single-dimensional list or to use a nested for loop, like so (or a nested list comprehension, as answered above):

for arr in result:
    for item in arr: 
        r=num2words(item)
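The flattening option can be sketched with `itertools.chain.from_iterable`, which yields the elements of each inner list in turn. This is a minimal sketch on two sample rows; a comprehension such as `[item for arr in result for item in arr]` does the same job:

```python
from itertools import chain

result = [["nn", "known"], ["year", "2016"]]

# Flatten the list of lists into a single list of strings.
flat = list(chain.from_iterable(result))
print(flat)  # ['nn', 'known', 'year', '2016']

# Every element is now a str, so it can be tested and converted one at a time.
print(all(isinstance(item, str) for item in flat))  # True
```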

However, you still have a problem: num2words must take a number, and none of your items are actually numbers (they're all strings). Since you're parsing from a file, you should try to cast each item to an int and only convert it if that works. So the code would look something like:

from num2words import num2words
result=[]
with open("/Users/mr/Documents/Stemmingg.txt") as filer:
    for line in filer:
        result.append(line.strip().split())

temp=[]
for arr in result:
    for item in arr: 
        try:
            r=num2words(int(item))
            temp.append(r)
        except ValueError:  # int() raises ValueError for non-numeric strings
            pass
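A variant of the loop above that keeps the non-numeric words instead of dropping them, and catches only `ValueError`, which is what `int()` raises for strings such as `'day'` or `'1.08'`. A minimal Python 3 sketch; `try_int`, `convert_rows` and the `to_words` parameter are illustrative names, and in real use `to_words` would be `num2words`:

```python
def try_int(token):
    """Return token as an int if int() accepts it, otherwise None."""
    try:
        return int(token)
    except ValueError:  # e.g. "day", "1.08", "$1.20"
        return None

def convert_rows(rows, to_words):
    # Keep every token; convert only those that int() accepts.
    out = []
    for arr in rows:
        new_arr = []
        for item in arr:
            n = try_int(item)
            new_arr.append(item if n is None else to_words(n))
        out.append(new_arr)
    return out

print(convert_rows([["take", "90", "day"], ["$1.08"]], lambda n: "<%d>" % n))
# [['take', '<90>', 'day'], ['$1.08']]
```

Unlike `isdigit()`, `int()` also accepts a leading sign, so a token like `-5` would be converted as well.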