14

I'm looking for the easiest way to convert all non-numeric data (including blanks) in Python to zeros. Taking the following for example:

someData = [[1.0,4,'7',-50],['8 bananas','text','',12.5644]]

I would like the output to be as follows:

desiredData = [[1.0,4,7,-50],[0,0,0,12.5644]]

So '7' should be 7, but '8 bananas' should be converted to 0.

Peter Mortensen
user1882017
  • For values that are already numeric, do you want the type preserved (for example, an int not converted to float, or vice versa)? It would be easier if you were aiming for a single type rather than several numeric types. – Anand S Kumar Sep 20 '15 at 14:51

9 Answers

13
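import numbers

def mapped(x):
    # Already numeric (int, float, Decimal, ...): keep as-is.
    if isinstance(x, numbers.Number):
        return x
    # Otherwise try int first, then float.
    for tpe in (int, float):
        try:
            return tpe(x)
        except ValueError:
            continue
    return 0

for sub in someData:
    # Slice assignment replaces the contents of each sublist in place.
    sub[:] = map(mapped, sub)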

print(someData)
[[1.0, 4, 7, -50], [0, 0, 0, 12.5644]]

It will work for different numeric types:

In [4]: from decimal import Decimal

In [5]: someData = [[1.0,4,'7',-50 ,"99", Decimal("1.5")],["foobar",'8 bananas','text','',12.5644]]

In [6]: for sub in someData:
   ...:         sub[:] = map(mapped,sub)
   ...:     

In [7]: someData
Out[7]: [[1.0, 4, 7, -50, 99, Decimal('1.5')], [0, 0, 0, 0, 12.5644]]

The isinstance(x, numbers.Number) check catches elements that are already ints, floats, and so on. If the element is not a numeric type, we first try casting it to int and then to float; if neither succeeds, we simply return 0.
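
As a rough sketch of the same int-then-float fallback, the variant below also catches TypeError, which is what int() and float() raise for inputs such as None. The helper name to_number_or_zero and the extra None element in the sample data are assumptions added for illustration; they are not part of the original answer.

```python
import numbers

def to_number_or_zero(x):
    """Return x as a number, or 0 if it cannot be interpreted as one."""
    # Hypothetical helper: already-numeric values pass through unchanged.
    if isinstance(x, numbers.Number):
        return x
    for convert in (int, float):
        try:
            return convert(x)
        except (ValueError, TypeError):
            # TypeError covers inputs like None or a nested list.
            continue
    return 0

some_data = [[1.0, 4, '7', -50], ['8 bananas', 'text', '', 12.5644, None]]
cleaned = [[to_number_or_zero(x) for x in row] for row in some_data]
print(cleaned)
```

Building new lists with a comprehension, instead of assigning into sub[:], leaves the input untouched, which may or may not be what you want.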

Padraic Cunningham
5

Another solution using regular expressions

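import re

def toNumber(e):
    # Leave anything that is not a string (already numeric) untouched.
    if not isinstance(e, str):
        return e
    # Raw strings keep the backslashes in the patterns intact.
    if re.match(r"^-?\d+\.\d+$", e):
        return float(e)
    if re.match(r"^-?\d+$", e):
        return int(e)
    return 0

someData = [[1.0,4,'7',-50],['8 bananas','text','',12.5644]]
# A list comprehension (rather than map) yields real lists on Python 3.
someData = [[toNumber(e) for e in row] for row in someData]
print(someData)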

you get:

[[1.0, 4, 7, -50], [0, 0, 0, 12.5644]]

Note: this does not work for numbers in scientific notation.
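
If scientific notation matters, one possible extension is to allow an optional exponent in the pattern. This is only a sketch: the names NUMBER_RE, INT_RE and to_number_sci are made up for illustration, and the pattern is an assumption about which spellings should count as numbers (it also accepts a leading "+" and forms like ".5").

```python
import re

# Optional sign, digits with an optional fraction (or a bare fraction),
# then an optional exponent such as e-3 or E+10.
NUMBER_RE = re.compile(r"^[+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?$")
INT_RE = re.compile(r"^[+-]?\d+$")

def to_number_sci(e):
    # Non-strings are assumed to be numeric already and are returned as-is.
    if not isinstance(e, str):
        return e
    if INT_RE.match(e):
        return int(e)
    if NUMBER_RE.match(e):
        return float(e)
    return 0

samples = ['7', '-2.5', '1e3', '6.02E+23', '8 bananas', '']
converted = [to_number_sci(s) for s in samples]
print(converted)
```

Checking the integer pattern first keeps '7' as an int, while anything with a fraction or exponent becomes a float.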

Jose Ricardo Bustos M.
1

As an alternative, you can use the decimal module within a nested list comprehension:

>>> from decimal import Decimal
>>> [[Decimal(i) if (isinstance(i,str) and i.isdigit()) or isinstance(i,(int,float)) else 0 for i in j] for j in someData]
[[Decimal('1'), Decimal('4'), Decimal('7'), Decimal('-50')], [0, 0, 0, Decimal('12.56439999999999912461134954')]]

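The advantage of Decimal is that one constructor accepts digit strings, ints, and floats alike, and the result still supports ordinary arithmetic. Note that a float is converted from its binary value, which is why 12.5644 does not come out as Decimal('12.5644'), and that isdigit() rejects strings such as '-7' or '1.5', so those become 0.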

>>> Decimal('7')+3
Decimal('10')
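
To illustrate the Decimal approach without the isdigit() restriction, here is a hedged sketch that lets Decimal do the parsing and falls back to zero when it raises InvalidOperation. The helper name to_decimal_or_zero is hypothetical, and converting through str() is an assumption made so that floats keep their short printed form.

```python
from decimal import Decimal, InvalidOperation

def to_decimal_or_zero(value):
    # Hypothetical helper. Going through str() avoids the long binary
    # expansion that Decimal(12.5644) would produce.
    try:
        result = Decimal(str(value))
    except InvalidOperation:
        return Decimal(0)
    # Decimal also accepts 'NaN' and 'Infinity'; treat those as non-numeric.
    return result if result.is_finite() else Decimal(0)

some_data = [[1.0, 4, '7', -50], ['8 bananas', 'text', '', 12.5644]]
as_decimals = [[to_decimal_or_zero(x) for x in row] for row in some_data]
print(as_decimals)
```

Every element ends up as a Decimal here, including the zeros, which keeps the result to a single numeric type.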
Peter Mortensen
Mazdak
1

Integers, floats, and negative numbers in quotes are fine:

def is_number(s):
    try:
        float(s)
        return True
    except ValueError:
        return False

def is_int(s):
    try:
        int(s)
        return True
    except ValueError:
        return False

someData = [[1.0,4,'7',-50, '12.333', '-90'],['-333.90','8 bananas','text','',12.5644]]

for l in someData:
    for i, el in enumerate(l):
        if isinstance(el, str) and not is_number(el):
            l[i] = 0
        elif isinstance(el, str) and is_int(el):
            l[i] = int(el)
        elif isinstance(el, str) and is_number(el):
            l[i] = float(el)

print(someData)

Output:

[[1.0, 4, 7, -50, 12.333, -90], [-333.9, 0, 0, 0, 12.5644]]
Peter Mortensen
LetzerWille
1

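Considering you need both int and float data types, you could try the following code. It parses each string with ast.literal_eval, which only accepts Python literals and never executes code:

import ast
import numbers

desired_data = []
for sub_list in someData:
    desired_sublist = []
    for element in sub_list:
        if isinstance(element, numbers.Number):
            # Already numeric: keep it (parsing a non-string would fail).
            desired_sublist.append(element)
            continue
        try:
            value = ast.literal_eval(element)
        except (ValueError, SyntaxError):
            value = 0
        # Literals that are not numbers (e.g. a quoted string) also become 0.
        desired_sublist.append(value if isinstance(value, numbers.Number) else 0)
    desired_data.append(desired_sublist)

This might not be the optimal way to do it, but it does the job that you asked for.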

Aswin Murugesh
1
lists = [[1.0,4,'7',-50], ['1', 4.0, 'banana', 3, "12.6432"]]
nlists = []
for lst in lists:
    nlst = []
    for e in lst:
        # Check if number can be a float
        if '.' in str(e):
            try:
                n = float(e)
            except ValueError:
                n = 0
        else:
            try:
                n = int(e)
            except ValueError:
                n = 0

        nlst.append(n)
    nlists.append(nlst)

print(nlists)
Kartik Anand
1

Not surprisingly, Python has a way to check if something is a number:

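import collections.abc
import numbers

def num(x):
    # Values that are already numeric pass through unchanged
    # (int(12.5644) would otherwise truncate to 12).
    if isinstance(x, numbers.Number):
        return x
    try:
        return int(x)
    except ValueError:
        try:
            return float(x)
        except ValueError:
            return 0

def zeronize(data):
    # Recurse into nested sequences, but treat strings as leaf values.
    return [zeronize(x) if isinstance(x, collections.abc.Sequence) and not isinstance(x, str) else num(x) for x in data]

someData = [[1.0,4,'7',-50],['8 bananas','text','',12.5644]]
desiredData = zeronize(someData)

This gives:

desiredData = [[1.0, 4, 7, -50], [0, 0, 0, 12.5644]]

A function is defined in case you have nested lists of arbitrary depth. If using Python 2, use collections.Sequence and basestring in place of collections.abc.Sequence and str.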


Community
Mad Physicist
1

A one-liner:

import re
result = [[0 if not re.match(r"^-?(\d+(\.\d*)?|\.\d+)$", str(s)) else int(str(s)) if re.match(r"^-?\d+$", str(s)) else float(str(s)) for s in xs] for xs in someData]
>>> result
[[1.0, 4, 7, -50], [0, 0, 0, 12.5644]]
Peter Mortensen
Pankaj Singhal
0

I assume the blanks you are referring to are empty strings. If you want to convert all strings, whether or not they contain characters, we can simply check whether an object is a string and, if it is, replace it with the integer 0. Note that this also turns numeric strings such as '7' into 0 and collects everything into a single flat list.

cleaned_data = []
for array in someData:
    for item in array:
        cleaned_data.append(0 if type(item) == str else item)

>>> cleaned_data
[1.0, 4, 0, -50, 0, 0, 0, 12.5644]
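
The loop above appends every item to one flat list. If the nested shape should be kept, the same string check can be written as a nested comprehension. This is only a sketch; the name cleaned_nested is made up for illustration, and it still treats every string, including '7', as non-numeric.

```python
some_data = [[1.0, 4, '7', -50], ['8 bananas', 'text', '', 12.5644]]

# Replace every string with 0 but keep one output row per input row.
cleaned_nested = [[0 if isinstance(item, str) else item for item in row]
                  for row in some_data]
print(cleaned_nested)
```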
Marc Foley