
I have a column whose values are numpy.ndarray objects of strings; it looks like this:

         col
    ['','','5','']
    ['','8']
    ['6','','']
    ['7']
    []
    ['5']

I want the output to look like this:

         col
          5
          8
          6
          7
          0
          5

How can I do this in Python? Any help is highly appreciated.

edited by unutbu
asked by user4349490

3 Answers


To convert the data to numeric values you could use:

import numpy as np
import pandas as pd
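# Sample data: each row of 'col' holds a NumPy array of strings
data = list(map(np.array, [ ['','','5',''], ['','8'], ['6','',''], ['7'], [], ['5']]))
df = pd.DataFrame({'col': data})
# Join each row's strings, parse them ('' becomes NaN), then replace NaN with 0
df['col'] = pd.to_numeric(df['col'].str.join('')).fillna(0).astype(int)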
print(df)

yields

   col
0    5
1    8
2    6
3    7
4    0
5    5
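To see why this works, here are the intermediate values on the same sample data (the names `joined` and `numeric` are only illustrative). `str.join('')` concatenates each row's strings, so an empty array becomes '', and `pd.to_numeric` turns '' into NaN, which is why `fillna(0)` is needed before casting to int:

```python
import numpy as np
import pandas as pd

# Same sample data as above: one NumPy array of strings per row
data = list(map(np.array, [['', '', '5', ''], ['', '8'], ['6', '', ''], ['7'], [], ['5']]))
df = pd.DataFrame({'col': data})

# Step 1: concatenate the strings in each array; an empty array joins to ''
joined = df['col'].str.join('')
print(joined.tolist())   # ['5', '8', '6', '7', '', '5']

# Step 2: to_numeric turns '' into NaN, so the result is float64
numeric = pd.to_numeric(joined)
print(numeric.tolist())  # [5.0, 8.0, 6.0, 7.0, nan, 5.0]

# Step 3: replace NaN with 0 and cast back to integers
result = numeric.fillna(0).astype(int)
print(result.tolist())   # [5, 8, 6, 7, 0, 5]
```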

To get strings instead of numbers, use:

df['col'] = df['col'].str.join('').replace('', '0')

The result looks the same, but the dtype of the column is object since the values are strings.
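A quick way to check the dtype difference, as a sketch on the same sample data (`as_int` and `as_str` are illustrative names). The exact integer dtype produced by `astype(int)` depends on the platform:

```python
import numpy as np
import pandas as pd

data = list(map(np.array, [['', '', '5', ''], ['', '8'], ['6', '', ''], ['7'], [], ['5']]))
df = pd.DataFrame({'col': data})

# Numeric version: join, '' -> NaN -> 0, cast to int
as_int = pd.to_numeric(df['col'].str.join('')).fillna(0).astype(int)

# String version: join, then replace the empty string with the string '0'
as_str = df['col'].str.join('').replace('', '0')

print(as_int.dtype)  # an integer dtype (int64 on most platforms)
print(as_str.dtype)  # not an integer dtype: the values are strings
print(as_int.tolist(), as_str.tolist())
```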


If there is more than one number in some rows and you wish to pick the largest, then you'll have to loop through each item in each row, convert each string to a numeric value and take the max:

import numpy as np
import pandas as pd
data = list(map(np.array, [ ['','','5','6'], ['','8'], ['6','',''], ['7'], [], ['5']]))
df = pd.DataFrame({'col': data})
df['col'] = [max([int(xi) if xi else 0 for xi in x] or [0]) for x in df['col']]
print(df)

yields

   col
0    6   # <-- note ['','','5','6'] was converted to 6
1    8
2    6
3    7
4    0
5    5
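On pandas 0.25 or later, where `Series.explode` exists, a swapped-in alternative to the list comprehension is to put every array element on its own row, convert, and take the per-row max with a groupby on the original index. This is a sketch, not the method above; `errors='coerce'` is an added assumption that turns any non-numeric string into NaN (and therefore 0) instead of raising:

```python
import numpy as np
import pandas as pd

data = list(map(np.array, [['', '', '5', '6'], ['', '8'], ['6', '', ''], ['7'], [], ['5']]))
df = pd.DataFrame({'col': data})

# explode() puts each array element on its own row and keeps the original
# index; an empty array becomes a single NaN row
exploded = df['col'].explode()

# '' becomes NaN; errors='coerce' (an assumption) also maps junk strings to NaN
numbers = pd.to_numeric(exploded, errors='coerce')

# Largest value per original row; rows with no numbers stay NaN, then become 0
df['col'] = numbers.groupby(level=0).max().fillna(0).astype(int)
print(df['col'].tolist())  # [6, 8, 6, 7, 0, 5]
```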

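For versions of pandas prior to 0.17, which lack pd.to_numeric, you could use df.convert_objects instead (it was deprecated in 0.17 and is not available in current pandas releases):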

import numpy as np
import pandas as pd
data = list(map(np.array, [ ['','','5',''], ['','8'], ['6','',''], ['7'], [], ['5']]))
df = pd.DataFrame({'col': data})
df['col'] = df['col'].str.join('').replace('', '0')
df = df.convert_objects(convert_numeric=True)
answered by unutbu
  • Why do you need the `data = list(map(np.array, ...`? Can it be just `data = np.array(...)`? – Joe T. Boka May 08 '16 at 13:04
  • I am getting "AttributeError: 'module' object has no attribute 'to_numeric'". How do I get around it? Thanks – user4349490 May 08 '16 at 13:05
  • @JoeR: If you define `data` as a NumPy object array, then `df = pd.DataFrame(data)` will make the values lists, not NumPy arrays. Since the OP said "column of type numpy.ndarray" I tried to adhere to this specification (just in case it makes a difference, though I don't think it does.) – unutbu May 08 '16 at 13:10
  • @user4349490: `pd.to_numeric` is a somewhat recent addition to pandas. In earlier versions there was a [`df.convert_objects` method](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.convert_objects.html). If you can update your version of pandas, I'd recommend doing so. Otherwise, try `df.convert_objects(convert_numeric=True)`. – unutbu May 08 '16 at 13:13
  • Yeah, it doesn't seem to make a difference. Thank you for the great answer btw. and the great jokes on your profile page. – Joe T. Boka May 08 '16 at 13:27
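For a pandas version without `pd.to_numeric`, one alternative that avoids both it and `convert_objects` is to replace empty strings with '0' and cast with `astype(int)`. This is a sketch assuming the installed pandas has `Series.str.join` and that every joined string is a plain integer literal; any other string makes `astype(int)` raise ValueError:

```python
import numpy as np
import pandas as pd

data = list(map(np.array, [['', '', '5', ''], ['', '8'], ['6', '', ''], ['7'], [], ['5']]))
df = pd.DataFrame({'col': data})

# Join each row's strings, turn empty results into '0', then let
# astype(int) parse the strings; no pd.to_numeric call is needed
df['col'] = df['col'].str.join('').replace('', '0').astype(int)
print(df['col'].tolist())  # [5, 8, 6, 7, 0, 5]
```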

I'll leave you with this:

>>> l = ['', '5', '', '']
>>> l = [x for x in l if len(x) != 0]
>>> l
['5']

You can do the same thing using lambda and filter:

>>> l
['', '1', '']
>>> l = list(filter(lambda x: len(x) != 0, l))
>>> l
['1']

The next step would be iterating through the rows of the array and implementing one of these two ideas.
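One way to apply the list-comprehension idea row by row, as a sketch: it assumes the data is a 1-D NumPy object array of lists (the variable `x` is hypothetical) and that each row holds at most one number, with an empty row mapped to '0':

```python
import numpy as np

# Hypothetical input: a 1-D object array whose elements are lists of strings
x = np.array([['', '', '5', ''], ['', '8'], ['6', '', ''], ['7'], [], ['5']], dtype=object)

result = []
for row in x:
    # Keep only the non-empty strings, as in the list comprehension above
    non_empty = [s for s in row if len(s) != 0]
    # Assumes at most one number per row; an empty row becomes '0'
    result.append(non_empty[0] if non_empty else '0')

print(result)  # ['5', '8', '6', '7', '0', '5']
```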

An example of how to do this is shown in the question "Iterating over Numpy matrix rows to apply a function each?"

Edit: this answer may get downvoted, but I deliberately left out the final code.

edited by Community
answered by enibundo
import numpy as np

x = np.array([['', '', '5', ''], ['', '8'], ['6', '', ''], ['7'], [], ['5']],
             dtype=object)

for a in x:
    if len(a) == 0:
        print(0)             # empty row -> 0
    else:
        for b in a:
            if b:            # skip empty strings
                print(b)

prints

5
8
6
7
0
5
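If you need the values in a list (for example, to assign them back to a column) rather than printed, the same logic fits in a comprehension. This sketch keeps only the first non-empty string in each row, whereas the loop prints every one, and it converts the results to int:

```python
import numpy as np

x = np.array([['', '', '5', ''], ['', '8'], ['6', '', ''], ['7'], [], ['5']], dtype=object)

# next() returns the first non-empty string in the row, or the default 0
# when there is none (this also covers empty rows)
values = [int(next((b for b in a if b), 0)) for a in x]
print(values)  # [5, 8, 6, 7, 0, 5]
```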
answered by Chris Wood