4

I have a NumPy string array which looks like this:

[['0', '', '12.12', '140.65', '', ''],
['3', '', '10.45', '154.45', '', ''],
['5', '', '15.65', '184.74', '', '']]

What I need to do is to replace the empty cells with a number in order to convert it into a float array. I can't just delete the columns because in some cases the empty cells are filled. I tried this:

data = np.char.replace(data, '','0').astype(np.float64)

But this just inserts a 0 between all characters (and at both ends of each string), which ends up as this:

[[0, 0, 1020.0102, 104000.0605, 0, 0],
[30, 0, 1000.0405, 105040.0405, 0, 0],
[50, 0, 1050.0605, 108040.0704, 0, 0]]

I can't figure out why Python does that. I searched via Google but couldn't find a good explanation of numpy.char.replace. Can anyone explain to me how it works?

Toggo
  • 3
    Possible duplicate of [Numpy array, fill empty values for a single column](https://stackoverflow.com/questions/20512101/numpy-array-fill-empty-values-for-a-single-column) – Grzegorz Oledzki Nov 16 '17 at 09:15
  • Your 'empty' cells contain commas. Char.replace applies the regular string replace method to each element. – hpaulj Nov 16 '17 at 14:28

3 Answers

4
>>> a = np.array([['0', '', '12.12', '140.65', '', ''],
... ['3', '', '10.45', '154.45', '', ''],
... ['5', '', '15.65', '184.74', '', '']])
>>> a[a == ''] = 0
>>> a.astype(np.float64)
array([[   0.  ,    0.  ,   12.12,  140.65,    0.  ,    0.  ],
       [   3.  ,    0.  ,   10.45,  154.45,    0.  ,    0.  ],
       [   5.  ,    0.  ,   15.65,  184.74,    0.  ,    0.  ]])
timgeb
  • You might want to have empty strings in the initial array like OP – lxop Nov 16 '17 at 09:15
  • @ixop that's a copy-paste fail. Give me a second. – timgeb Nov 16 '17 at 09:15
  • 1
    Perfect, this works great! Thank you for your fast answer! – Toggo Nov 16 '17 at 09:21
  • 2
    @Toggo `replace` operates on substrings, so it more or less checks each position of each of your strings for the string to be replaced. Since '' matches at every position, you get '0' inserted everywhere. – Paul Panzer Nov 16 '17 at 09:31
  • @Paul Panzer Ok, I guessed it would be this way. Is there a possibility to make replace operate with the whole string rather than a substring? – Toggo Nov 16 '17 at 09:34
  • @Toggo `replace` itself I don't think so. I think this answer is the way to do it. – Paul Panzer Nov 16 '17 at 10:06
0

data = np.char.replace(data, '','0')

It replaces every empty position in each string: '' has one such position, '0' has two, and '12.12' has six (before each character and at the end). The result is

[['000' '0' '01020.01020' '0104000.06050' '0' '0']
 ['030' '0' '01000.04050' '0105040.04050' '0' '0']
 ['050' '0' '01050.06050' '0108040.07040' '0' '0']]
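
The counting above can be checked with plain Python strings, since np.char.replace applies the ordinary string replace to each element. This is only a small sketch; the names `cells` and `replaced` are made up for illustration.

```python
import numpy as np

# Plain Python: '' matches before every character and once at the end,
# so a string of length n gets n + 1 insertions.
single = "12.12".replace("", "0")   # 5 characters, 6 insertions
empty = "".replace("", "0")         # 0 characters, 1 insertion
print(single, empty)

# np.char.replace does the same thing element by element.
cells = np.array(["0", "", "12.12"])
replaced = np.char.replace(cells, "", "0")
print(replaced)

# Converting afterwards still succeeds, which is why the wrong numbers
# show up silently instead of raising an error.
print(replaced.astype(np.float64))
```

The strings with the extra zeros are still valid float literals, so the conversion goes through and produces values such as 1020.0102 instead of 12.12.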

Try this:

import numpy as np

a = np.array([['0', '', '12.12', '140.65', '', ''],
              ['3', '', '10.45', '154.45', '', ''],
              ['5', '', '15.65', '184.74', '', '']])

#a[np.where(a == '')] = '0'
a[a == ''] = '0'

a = a.astype(np.float64)

print(a)
0

I know that this is an old question, but unfortunately, the accepted answer does not work properly today. If you do the `a == ''` comparison you can get a FutureWarning:

FutureWarning: elementwise comparison failed; returning scalar
instead, but in the future will perform elementwise comparison

One method that will do the trick without the warning is to use numpy.where():

import numpy as np

a = np.array([['0', '', '12.12', '140.65', '', ''],
              ['3', '', '10.45', '154.45', '', ''],
              ['5', '', '15.65', '184.74', '', '']])

# Replace cells that are exactly '' with '0'; all other cells are kept.
result = np.where(a == '', '0', a)
print(result)

The result is

[['0' '0' '12.12' '140.65' '0' '0']  
 ['3' '0' '10.45' '154.45' '0' '0']  
 ['5' '0' '15.65' '184.74' '0' '0']]
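
The result above is still an array of strings. A minimal sketch of the remaining step, assuming the goal is the float array from the question; `to_float_array` and its `fill` parameter are hypothetical names, not part of NumPy.

```python
import numpy as np

def to_float_array(cells, fill="0"):
    """Hypothetical helper: replace cells equal to '' with `fill`, then cast to float64."""
    cells = np.asarray(cells, dtype=str)
    # Whole-cell comparison: only cells that are exactly '' are replaced.
    filled = np.where(cells == "", fill, cells)
    return filled.astype(np.float64)

a = np.array([['0', '', '12.12', '140.65', '', ''],
              ['3', '', '10.45', '154.45', '', ''],
              ['5', '', '15.65', '184.74', '', '']])

floats = to_float_array(a)
print(floats)

# Passing fill="nan" marks the empty cells as missing instead of zero.
with_nan = to_float_array(a, fill="nan")
print(with_nan)
```

Whether 0 or NaN is the better fill depends on whether an empty cell should count as a real zero in later calculations.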
Bogdan