I have a dataframe column that holds numpy arrays, but it's read in as strings. So I end up with the following as a single element:

df['numpy_arr'].iloc[0] = ' 2 3 5 23 5  2 23 '

I want to convert this to a numpy array, and have successfully done so for a single instance using numpy.fromstring:

first = df['numpy_arr'].iloc[0]
np.fromstring(first, sep=' ')

However, when I try to generalize this using apply:

df['numpy_arr'].apply(lambda x: np.fromstring(x, sep=' '))

it returns an empty Series. Why is this the case? Is it because the x in the lambda doesn't refer to the actual string? I also tried a list comprehension:

[np.fromstring(vector,sep=' ') for vector in df['numpy_arr']]

Which again returns empty arrays. Why is this the case? How can I generalize this to work on the whole series and convert the elements to a numpy array?

[EDIT] As a last resort, I suppose I can iterate over .iloc[x], but that seems very inefficient, especially since I would have to turn the result back into a Series.

ocean800

1 Answer

This works for me:

>>> df = pd.DataFrame({"A": [' 2 3 5 23 5 2 23 ', ' 3 4 5 ']})
>>> df
                   A
0   2 3 5 23 5 2 23
1             3 4 5
>>> df['A'].apply(lambda x: np.fromstring(x, sep = ' '))
0    [2.0, 3.0, 5.0, 23.0, 5.0, 2.0, 23.0]
1                          [3.0, 4.0, 5.0]
Name: A, dtype: object
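
If the same call gives empty results on your data, it may be worth looking at what the elements really contain before parsing them. The sketch below is only a diagnostic idea: the sample data is hypothetical (the second row, with brackets, is invented for illustration), and the question does not show the raw values, so stray characters or non-string entries are an assumption rather than a confirmed cause.

```python
import pandas as pd

# Hypothetical data: the bracketed second row is an assumption for illustration.
df = pd.DataFrame({"numpy_arr": [" 2 3 5 23 5  2 23 ", "[3 4 5]"]})

# repr() exposes brackets, quotes, and odd whitespace that a plain print hides.
samples = [repr(value) for value in df["numpy_arr"].head()]

# Count the Python types stored in the column (str, float for NaN, and so on).
types = df["numpy_arr"].map(type).value_counts()

print(samples)
print(types)
```

If every element is a plain str containing only numbers and spaces, the apply call above should behave as shown.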

Remember to assign the returned value to the dataframe column if you want the value to be saved:

>>> df['A'] = df['A'].apply(lambda x: np.fromstring(x, sep = ' '))
>>> df
                                       A
0  [2.0, 3.0, 5.0, 23.0, 5.0, 2.0, 23.0]
1                        [3.0, 4.0, 5.0]
>>>
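
As an alternative that does not depend on np.fromstring, here is a minimal sketch that splits each string on whitespace and builds the array from the pieces. It swaps in str.split plus np.array for the call used above. The helper name parse_vector is hypothetical, and it assumes every element is a string of whitespace-separated numbers.

```python
import numpy as np
import pandas as pd


def parse_vector(text):
    """Turn a whitespace-separated string such as ' 2 3 5 ' into a float array."""
    # str.split() with no arguments ignores leading, trailing, and repeated whitespace.
    return np.array(text.split(), dtype=float)


df = pd.DataFrame({"numpy_arr": [" 2 3 5 23 5  2 23 ", " 3 4 5 "]})

# Assign the result back, otherwise the column keeps the original strings.
df["numpy_arr"] = df["numpy_arr"].apply(parse_vector)
```

Because the split ignores extra whitespace, the double space in the first string needs no special handling. An empty string yields an empty array rather than an error.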
Carles Mitjans