Python - Call numpy method on strings of dataframe column?

Question

I have a dataframe column that holds numpy arrays, but it's read in as strings. So I end up with the following as a single element:

df['numpy_arr'].iloc[0] = ' 2 3 5 23 5  2 23 '

I want to convert this to a numpy array, and have successfully done so for a single instance using numpy.fromstring:

first = df['numpy_arr'].iloc[0]
np.fromstring(first, sep=' ')

However, when I try to generalize this using apply:

df['numpy_arr'].apply(lambda x: np.fromstring(x, sep=' '))

It returns an empty Series. Why is this the case? Is it because the x in the lambda doesn't refer to the actual string? I also tried the following, based on this:

[np.fromstring(vector,sep=' ') for vector in df['numpy_arr']]

Which again returns empty arrays. Why is this the case? How can I generalize this to work on the whole series and convert the elements to a numpy array?

[EDIT] As a last resort, I suppose I can iterate over .iloc[x], but that seems like a very inefficient way to do this, especially since I would have to transform the result back into a Series.

Also note, `.apply` is generally not going to be faster than iterating over the `Series`. That's essentially what it does under the hood. — juanpa.arrivillaga, Nov 16 '17 at 19:36
@juanpa.arrivillaga ah, that's so odd, let me check my example! But good to know about apply — ocean800, Nov 16 '17 at 19:39
I think the problem comes from a bad import. What is your source file? — B. M., Nov 16 '17 at 19:54

score 1 · Accepted Answer · answered Nov 16 '17 at 19:31

This works for me:

>>> df = pd.DataFrame({"A": [' 2 3 5 23 5 2 23 ', ' 3 4 5 ']})
>>> df
                   A
0   2 3 5 23 5 2 23
1             3 4 5
>>> df['A'].apply(lambda x: np.fromstring(x, sep = ' '))
0    [2.0, 3.0, 5.0, 23.0, 5.0, 2.0, 23.0]
1                          [3.0, 4.0, 5.0]
Name: A, dtype: object
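If the same call gives you empty results on your own data, it may help to look at what the column actually holds before parsing it. The sketch below is only an illustration: the sample frame is made up to mirror the question's `numpy_arr` column, and `dtype=int` is an optional extra I added to keep whole numbers as integers rather than floats.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data mirroring the question's column.
df = pd.DataFrame({"numpy_arr": [" 2 3 5 23 5  2 23 ", " 3 4 5 "]})

# Show the exact type and contents of the first few cells.
# repr() makes stray quotes, brackets, or newlines visible.
for value in df["numpy_arr"].head():
    print(type(value).__name__, repr(value))

# Parse each cell; dtype=int is an assumption that the data are whole numbers.
parsed = [np.fromstring(value, sep=" ", dtype=int) for value in df["numpy_arr"]]

# Count how many numbers were read from each cell.
lengths = [len(arr) for arr in parsed]
print(lengths)
```

If the printed lengths are zero for your real data, the cells probably do not look like the plain space-separated string shown in the question.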

Remember to assign the returned value to the dataframe column if you want the value to be saved:

>>> df['A'] = df['A'].apply(lambda x: np.fromstring(x, sep = ' '))
>>> df
                                       A
0  [2.0, 3.0, 5.0, 23.0, 5.0, 2.0, 23.0]
1                        [3.0, 4.0, 5.0]
>>>
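Since `.apply` is essentially a loop under the hood, as noted in the comments, a list comprehension is a reasonable alternative for writing the parsed arrays back. This is a minimal sketch using the same example frame; `parse_vector` is a helper name I made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [" 2 3 5 23 5 2 23 ", " 3 4 5 "]})


def parse_vector(text):
    """Parse a whitespace-separated string of numbers into a float array."""
    return np.fromstring(text, sep=" ")


# Build the arrays in a plain loop, then wrap them in a Series that reuses
# the original index so the rows stay aligned on assignment.
df["A"] = pd.Series([parse_vector(text) for text in df["A"]], index=df.index)

print(df["A"].iloc[0])
```

The column ends up with object dtype, where each cell is its own numpy array.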
