
I have a pandas dataframe df whose elements are each a whole numpy array. For example the 6th row of column 'x_grid':

>>> e = df.loc[6,'x_grid']
>>> print(e)

[-11.52616579 -11.48006112 -11.43395646 -11.3878518  -11.34174713
 -11.29564247 -11.24953781 -11.20343315 -11.15732848 -11.11122382
 -11.06511916 -11.01901449 ...

But I cannot use this as a numpy array as it is just given as a string:

>>> print(type(e))

<class 'str'>

How can I store a numpy array to a dataframe so it does not get converted to a string? Or convert this string back to a numpy array in a nice way?

Jack Wetherell
  • It's worth noting that this DataFrame is loaded from a csv file, which is no doubt where the conversion to string happens. So I guess converting this string back to a numpy array would be the easier route. – Jack Wetherell Apr 04 '19 at 15:50
  • Plus, there are no commas to separate the elements in your array. – Erfan Apr 04 '19 at 15:52
  • Look at the source text file. This array is a quoted string, complete with `[]`. Are there also `...`? The original dataframe had these array items, and the only way to save such a df in a 2d csv format is to turn the complex items into strings; pandas used `str(item)`. Where possible, avoid saving such dataframes as csv. – hpaulj Apr 04 '19 at 15:57
  • This has come up a number of times, e.g. https://stackoverflow.com/questions/51898099/convert-a-string-with-brackets-to-numpy-array. `literal_eval` might have problems with your string because it is missing the commas that normally mark a list. – hpaulj Apr 04 '19 at 16:42
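
The behaviour described in these comments can be reproduced with a small sketch. The frame `df_demo` and its values are made up for illustration, and an in-memory buffer stands in for the real csv file:

```python
import io

import numpy as np
import pandas as pd

# Hypothetical frame with one array-valued cell, mirroring the question.
df_demo = pd.DataFrame({"x_grid": [np.linspace(-11.5, -11.0, 6)]})

# Write to an in-memory CSV and read it back.
buffer = io.StringIO()
df_demo.to_csv(buffer, index=False)
buffer.seek(0)
df_loaded = pd.read_csv(buffer)

cell = df_loaded.loc[0, "x_grid"]
print(type(cell))  # the cell comes back as a str, not an ndarray
print(cell)        # bracketed, space-separated text

# Arrays longer than NumPy's default print threshold (1000 elements) are
# abbreviated, so their text form contains "..." and loses data.
long_text = str(np.arange(2000.0))
print("..." in long_text)
```

If the stored strings do contain `...`, the omitted values are gone and no parsing approach can bring them back.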

3 Answers


If you just want to convert the string in each row into a list, the following will work:

df['x_grid'].str[1:-1].str.split(" ").apply(lambda x: (list(map(float, x))))

# or for a numpy array
df['x_grid'].str[1:-1].str.split(" ").apply(lambda x: (np.array(list(map(float, x)))))
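
One caveat: in the printed output from the question, values are separated by runs of spaces and by line breaks, so splitting on a single `" "` can yield empty tokens that `float` rejects. A minimal sketch of a variant that splits on any whitespace, assuming the strings look like that output and contain no `...`; the helper name `parse_array_string` and the sample frame are made up for illustration:

```python
import numpy as np
import pandas as pd


def parse_array_string(text):
    """Convert a bracketed, whitespace-separated string to a float ndarray."""
    # str.split() with no argument splits on any run of whitespace,
    # so double spaces and line breaks produce no empty tokens.
    tokens = text.strip().strip("[]").split()
    return np.array(tokens, dtype=float)


# Made-up sample in the same layout as the question's output.
sample = pd.DataFrame(
    {"x_grid": ["[-11.52616579 -11.48006112\n -11.3878518  -11.34174713]"]}
)
sample["x_grid"] = sample["x_grid"].apply(parse_array_string)

arr = sample.loc[0, "x_grid"]
print(type(arr), arr.shape)
```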

Hope that helps.

Rafal Janik

Thanks to Erfan and hpaulj for the suggestions that combined to answer this question.

The solution is that when setting an element of the dataframe I first convert the numpy array x to a list (so it is comma separated not space separated):

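# DataFrame.append was removed in pandas 2.0; x.tolist() gives plain Python floats
df = pd.concat([df, pd.DataFrame({'x_grid': [x.tolist()]})], ignore_index=True)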

Then after saving to a csv, and loading back in, I extract it back into a numpy array using np.array() and ast.literal_eval() (Note: requires import ast):

x = np.array(ast.literal_eval(df.loc[entry,'x_grid']))

This then returns a correct numpy array x.
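
For reference, a self-contained sketch of this round trip. The array `x` here is a stand-in, an in-memory buffer replaces the real csv file, and the `converters` argument of `pd.read_csv` is used to do the conversion at load time, which is an alternative to calling `ast.literal_eval` per entry rather than what was done above:

```python
import ast
import io

import numpy as np
import pandas as pd

x = np.linspace(-11.5, -11.0, 6)  # stand-in for the real grid

# Store a plain Python list so the CSV cell is comma separated.
df_out = pd.DataFrame({"x_grid": [x.tolist()]})
buffer = io.StringIO()  # in-memory file; a real path works the same way
df_out.to_csv(buffer, index=False)
buffer.seek(0)

# converters applies the given function to every cell of that column on load.
df_in = pd.read_csv(
    buffer,
    converters={"x_grid": lambda s: np.array(ast.literal_eval(s))},
)

x_back = df_in.loc[0, "x_grid"]
print(type(x_back), np.allclose(x, x_back))
```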

Jack Wetherell

I want to extend Rafal's answer to avoid NumPy throwing an exception on the empty strings that result from x.split:

# np.float was removed in NumPy 1.24; the builtin float is the equivalent dtype
df['x_grid'].str[1:-1].apply(lambda x: list(filter(None, x.split(' ')))).apply(lambda x: np.array(x).astype(float))