I have an ndarray which looks like this:

x

data ndarray

I want to add this to an existing DataFrame so that I can export it as a CSV, then load that CSV in a separate Python script, pull out the ndarray and carry out some analysis. The main reason is to avoid having one really long Python script.

To add it to a dataframe I've done the following:

data["StandardisedFeatures"] = x.tolist()

dataframe

This looks OK to me. However, in my next script, when I try to pull the data out and turn it back into an array, it doesn't look the same: each row is wrapped in single quotes and treated as a string:

data['StandardisedFeatures'].to_numpy()

not an array

I've tried astype(float), but it doesn't work. Can anyone suggest a way to fix this?

Thanks.

MrPaul91

3 Answers

If the list objects in your DataFrame have become strings along the way (for example, after being written to a CSV and read back), you can use ast.literal_eval (or eval) to convert each string back to a list, and use map to apply it to every element.

Here is an example which will give you an idea of how to deal with this:

import pandas as pd
import numpy as np

dic = {"a": [1,2,3], "b":[4,5,6], "c": [[1,2,3], [4,5,6], [1,2,3]]}
df = pd.DataFrame(dic)

print("DataFrame:", df, sep="\n", end="\n\n")

print("Column of list to numpy:", df.c.to_numpy(), sep="\n", end="\n\n")
temp = df.c.astype(str).to_numpy()

print("Since your list objects have somehow become str objects while working with df:", temp, sep="\n", end="\n\n")

print("Magic for what you want:", np.array(list(map(eval, temp))), sep="\n", end="\n\n")

Output:

DataFrame:
   a  b          c
0  1  4  [1, 2, 3]
1  2  5  [4, 5, 6]
2  3  6  [1, 2, 3]

Column of list to numpy:
[list([1, 2, 3]) list([4, 5, 6]) list([1, 2, 3])]

Since your list objects have somehow become str objects while working with df:
['[1, 2, 3]' '[4, 5, 6]' '[1, 2, 3]']

Magic for what you want:
[[1 2 3]
 [4 5 6]
 [1 2 3]]

Note: I have used eval in the example only because more people are familiar with it. You should prefer using ast.literal_eval instead whenever you need eval. This SO post nicely explains why you should do this.
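To tie this back to the CSV round trip in the question, here is a minimal sketch using ast.literal_eval instead of eval. The array values and the "id" column are made up for illustration, and an in-memory io.StringIO buffer stands in for the CSV file on disk; only the column name StandardisedFeatures comes from the question.

```python
import ast
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for the standardised feature array from the question.
x = np.array([[0.1, -1.2, 3.0], [4.5, 0.0, -2.25]])

data = pd.DataFrame({"id": [1, 2]})  # "id" is an illustrative column
data["StandardisedFeatures"] = x.tolist()

# First script: write the CSV (an in-memory buffer replaces a real file here).
buffer = io.StringIO()
data.to_csv(buffer, index=False)
buffer.seek(0)

# Second script: read it back. Each cell is now a str such as "[0.1, -1.2, 3.0]".
loaded = pd.read_csv(buffer)

# Parse every string back into a list, then stack the lists into a 2-D array.
restored = np.array(loaded["StandardisedFeatures"].map(ast.literal_eval).tolist())

print(type(loaded["StandardisedFeatures"].iloc[0]))
print(restored.shape)
```

Unlike eval, ast.literal_eval only accepts Python literals, so it will not execute arbitrary code that happens to be sitting in a CSV cell.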

Shubham

You can save objects of any type in a DataFrame.

The objects keep their type, but the column's dtype is reported as "object" by pandas.DataFrame.info().

Example: save lists

import pandas as pd

df = pd.DataFrame(dict(my_list=[[1,2,3,4], [1,2,3,4]]))
print(type(df.loc[0, 'my_list']))
# Prints: <class 'list'>

This is useful if you use your objects directly with pandas.DataFrame.apply().
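As a rough illustration of that point, the sketch below applies a function to each stored list and then stacks the lists into a numpy array. The column names and values are made up for the example.

```python
import numpy as np
import pandas as pd

# Illustrative data: each cell of "my_list" holds a plain Python list.
df = pd.DataFrame({"my_list": [[1, 2, 3, 4], [5, 6, 7, 8]]})

# The column dtype is object, but every element is still a list.
print(df["my_list"].dtype)

# apply() receives each list as-is, so built-ins such as sum work directly.
df["total"] = df["my_list"].apply(sum)

# While the DataFrame stays in memory, the lists convert straight to a 2-D array.
features = np.array(df["my_list"].tolist())
print(features.shape)
```

This only holds while the DataFrame stays in memory; once it is written to a CSV and read back, the cells come back as strings.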

Florian Fasmeyer

Perhaps a simpler alternative is to use numpy.save and numpy.load. You can then save the array to a .npy file and load it in the next script directly as a numpy array:

import numpy as np
x = np.array([[1, 2], [3, 4]])
# Save the array in the working directory as "x.npy" (extension is automatically inserted)
np.save("x", x)
# Load "x.npy" as a numpy array
x_loaded = np.load("x.npy")
Jay