
I have a dataframe with a column full of numpy arrays.

    A     B         C
0   1.0   0.000000  [[0. 1.],[0. 1.]]
1   2.0   0.000000  [[85. 1.],[52. 0.]]
2   3.0   0.000000  [[5. 1.],[0. 0.]]
3   1.0   3.333333  [[0. 1.],[41. 0.]]
4   2.0   3.333333  [[85. 1.],[0. 21.]]

Problem is, when I save it as a CSV file and then load it in another Python script, the numpy column is read back as text.

I tried to transform the column with np.fromstring() or np.loadtxt(), but neither works.

Example of an array after pd.read_csv():

"[[ 85.  1.]\n [   52.            0.        ]]"

Thanks

Mrofsnart
  • Did you consider saving it in another format than csv, such as feather, parquet, or HDF? – Adrien Pacifico Jul 28 '22 at 12:36
  • Yes I did, and it does work. But I wanted to know if there is another way, admitting that I want it to be humanly readable when saved as CSV. – Mrofsnart Jul 28 '22 at 12:38
  • In short, you cannot, but you could provide a short function to perform the conversion – mozway Jul 28 '22 at 12:44
  • I would strongly advise against having np.array or any other objects inside a dataframe, more so when you want to save it as CSV. Otherwise, you need to encode/decode your arrays to/from strings, as @mozway says. If `np.fromstring()` doesn't work for you, you can write your own function. – Quang Hoang Jul 28 '22 at 12:51
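The encode/decode approach the comments suggest can be sketched with json.dumps/json.loads on each cell's .tolist(), which keeps the CSV human-readable. This is an illustration, not from the question; the file name arrays.csv is arbitrary.

```python
import json

import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [1.0, 2.0],
    "C": [np.array([[0., 1.], [0., 1.]]), np.array([[85., 1.], [52., 0.]])],
})

# Encode: store each array as a JSON string before writing the CSV
df["C"] = df["C"].apply(lambda a: json.dumps(a.tolist()))
df.to_csv("arrays.csv", index=False)

# Decode: parse each JSON string back into a numpy array after reading
df2 = pd.read_csv("arrays.csv")
df2["C"] = df2["C"].apply(lambda s: np.array(json.loads(s)))
```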

4 Answers


You can try .to_json():

import io
import numpy as np
import pandas as pd

output = pd.DataFrame([
    {'a': 1, 'b': np.arange(4)},
    {'a': 2, 'b': np.arange(5)}
]).to_json()

But you will only get plain lists back when reloading:

df = pd.read_json(io.StringIO(output))

Turn them into numpy arrays with:

df['b'] = [np.array(v) for v in df['b']]
xLaszlo

The code below should work. I used another question to solve it; there's a bit more explanation in there: Convert a string with brackets to numpy array

import pandas as pd
import numpy as np

from ast import literal_eval

# Recreating DataFrame
data = np.array([0, 1, 0, 1, 85, 1, 52, 0, 5, 1, 0, 0, 0, 1, 41, 0, 85, 1, 0, 21], dtype='float')
data = data.reshape((5,2,2))

write_df = pd.DataFrame({'A': [1.0, 2.0, 3.0, 1.0, 2.0],
                         'B': [0, 0, 0, 3 + 1/3, 3 + 1/3],
                         'C': data.tolist()})

# Saving DataFrame to CSV
fpath = 'D:\\Data\\test.csv'
write_df.to_csv(fpath)

# Reading DataFrame from CSV
read_df = pd.read_csv(fpath)

# literal_eval parses the string back into a nested list;
# np.array converts that list directly into an array
def makeArray(rawdata):
    return np.array(literal_eval(rawdata))

# Applying the function row-wise; there may be a more efficient way
read_df['C'] = read_df['C'].apply(makeArray)
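The same conversion can also happen while parsing, via read_csv's converters= parameter, which hands each raw cell string to a function. A self-contained sketch (test.csv is an arbitrary file name, not from the answer):

```python
import numpy as np
import pandas as pd
from ast import literal_eval

# A one-row dataframe whose C column holds a nested list
df = pd.DataFrame({'C': [[[0.0, 1.0], [0.0, 1.0]]]})
df.to_csv('test.csv', index=False)

# converters= applies the function to the raw cell string during parsing,
# so no second pass over the column is needed
read_df = pd.read_csv('test.csv',
                      converters={'C': lambda s: np.array(literal_eval(s))})
```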
Alex

Here is an ugly solution.

import pandas as pd
import numpy as np

### Create dataframe
a = [1.0, 2.0, 3.0, 1.0, 2.0]
b = [0.000000, 0.000000, 0.000000, 3.333333, 3.333333]
c = [np.array([[0., 1.], [0., 1.]]),
     np.array([[85., 1.2], [52., 0.]]),
     np.array([[5., 1.], [0., 0.]]),
     np.array([[0., 1.], [41., 0.]]),
     np.array([[85., 1.], [0., 21.]])]


df = pd.DataFrame({"a":a,"b":b,"c":c})

### Save to csv and reload

df.to_csv("to_trash.csv")
df = pd.read_csv("to_trash.csv")

### Bad string manipulation that could be done better with regex

df["c"] = ("np.array(" + (df
    .c
    .str.split()
    .str.join(' ')
    .str.replace(" ", ",")
    .str.replace(",,", ",")
    .str.replace("[,", "[", regex=False)
) + ")").apply(eval)
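The regex version hinted at above could look like this: pull every number out of the repr-style string and reshape. A sketch only; the sample string is the one from the question, and the (2, 2) shape is assumed known.

```python
import re

import numpy as np

# repr-style string as produced by str() on a 2x2 numpy array, via read_csv
s = "[[ 85.   1.]\n [ 52.   0.]]"

# Extract every numeric token (handles optional sign, decimals, exponents)
nums = re.findall(r"-?\d+\.?\d*(?:[eE][-+]?\d+)?", s)

# The target shape must be known (or stored separately)
arr = np.array(nums, dtype=float).reshape(2, 2)
```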
Adrien Pacifico

The best solution I found is using pickle files.

You can save your dataframe as a pickle file:

import cv2
import pandas as pd

img = cv2.imread('img1.jpg')
data = pd.DataFrame({'img': [img]})  # wrap the array in a list to get a one-row object column

data.to_pickle('dataset.pkl')

Then you can read it back as a pickle file:

import pickle

with open('dataset.pkl', 'rb') as openfile:
    df_file = pickle.load(openfile)
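Alternatively, pandas' own pickle helpers cover the whole round trip without touching the pickle module directly. A minimal sketch with made-up data (the file name dataset.pkl is arbitrary):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'C': [np.zeros((2, 2)), np.ones((2, 2))]})
df.to_pickle('dataset.pkl')

# read_pickle restores the object column with the numpy arrays intact
df2 = pd.read_pickle('dataset.pkl')
```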

Let me know if it worked.