I wish to store a dataframe as a CSV file and then read it back while preserving the datatype of each entry. I am able to do it with str, int and float, but when it comes to np.arrays they are stored as strings. Is there a way to preserve their nature?
Ex.:
import pandas as pd
import numpy as np
pd_df = pd.DataFrame(columns=['name', 'int', 'float', 'array'])
pd_df.loc[0] = pd.Series({'name': 'one',
                          'int': 1,
                          'float': 1.0,
                          'array': np.random.rand(1, 3)})
pd_df.to_csv('file.csv')
Then I later read "file.csv" with the built-in read_csv:
read_file = pd.read_csv('file.csv')
print( read_file ) #returns desired result
print( type(read_file.loc[0]['name']) ) #returns <class 'str'>
print( type(read_file.loc[0]['int']) ) #returns <class 'numpy.int64'>
print( type(read_file.loc[0]['float']) ) #returns <class 'numpy.float64'>
print( type(read_file.loc[0]['array']) ) #returns <class 'str'> !!!
Of course I can transform read_file.loc[0]['array'] back into an np.array, but I am wondering if there is a way to keep it as an array both in the dataframe and in the csv.
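For completeness, this is the kind of workaround I mean — a minimal sketch where parse_array is a throwaway helper I wrote for this question; it assumes the cell holds the default str() of the array and it loses the original shape:

```python
import io

import numpy as np
import pandas as pd

def parse_array(cell):
    # hypothetical helper: the CSV cell holds str(np.ndarray),
    # e.g. "[[0.1 0.2 0.3]]"; strip the brackets and rebuild a
    # flat float array (the original shape is lost)
    return np.array(cell.strip('[]').split(), dtype=float)

df = pd.DataFrame(columns=['name', 'array'])
df.loc[0] = pd.Series({'name': 'one',
                       'array': np.array([[0.1, 0.2, 0.3]])})

buf = io.StringIO()  # stand-in for 'file.csv'
df.to_csv(buf, index=False)
buf.seek(0)

# converters applies the parser while reading, so the column
# comes back as ndarray objects instead of plain strings
read_back = pd.read_csv(buf, converters={'array': parse_array})
print(type(read_back.loc[0, 'array']))  # <class 'numpy.ndarray'>
```

This works, but it feels fragile (it depends on numpy's print formatting), which is why I am asking whether the array can be preserved natively.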
I have tried to specify the datatype of each column with apply, to read the file with a specific dtype, and to use as_matrix() as suggested here, but I could not make any of it work.
Thank you for any suggestion.