
I wish to store a dataframe as a CSV file and then read it back while preserving the datatype of each entry. This works with str, int and float, but np.arrays come back as strings. Is there a way to preserve their nature?

Ex.:

import pandas as pd
import numpy as np
pd_df = pd.DataFrame(columns=['name', 'int', 'float', 'array'])
pd_df.loc[0] = pd.Series({ 'name'  : 'one', 'int' : 1,
                           'float': 1.0, 
                           'array' : np.random.rand(1,3) })
pd_df.to_csv( 'file.csv' )

Then I later read "file.csv" with the built-in "read_csv":

read_file = pd.read_csv( 'file.csv' )
print( read_file )                       #returns desired result
print( type(read_file.loc[0]['name'])  ) #returns <class 'str'>
print( type(read_file.loc[0]['int'])   ) #returns <class 'numpy.int64'>
print( type(read_file.loc[0]['float']) ) #returns <class 'numpy.float64'>
print( type(read_file.loc[0]['array']) ) #returns <class 'str'> !!!

Of course I can transform read_file.loc[0]['array'] back to an np.array, but I am wondering if there is a way to keep the array the way it is in the dataframe and in the csv. I have tried to specify the datatype of each column with apply, to read it with a specific dtype, and to use as_matrix() as suggested here, but could not make it work.
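Since CSV is a plain-text format, the array cell can only ever be stored as a string; what can be automated is the conversion back. One approach (a sketch, not from the question: the helper parse_array is hypothetical) is to pass a converter to read_csv that rebuilds the ndarray from numpy's string repr:

```python
import numpy as np
import pandas as pd

def parse_array(s):
    # CSV stores the array cell as its string repr, e.g. "[[0.1 0.2 0.3]]";
    # strip the brackets and split on whitespace to recover the values.
    return np.array(s.replace('[', '').replace(']', '').split(),
                    dtype=float).reshape(1, -1)

pd_df = pd.DataFrame(columns=['name', 'int', 'float', 'array'])
pd_df.loc[0] = pd.Series({'name': 'one', 'int': 1,
                          'float': 1.0,
                          'array': np.random.rand(1, 3)})
pd_df.to_csv('file.csv')

read_file = pd.read_csv('file.csv', converters={'array': parse_array})
print(type(read_file.loc[0]['array']))   # <class 'numpy.ndarray'>
```

Note that the repr only keeps ~8 significant digits, so this round trip is lossy. If the file does not have to be human-readable CSV, to_pickle / read_pickle preserve every cell object (including ndarrays and their full precision) exactly.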

Thank you for any suggestion.

Sebastien D
Marco Di Gennaro