I am using pandas and numpy to do feature extraction. It takes a long time to complete, so I want to save the DataFrame for later use.

I write a large pandas.DataFrame, which contains multiple 2-D numpy arrays, to a CSV file. The cell values look like this:

        color                 contrast            dissimilarity  \
0  134.000000                  [[0.0]]                  [[0.0]]   
1  135.133333  [[0.16000000000000003]]  [[0.16000000000000003]]

Then I read the data back from the CSV file, and the format of the float numbers has changed like this:

    color  contrast dissimilarity  
0  134.00    [[0.]]        [[0.]]       
1  135.13  [[0.16]]      [[0.16]]    

The float value '0.0' becomes '0.'. So when I use the DataFrame read from the CSV file as parameters for my model, it raises an error:

ValueError: could not convert string to float: '[[0.]]'

This is how I write the DataFrame to a CSV file:

from datetime import datetime

now = datetime.now()

current_time = now.strftime("%x%H:%M:%S")
print("Current Time =", current_time)
current_time = current_time.replace(':', '')
current_time = current_time.replace('/', '')

compression_opts = dict(method='zip', archive_name=current_time + '.csv')
df.to_csv(current_time + 'test.zip', index=False, compression=compression_opts)

This is how I read the file:

df2 = pd.read_csv('112220153048.csv', sep=',')

Is there a way to keep the number format unchanged when writing data to a CSV file?

Khoa Chau

1 Answer

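CSV is a plain-text format, so each array is written as its string representation, and read_csv returns that text as a string because it cannot interpret the brackets the way numpy does. The values themselves are unchanged: '[[0.]]' is simply how numpy prints an array containing 0.0. There are two solutions I can think of.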
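To make the cause concrete, here is a minimal sketch of the round trip. The one-row frame and its column names are made up for illustration, and an in-memory buffer stands in for the file on disk.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical one-row frame with a 2-D array stored in a cell.
df = pd.DataFrame({"color": [134.0], "contrast": [np.array([[0.0]])]})

# Write to an in-memory buffer instead of a real file.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()

# Read it back: the array cell is now plain text, not an ndarray.
df2 = pd.read_csv(io.StringIO(csv_text))
cell = df2.loc[0, "contrast"]
print(type(cell), repr(cell))
```

The cell that went in as an ndarray comes back as a str, which is why passing it to a model fails with "could not convert string to float".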

The first is hacky and a little ugly: if you have to use CSV, you could parse each element by slicing the brackets off. Note that this only works for cells that hold a single value.

element = "[[0.1]]"
# Drop the two leading and two trailing brackets, then convert.
float_from_element = float(element[2:-2])
print(float_from_element)  # 0.1
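If the cells can hold more than one value, the slicing approach above could be generalised along these lines. This is only a sketch: parse_array_cell is a hypothetical helper name, and it assumes the text is numpy's str() form of a 2-D float array that was small enough not to be abbreviated with '...'.

```python
import re

import numpy as np


def parse_array_cell(text):
    """Rebuild a 2-D float array from text such as '[[0. 1.]\n [2. 3.]]'.

    Hypothetical helper: assumes numpy's str() layout with no '...' truncation.
    """
    # Each innermost [...] pair holds one row of whitespace-separated numbers.
    rows = re.findall(r"\[([^\[\]]*)\]", text)
    return np.array([[float(value) for value in row.split()] for row in rows])


single = parse_array_cell("[[0.]]")
grid = parse_array_cell("[[0.16 0.2]\n [0.3 0.4]]")
print(single.shape, grid.shape)
```

A function like this can be passed to read_csv through its converters argument (a dict mapping column names to functions), so the column is parsed while the file is loaded.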

My second suggestion is to use pickle instead of saving the data as a CSV file. This is useful if you only process the DataFrames in Python and don't need to read the file from anywhere else. Pickle saves the DataFrame, or a series of chunks, as a binary file on your hard drive. When you load the pickle file, you get back a Python object that keeps its properties as a pandas DataFrame or numpy array.

pandas has native support for pickle through DataFrame.to_pickle and pandas.read_pickle; see the pandas documentation.
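A minimal sketch of the pickle round trip, assuming a small made-up frame and a temporary directory in place of your real output path:

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Hypothetical frame mirroring the question: floats plus 2-D arrays in cells.
df = pd.DataFrame({
    "color": [134.0, 135.133333],
    "contrast": [np.array([[0.0]]), np.array([[0.16]])],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    # "features.pkl" is an illustrative file name.
    path = os.path.join(tmp_dir, "features.pkl")
    df.to_pickle(path)
    restored = pd.read_pickle(path)

# The cell is still an ndarray, so no string parsing is needed.
cell = restored.loc[1, "contrast"]
print(type(cell), cell.shape)
```

Only load pickle files you created yourself or otherwise trust, since unpickling can execute arbitrary code.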