
I have a pandas DataFrame with a column in which I want to store numeric vectors. Storing them is easy, but serializing the DataFrame to a file and then reading it back gets quite messy.

Here is a snippet similar to my code:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame(columns=['vector', 'other_col'])
for _ in range(1,10):
    df.loc[len(df), 'vector'] = np.random.rand(2000)
df.to_csv('example.csv', index=False)

data = pd.read_csv('example.csv')
```

The data will look like this:

```
                                              vector  other_col
0  [ 0.44182594  0.38653563  0.55276495 ...,  0.6...        NaN
1  [ 0.15619965  0.97775275  0.6904491  ...,  0.2...        NaN
2  [ 0.80848747  0.66653121  0.37620277 ...,  0.5...        NaN
3  [ 0.41350165  0.40033263  0.39881338 ...,  0.3...        NaN
4  [ 0.17602205  0.54945447  0.49621991 ...,  0.6...        NaN
5  [ 0.75765499  0.09553434  0.14637461 ...,  0.2...        NaN
```

As you can see, what gets stored in the file is not the vectors themselves but the truncated string you would see on stdout if you printed the contents of the DataFrame.

I have some workarounds in mind; I am just curious whether this particular approach (keeping the vectors in a DataFrame column and round-tripping them through a file) is feasible.
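For illustration, here is a minimal sketch of one possible workaround. It is an assumption, not something proposed in the thread: each vector is encoded as a JSON list before writing, and decoded again through the `converters` argument of `pd.read_csv`. The in-memory `io.StringIO` buffer stands in for `example.csv`, and the frame is built from a list of arrays rather than row by row.

```python
import io
import json

import numpy as np
import pandas as pd

# Build the frame from a list of arrays; the 'vector' column has object dtype.
vectors = [np.random.rand(2000) for _ in range(9)]
df = pd.DataFrame({"vector": vectors, "other_col": np.nan})

# Encode each vector as a JSON list so every element is written in full,
# instead of the truncated string form of the array.
out = df.copy()
out["vector"] = out["vector"].apply(lambda v: json.dumps(v.tolist()))

# Stand-in for a real file such as 'example.csv'.
buffer = io.StringIO()
out.to_csv(buffer, index=False)
buffer.seek(0)

# Decode the JSON text back into NumPy arrays while reading.
data = pd.read_csv(
    buffer,
    converters={"vector": lambda s: np.array(json.loads(s))},
)
```

The file stays plain text and readable without pandas, at the cost of a larger file and slower parsing than a binary format.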

LetsPlayYahtzee
  • Please show code where you believe that you get dots instead of values, this is probably a display thing and will not happen when writing to a real file. For instance if you did `to_string()` you will see all values – EdChum Mar 04 '16 at 15:22
  • I think you need [pickle IO](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_pickle.html) - [see](http://stackoverflow.com/a/35748044/2901002) – jezrael Mar 04 '16 at 15:23
  • Add code and data that reproduces this error that others can reproduce – EdChum Mar 04 '16 at 16:07
  • @EdChum, please let me know if you think that it's still unclear what the problem is – LetsPlayYahtzee Mar 04 '16 at 16:26
  • OK, this may be some kind of limitation or bug; it seems to be converting the array to a string for some reason which I don't understand. I'd expect `np.savetxt` to work, but generally storing non-scalar values as data elements is not a good idea even if this did work – EdChum Mar 04 '16 at 16:33
  • How about using `df.to_pickle(filename)`? -- http://stackoverflow.com/a/17098736/478237 – hruske Mar 04 '16 at 16:33
  • I too think that what I am trying to do is a bad idea. It is just tempting to give it a chance when you have labeled data that you want to use later for feature extraction, and you don't know the features yet. `df.to_pickle` does not look like a good candidate when you want some independence between the data and the tools you use to manipulate them – LetsPlayYahtzee Mar 04 '16 at 16:39
