I have a pandas DataFrame with a column in which I want to store numeric vectors. That part is easy, but if I serialize the DataFrame to a file and then read it back, things get quite messy.
Here is a snippet similar to my code:
import pandas as pd
import numpy as np
df = pd.DataFrame(columns=['vector', 'other_col'])
for _ in range(1, 10):
    df.loc[len(df), 'vector'] = np.random.rand(2000)
df.to_csv('example.csv', index=False)
data = pd.read_csv('example.csv')
The data will look like this:
                                           vector  other_col
0  [ 0.44182594 0.38653563 0.55276495 ..., 0.6...        NaN
1   [ 0.15619965 0.97775275 0.6904491 ..., 0.2...        NaN
2  [ 0.80848747 0.66653121 0.37620277 ..., 0.5...        NaN
3  [ 0.41350165 0.40033263 0.39881338 ..., 0.3...        NaN
4  [ 0.17602205 0.54945447 0.49621991 ..., 0.6...        NaN
5  [ 0.75765499 0.09553434 0.14637461 ..., 0.2...        NaN
As you can see, instead of the vectors, what gets stored in the file is the string you would see on stdout if you printed the contents of the DataFrame.
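To make the problem concrete, here is a minimal check of what comes back from the CSV. It is only a sketch: it builds the DataFrame from a list of arrays and writes to an in-memory `io.StringIO` buffer rather than `example.csv`, both of which are assumptions made so that the example is self-contained.

```python
import io

import numpy as np
import pandas as pd

# Three random vectors of 2000 elements each, stored one per row.
vectors = [np.random.rand(2000) for _ in range(3)]
df = pd.DataFrame({'vector': vectors, 'other_col': np.nan})

# Round trip through CSV, using an in-memory buffer instead of a file.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)
data = pd.read_csv(buffer)

first = data.loc[0, 'vector']
print(type(first))     # a plain Python string, not a NumPy array
print('...' in first)  # the text is NumPy's abbreviated form of the array
```

Because the array has more than 1000 elements, NumPy abbreviates its string form with an ellipsis, so most of the values never reach the file at all.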
I have some workarounds in mind; I am just curious whether it is feasible to do this directly, that is, to write the vector column to a file and get the vectors back intact.
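For context, one possible workaround might look like the sketch below. It is an illustration under stated assumptions, not necessarily the best approach: each vector is encoded as a JSON list before writing, and decoded again through the `converters` argument of `pd.read_csv`. The in-memory buffer and the names `encoded` and `restored` are illustrative choices.

```python
import io
import json

import numpy as np
import pandas as pd

vectors = [np.random.rand(2000) for _ in range(3)]
df = pd.DataFrame({'vector': vectors, 'other_col': np.nan})

# Encode each vector as a JSON list so the CSV cell holds every element.
encoded = df.copy()
encoded['vector'] = encoded['vector'].apply(lambda v: json.dumps(v.tolist()))

# Write to an in-memory buffer (a stand-in for a real file path).
buffer = io.StringIO()
encoded.to_csv(buffer, index=False)
buffer.seek(0)

# Decode the JSON text back into NumPy arrays while reading.
restored = pd.read_csv(
    buffer,
    converters={'vector': lambda s: np.array(json.loads(s))},
)

print(type(restored.loc[0, 'vector']))
print(all(np.array_equal(a, b) for a, b in zip(vectors, restored['vector'])))
```

This keeps the file a valid CSV, at the cost of an explicit encode and decode step for the vector column.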