Save Pandas df containing long list as csv file

Question

I am trying to save a pandas dataframe as .csv file. Currently my code looks like this:

with open('File.csv', 'a') as f:
        df.to_csv(f, header=False)

The saving works but the problem is that the lists in my dataframe are just compressed to [first,second,...,last] and all the entries in the middle are discarded. If I just look at the original dataframe all entries are there. Is there any way how I can convert the list to a string which contains all the elements (str(df) also discards the middle elements) or how I can save a full numpy array in a cell of a csv table?

Thank you for your help, Viviane

I don't see any reason you the code you have doesn't work. I did notice that you are appending to the file rather than creating a new file. Are you sure the file didn't already have the data as you describe and if you look at the end of that file it has the entire dataframe? — Ryan, Dec 17 '17 at 18:12
Yes, for my purpose I need to append and no, it is not already at the end, sorry — thebear, Dec 17 '17 at 19:33
No, `df.to_csv('File.csv', header=False, mode='a')` should write __all__ values into the file. Could you produce a __reproducible__ code? — MaxU - stand with Ukraine, Dec 17 '17 at 19:44
`df = pd.DataFrame(columns=['ID','A']) hot = np.zeros(9999) ID = 1 df=df.append({'ID': int(ID),'A':hot}, ignore_index=True) with open('test.csv', 'a') as f: df.to_csv(f, header=False)` — thebear, Dec 17 '17 at 20:08

score 5 · Answer 1 · answered Sep 17 '18 at 14:01

I had issues while saving dataframes too. I had a dataframe in which some columns consisted of lists as its elements. When I saved the datfarme using df.to_csv and then read it from disk using df.read_csv, the list and arrays were turned into a string of characters. Hence [1,2,3] was transformed to '[1,2,3]'. When I used HDF5 format the problem was solved.

If you dataframe is called df_temp, then you can use:

store = pd.HDFStore('store.h5')
store['df'] = df_temp

to save the dataframe in HDF5 format and you can read it using the following command:

store = pd.HDFStore('store.h5')
df_temp_read = store['df']

You can look at this answer. I should also mention that pickle did not work for me, since I lost the column names when reading from the file. Maybe I did something wrong, but apart from that, pickle can cause compatibility issues if you plan to read the file in different python versions.

This should probably be the selected answer. Works with dataframes containing multiple lists of multiple data types — ClonedOne, Oct 05 '20 at 22:27

MaxU - stand with Ukraine · Answer 2 · 2017-12-17T20:22:28.993

3

Your code should work properly. I couldn't reproduce described behavior.

Here is a bit more "pandaic" version:

df.to_csv('File.csv', header=False, mode='a')

PS pay attention at the mode='a' (append) parameter...

UPDATE:

How to get rid of ellipsis when displaying / printing a DF:

with pd.option_context("display.max_columns", 0):
    print(df)

edited Dec 17 '17 at 20:22

answered Dec 17 '17 at 19:46

MaxU - stand with Ukraine

205,989
36
386
419

1

I agree that this _should_ also work, but I see no reason why the original code didn't work. They should be equivalent. `DataFrame.to_csv` accepts either a filename or a file buffer. – Ryan Dec 17 '17 at 19:56
@RyanSandridge, that's correct - thank you! I've updated ny answer accordingly... – MaxU - stand with Ukraine Dec 17 '17 at 20:00
It still doesn't work for me for some reason, I've posted some sample code as a comment on the top. The join string workaround works however – thebear Dec 17 '17 at 20:13
@Viviane, how do you check the written CSV? – MaxU - stand with Ukraine Dec 17 '17 at 20:15
I checked it by looking at the file in excel and with this code: `test = pd.read_csv('test.csv',names=['ID','A']).set_index('ID') type(test['A'][1])` (output: str , no way to recover the data which was replaced by '...') – thebear Dec 17 '17 at 20:18
@Viviane, check it in a Text or HEX editor - for example in `Notepad++` – MaxU - stand with Ukraine Dec 17 '17 at 20:19
I can't see more in any editor that I tried... And after all, I need to get the information into python somehow – thebear Dec 17 '17 at 20:21
I don't know if I am overlooking something but I use this code: `df = pd.DataFrame(columns=['ID','A']) hot = np.zeros(9999) ID = 1 df=df.append({'ID': int(ID),'A':hot}, ignore_index=True) with open('test.csv', 'a') as f: df.to_csv(f, header=False,mode='a') test = pd.read_csv('test.csv',names=['ID','A']).set_index('ID') with pd.option_context("display.max_columns", 0): print(test['A'])` And the output is: ID 1 [ 0. 0. 0. ..., 0. 0. 0.] – thebear Dec 17 '17 at 20:47
@Viviane, try it this way: `df = pd.DataFrame({'A':hot}).assign(ID=1)` instead of your code... – MaxU - stand with Ukraine Dec 17 '17 at 21:28
It seems clear that the problem was not with the file, but how @Viviane examines the file. – Ryan Dec 18 '17 at 16:21
@RyanSandridge, the [way he is creating DF in comment](https://stackoverflow.com/questions/47857982/save-pandas-df-containing-long-list-as-csv-file/47858808?noredirect=1#comment82683283_47858808) causes problems when saving it to CSV... – MaxU - stand with Ukraine Dec 18 '17 at 16:30
Yes I agree with Max, I haven't found any way in which I can open it and see everything of the list. However when creating the df like Max proposed It saves everything. I am not sure if I can use this way of creating the Df in my case though. – thebear Dec 18 '17 at 17:20

score 1 · Accepted Answer · answered Dec 17 '17 at 18:17

1

You can probably convert elements present in the list using join method.

example:

lst =  ['Hello!','I','am', 'Pandas User','.']
strng = ' '.join(lst)
print (strng)

hope this helps to you.

answered Dec 17 '17 at 18:17

Saurabh

26
4

Thank you, that works :) It would have been nice if there is a way how pandas just saves everything in the csv file without that trick but this is a good way to work around it. – thebear Dec 17 '17 at 19:32
I'm glad this work around works for you. I'd love to understand why your original code wasn't working. Hopefully someone who knows will still post an explanation! – Ryan Dec 17 '17 at 19:53

score 0 · Answer 4 · answered Mar 01 '22 at 02:01

0

I think the pd.to_hdf() and the pd.read_hdf() functions in pandas could simply realize the saving and loading hdf5 files' requirements without installing the tables package.

answered Mar 01 '22 at 02:01

LCheng

335
3
12

1

As it’s currently written, your answer is unclear. Please [edit] to add additional details that will help others understand how this addresses the question asked. You can find more information on how to write good answers [in the help center](/help/how-to-answer). – Community Mar 01 '22 at 04:08

Save Pandas df containing long list as csv file

4 Answers4