8

I am trying to save a pandas dataframe as .csv file. Currently my code looks like this:

with open('File.csv', 'a') as f:
        df.to_csv(f, header=False)

The saving works but the problem is that the lists in my dataframe are just compressed to [first,second,...,last] and all the entries in the middle are discarded. If I just look at the original dataframe all entries are there. Is there any way how I can convert the list to a string which contains all the elements (str(df) also discards the middle elements) or how I can save a full numpy array in a cell of a csv table?

Thank you for your help, Viviane

PyNEwbie
  • 4,882
  • 4
  • 38
  • 86
thebear
  • 133
  • 1
  • 7
  • 2
    I don't see any reason you the code you have doesn't work. I did notice that you are appending to the file rather than creating a new file. Are you sure the file didn't already have the data as you describe and if you look at the end of that file it has the entire dataframe? – Ryan Dec 17 '17 at 18:12
  • Yes, for my purpose I need to append and no, it is not already at the end, sorry – thebear Dec 17 '17 at 19:33
  • No, `df.to_csv('File.csv', header=False, mode='a')` should write __all__ values into the file. Could you produce a __reproducible__ code? – MaxU - stand with Ukraine Dec 17 '17 at 19:44
  • `df = pd.DataFrame(columns=['ID','A']) hot = np.zeros(9999) ID = 1 df=df.append({'ID': int(ID),'A':hot}, ignore_index=True) with open('test.csv', 'a') as f: df.to_csv(f, header=False)` – thebear Dec 17 '17 at 20:08

4 Answers4

5

I had issues while saving dataframes too. I had a dataframe in which some columns consisted of lists as its elements. When I saved the datfarme using df.to_csv and then read it from disk using df.read_csv, the list and arrays were turned into a string of characters. Hence [1,2,3] was transformed to '[1,2,3]'. When I used HDF5 format the problem was solved.

If you dataframe is called df_temp, then you can use:

store = pd.HDFStore('store.h5')
store['df'] = df_temp

to save the dataframe in HDF5 format and you can read it using the following command:

store = pd.HDFStore('store.h5')
df_temp_read = store['df']

You can look at this answer. I should also mention that pickle did not work for me, since I lost the column names when reading from the file. Maybe I did something wrong, but apart from that, pickle can cause compatibility issues if you plan to read the file in different python versions.

Vahid S. Bokharaie
  • 937
  • 1
  • 9
  • 25
  • This should probably be the selected answer. Works with dataframes containing multiple lists of multiple data types – ClonedOne Oct 05 '20 at 22:27
3

Your code should work properly. I couldn't reproduce described behavior.

Here is a bit more "pandaic" version:

df.to_csv('File.csv', header=False, mode='a')

PS pay attention at the mode='a' (append) parameter...

UPDATE:

How to get rid of ellipsis when displaying / printing a DF:

with pd.option_context("display.max_columns", 0):
    print(df)
MaxU - stand with Ukraine
  • 205,989
  • 36
  • 386
  • 419
  • 1
    I agree that this _should_ also work, but I see no reason why the original code didn't work. They should be equivalent. `DataFrame.to_csv` accepts either a filename or a file buffer. – Ryan Dec 17 '17 at 19:56
  • @RyanSandridge, that's correct - thank you! I've updated ny answer accordingly... – MaxU - stand with Ukraine Dec 17 '17 at 20:00
  • It still doesn't work for me for some reason, I've posted some sample code as a comment on the top. The join string workaround works however – thebear Dec 17 '17 at 20:13
  • @Viviane, how do you check the written CSV? – MaxU - stand with Ukraine Dec 17 '17 at 20:15
  • I checked it by looking at the file in excel and with this code: `test = pd.read_csv('test.csv',names=['ID','A']).set_index('ID') type(test['A'][1])` (output: str , no way to recover the data which was replaced by '...') – thebear Dec 17 '17 at 20:18
  • @Viviane, check it in a Text or HEX editor - for example in `Notepad++` – MaxU - stand with Ukraine Dec 17 '17 at 20:19
  • I can't see more in any editor that I tried... And after all, I need to get the information into python somehow – thebear Dec 17 '17 at 20:21
  • I don't know if I am overlooking something but I use this code: `df = pd.DataFrame(columns=['ID','A']) hot = np.zeros(9999) ID = 1 df=df.append({'ID': int(ID),'A':hot}, ignore_index=True) with open('test.csv', 'a') as f: df.to_csv(f, header=False,mode='a') test = pd.read_csv('test.csv',names=['ID','A']).set_index('ID') with pd.option_context("display.max_columns", 0): print(test['A'])` And the output is: ID 1 [ 0. 0. 0. ..., 0. 0. 0.] – thebear Dec 17 '17 at 20:47
  • @Viviane, try it this way: `df = pd.DataFrame({'A':hot}).assign(ID=1)` instead of your code... – MaxU - stand with Ukraine Dec 17 '17 at 21:28
  • It seems clear that the problem was not with the file, but how @Viviane examines the file. – Ryan Dec 18 '17 at 16:21
  • @RyanSandridge, the [way he is creating DF in comment](https://stackoverflow.com/questions/47857982/save-pandas-df-containing-long-list-as-csv-file/47858808?noredirect=1#comment82683283_47858808) causes problems when saving it to CSV... – MaxU - stand with Ukraine Dec 18 '17 at 16:30
  • Yes I agree with Max, I haven't found any way in which I can open it and see everything of the list. However when creating the df like Max proposed It saves everything. I am not sure if I can use this way of creating the Df in my case though. – thebear Dec 18 '17 at 17:20
1

You can probably convert elements present in the list using join method.

example:

lst =  ['Hello!','I','am', 'Pandas User','.']
strng = ' '.join(lst)
print (strng)

hope this helps to you.

Saurabh
  • 26
  • 4
  • Thank you, that works :) It would have been nice if there is a way how pandas just saves everything in the csv file without that trick but this is a good way to work around it. – thebear Dec 17 '17 at 19:32
  • I'm glad this work around works for you. I'd love to understand why your original code wasn't working. Hopefully someone who knows will still post an explanation! – Ryan Dec 17 '17 at 19:53
0

I think the pd.to_hdf() and the pd.read_hdf() functions in pandas could simply realize the saving and loading hdf5 files' requirements without installing the tables package.

LCheng
  • 335
  • 3
  • 12
  • 1
    As it’s currently written, your answer is unclear. Please [edit] to add additional details that will help others understand how this addresses the question asked. You can find more information on how to write good answers [in the help center](/help/how-to-answer). – Community Mar 01 '22 at 04:08