
Hi, I am trying to convert my DataFrame to binary data and store it in a variable.

my_df:

 import pandas as pd
 df = pd.DataFrame({'A':[1,2,3],'B':[4,5,6]})

my code:

 import io
 towrite = io.BytesIO()
 df.to_excel(towrite)  # write to BytesIO buffer
 towrite.seek(0)  # reset pointer
 

I am getting AttributeError: '_io.BytesIO' object has no attribute 'write_cells'

Full Traceback:

AttributeError                            Traceback (most recent call last)
<ipython-input-25-be6ee9d9ede6> in <module>()
      1 towrite = io.BytesIO()
----> 2 df.to_excel(towrite)  # write to BytesIO buffer
      3 towrite.seek(0)  # reset pointer
      4 encoded = base64.b64encode(towrite.read())  #

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\frame.py in to_excel(self, excel_writer, sheet_name, na_rep, float_format, columns, header, index, index_label, startrow, startcol, engine, merge_cells, encoding, inf_rep, verbose, freeze_panes)
   1422         formatter.write(excel_writer, sheet_name=sheet_name, startrow=startrow,
   1423                         startcol=startcol, freeze_panes=freeze_panes,
-> 1424                         engine=engine)
   1425 
   1426     def to_stata(self, fname, convert_dates=None, write_index=True,

C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\formats\excel.py in write(self, writer, sheet_name, startrow, startcol, freeze_panes, engine)
    624 
    625         formatted_cells = self.get_formatted_cells()
--> 626         writer.write_cells(formatted_cells, sheet_name,
    627                            startrow=startrow, startcol=startcol,
    628                            freeze_panes=freeze_panes)

AttributeError: '_io.BytesIO' object has no attribute 'write_cells'
  • I just tested your code and it works fine in pandas 0.23.4 under Windows. – jezrael Aug 30 '18 at 06:28
  • jezrael, when I print(towrite) it just prints `b''` and not the actual bytes. I need to return the binary data, how can I get it? – Pyd Aug 30 '18 at 07:12
  • @Pyd - I tested it with `df = pd.read_excel(towrite)` and it returned the DataFrame. Maybe some Python issue? I tested it under Python 3.6 – jezrael Aug 30 '18 at 07:14
  • I resolved it by `towrite.getvalue()`, thanks for your help @jezrael – Pyd Aug 30 '18 at 07:32

3 Answers


I solved the issue by upgrading pandas to a newer version.

 import io
 towrite = io.BytesIO()
 df.to_excel(towrite)  # write to BytesIO buffer
 towrite.seek(0)
 print(towrite)        # <_io.BytesIO object at 0x...> (printing the buffer does not show the data)
 print(type(towrite))  # <class '_io.BytesIO'>

If you want to see the bytes-like object, use getvalue():

print(towrite.getvalue())
b'PK\x03\x04\x14\x00\x00\x00\x08\x00\x00\x00!\x00<\xb
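As a quick sanity check (the same thing jezrael suggested in the comments), the buffer can be read back into a DataFrame; this sketch assumes an Excel engine such as openpyxl is installed:

towrite.seek(0)                   # rewind before reading
df_back = pd.read_excel(towrite)  # round-trips the original DataFrame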
Pyd
  • Be very careful using the Excel format because it limits the [max number of rows and columns](https://superuser.com/a/366473/). Various other formats don't have such a limit. – Asclepius Dec 09 '20 at 04:38
  • use parquet instead – Kermit Jun 11 '22 at 19:18

Pickle

Pickle is a reproducible format for a Pandas DataFrame, but it is only for internal use among trusted users. It is not for sharing with untrusted users, because unpickling untrusted data can execute arbitrary code.

import pickle

# Export:
my_bytes = pickle.dumps(df, protocol=4)

# Import:
df_restored = pickle.loads(my_bytes)

This was tested with Pandas 1.1.2. Unfortunately it failed for a very large dataframe; what worked instead was pickling and parallel-compressing each column individually and then pickling that list of columns. Alternatively, you can pickle chunks of the large dataframe, as sketched below.
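A minimal sketch of the chunking idea (the helper names and chunk size are illustrative, not from the original answer):

import pickle
import pandas as pd

def pickle_in_chunks(df, chunk_size=1_000_000):
    # Pickle row slices of the frame separately; each element of the
    # returned list is an independent bytes object.
    return [pickle.dumps(df.iloc[start:start + chunk_size], protocol=4)
            for start in range(0, len(df), chunk_size)]

def unpickle_chunks(chunks):
    # Reassemble the original frame from the pickled pieces.
    return pd.concat(pickle.loads(c) for c in chunks)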

CSV

If you must use a CSV representation:

df.to_csv(index=False).encode()

Note that various datatypes are lost when using CSV.
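As a rough illustration of that loss (a made-up example, not from the original answer): a datetime column written to CSV comes back as plain strings unless it is explicitly re-parsed.

import io
import pandas as pd

df = pd.DataFrame({'when': pd.to_datetime(['2020-01-01', '2020-01-02']), 'x': [1, 2]})
csv_bytes = df.to_csv(index=False).encode()

restored = pd.read_csv(io.BytesIO(csv_bytes))
print(restored.dtypes)  # 'when' is now object (strings), not datetime64[ns]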

Parquet

See this answer. Note that various datatypes are converted when using parquet.
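A minimal sketch of capturing Parquet bytes via a buffer (this assumes a reasonably recent pandas with pyarrow or fastparquet installed; it is not taken from the linked answer):

import io
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

buf = io.BytesIO()
df.to_parquet(buf, index=False)  # requires a parquet engine (pyarrow or fastparquet)
parquet_bytes = buf.getvalue()   # the binary representation

df_back = pd.read_parquet(io.BytesIO(parquet_bytes))  # round-trip check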

Excel

Avoid its use for the most part because it limits the max number of rows and columns.

Asclepius
  • Thank you. I've been allowing users to save a dict of dtypes, but after reading this article I will switch to persisting in parquet: https://towardsdatascience.com/stop-persisting-pandas-data-frames-in-csvs-f369a6440af5 – Kermit Dec 09 '20 at 13:15
  • Hi @Asclepius, what data is lost when exporting to csv? I tried googling it but couldn't find much. – piedpiper May 05 '22 at 22:55
  • @piedpiper Examples could be: floats vs ints, datetime formats. – Asclepius May 05 '22 at 23:03

I needed to upload the file object to S3 via boto3, which didn't accept the pandas bytes object. So, building on the answer from Asclepius, I wrapped the bytes in a BytesIO, e.g.:

from io import BytesIO  
data = BytesIO(df.to_csv(index=False).encode('utf-8'))
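For completeness, a sketch of the actual upload (the bucket and key names are made up, and credentials are assumed to come from the usual boto3 configuration):

import boto3
from io import BytesIO
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
data = BytesIO(df.to_csv(index=False).encode('utf-8'))

s3 = boto3.client('s3')
s3.upload_fileobj(data, 'my-bucket', 'exports/my_df.csv')  # hypothetical bucket and key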
Iain Hunter