4

My Java programmer converted an Excel file to binary and is sending the binary content to me.

He used sun.misc.BASE64Encoder for encoding (and sun.misc.BASE64Decoder for decoding).

I need to convert that binary data to a data frame using Python.

The data looks like this:

UEsDBBQABgAIAAAAIQBi7p1oXgEAAJAEAAATAAgCW0NvbnRlbnRfVHl........

I tried a Base64 decoder, but it did not help.

My code:

import base64
with open('encoded_data.txt','rb') as d:
    data=d.read()
print(data)
# UEsDBBQABgAIAAAAIQBi7p1oXgEAAJAEAAATAAgCW0NvbnRlbnRfVHl........
decrypted=base64.b64decode(data)
print(decrypted)
# 'PK\x03\x04\x14\x00\x06\x00\x08\x00\x00\x00!\x00b\xee\x9dh^\x01\x00\x00\x90\x04\x00\x00\x13\x00\x08\x02[Content_Types].xml \xa2\x04\x02(\xa0\x00\x02\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00

Please help me to convert this binary data to a pandas dataframe.

Pyd

2 Answers

13

You're almost there. Since the decrypted object is a bytes string, why not use BytesIO?

import io
import pandas as pd

toread = io.BytesIO()
toread.write(decrypted)  # pass your `decrypted` bytes as the argument here
toread.seek(0)  # reset the pointer

df = pd.read_excel(toread)  # now read to dataframe
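
If `pd.read_excel` complains, it may be worth checking that the decoded bytes really are an .xlsx container. The `PK\x03\x04` prefix and the `[Content_Types].xml` entry in the question's output suggest a ZIP-based workbook. The sketch below uses only the standard library; `looks_like_xlsx` and `make_demo_payload` are hypothetical helper names, and the demo payload is a stand-in archive rather than a real workbook. It uses `base64.encodebytes` to mimic an encoder that inserts line breaks, which `base64.b64decode` ignores by default.

```python
import base64
import io
import zipfile


def looks_like_xlsx(raw: bytes) -> bool:
    """Return True if raw is a ZIP archive containing [Content_Types].xml."""
    buffer = io.BytesIO(raw)
    if not zipfile.is_zipfile(buffer):
        return False
    buffer.seek(0)
    with zipfile.ZipFile(buffer) as archive:
        return "[Content_Types].xml" in archive.namelist()


def make_demo_payload() -> bytes:
    """Build a tiny stand-in archive and Base64-encode it with line breaks."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("[Content_Types].xml", "<Types/>")
    return base64.encodebytes(buffer.getvalue())


encoded = make_demo_payload()
# Characters outside the Base64 alphabet (such as newlines) are discarded by default.
decoded = base64.b64decode(encoded)
print(decoded[:4])
print(looks_like_xlsx(decoded))
```

If the check passes, the same `decoded` bytes can be wrapped in `io.BytesIO` and passed to `pd.read_excel` as shown above.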

Answering your question from your comment: How to convert a df to a binary encoded object?

Well, if you want pandas to write the DataFrame as an Excel file and then turn that into a Base64-encoded object, then:

towrite = io.BytesIO()
df.to_excel(towrite)  # write to BytesIO buffer
towrite.seek(0)  # reset pointer
encoded = base64.b64encode(towrite.read())  # encoded object

To write the encoded object to a file (just to close the loop :P):

with open("file.txt", "wb") as f:
    f.write(encoded)
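
As a quick sanity check on that last step, here is a minimal sketch of the file round trip using only the standard library. The `original` bytes are a placeholder rather than a real workbook, and the temporary directory is only there so the example cleans up after itself.

```python
import base64
import tempfile
from pathlib import Path

# Placeholder payload; in practice this would be the bytes read from the BytesIO buffer.
original = b"PK\x03\x04 stand-in for real workbook bytes"

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "file.txt"
    # Write the Base64 text in binary mode, as in the snippet above.
    path.write_bytes(base64.b64encode(original))
    # The receiving side reads the file and decodes it back to raw bytes.
    restored = base64.b64decode(path.read_bytes())

print(restored == original)
```

The decoded bytes match the input exactly, so whatever is written this way can be fed straight back into `io.BytesIO` on the reading side.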
Scratch'N'Purr
2

You can also do this with the openpyxl module. Here is the modified code:

import base64
import io
import openpyxl

with open('encoded_data.txt','rb') as d:
    data=d.read()
print(data)
decrypted=base64.b64decode(data)
print(decrypted)

xls_filelike = io.BytesIO(decrypted)
workbook = openpyxl.load_workbook(xls_filelike)
sheet_obj = workbook.active
max_col = sheet_obj.max_column 
max_row = sheet_obj.max_row

# Print every row, separating the cell values with commas for readability
for i in range(1, max_row + 1):
    for j in range(1, max_col + 1):
        cell_obj = sheet_obj.cell(row=i, column=j)
        print(cell_obj.value, end=",")
    print()  # end the current row