1

I have a JSON object consumed from a RabbitMQ queue which contains a base64 encoded xls file. The structure is the following:

{
    "error": false,
    "excel_file": "0M8R4KGxGuEAAAAAAAAAAAAAAAAAAAAAPgADAP7/CQ...........",
    "other_field": "more data here",
    ...
    ...
}

Currently i do something like:

data = json.loads(msg)  # msg is the body of the message consumed from rabbitmq queue
file_bytes = bytes(data['excel_file'], 'utf8')  # returns: b'0M8R4KGxGuEAAAAAAAAAAAAAAAAAA...
decoded_bytes = base64.b64decode(file_bytes)  # returns: b'\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1\x00...

# save data to file
with open('file.xls', 'wb') as f:
    f.write(decoded_bytes)

# open the saved file as xls for manipulation
with xlrd.open_workbook('file.xls') as f:
    sh0 = f.sheet_by_index(0)
    print(sh0.name)

"""Manipulate the xls file data with ease.""""

I don't want to create an xls file in my filesystem and delete it after manipulation.

So ideally, i would like to use something like xlrd.loads() (doesn't exists) to load directly the decoded data as xlrd object.

Just like json.loads() can do.

Anyone knows how to load xls data directly from bytearray for manipulation?

UPDATE

As @Raphael and @Masklinn suggested i utilized xlrd's file_contents to load xls data directly from bytes, without any file.

So the code now is the following:

data = json.loads(msg)  # msg is the body of the message consumed from rabbitmq queue
decoded_bytes = base64.b64decode(data['excel_file'])  # returns: b'\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1\x00...

with xlrd.open_workbook(file_contents=decoded_bytes) as f:
    sh0 = f.sheet_by_index(0)
    # manipulate the data as xlrd Book object
    # ....
Gr3at
  • 330
  • 6
  • 12

1 Answers1

3

xlrd has also a file_contents parameter. See the docs or this other question

In general you can also use BytesIO when you have bytes and a file handle is required. io.BytesIO(bytes) behaves exactly as open('file.bytes', 'rb)

Raphael
  • 1,731
  • 2
  • 7
  • 23
  • 1
    Also FWIW `b64decode` can take a string input, no need to encode it explicitly. You may want to call it with `validate=True` though, otherwise it just strips anything which isn't a b64 character then tries to decode the rest, which can lead to odd results. – Masklinn Dec 18 '20 at 11:40
  • Great. `file_contents` did the trick. Found also another [answer](https://stackoverflow.com/a/52077212/10269515) utilizing pandas but i prefer the xlrd solution in my case. – Gr3at Dec 18 '20 at 12:33