Read Excel file with pandas from url reponse

Question

I am looking how to read an xlsx file with pandas, with the file hosted on SharePoint. These contents, when shown through reponse.text, are in string but are a binary representation of the file.

PK╚╝ ! #h�╔�╔ �═ �╔[Content_Types].xml ��╔(� ╗ ��[O�@��M�;��1��G% {�΀��.Z�E��Ҧݝ�I{��5�╗"j� ��"╚W�J�I!^_Z�"CR�R�;(� P ��g��U ̸�a!�D�FJ,�`�>�㕱�V?Ɖ �� n�}%K��Pv��k'#�Dv��W� �B0�T�F��U? -?�*_�-K�"� � dM�fb|═"�BndF0x�3UΕ�Nu� ��P�lO�Y�ğ#� ��,g�K#�}��E=�tD�U�}��O�Q�[��F�|Ix��╚��[H2{�H+╚x�k�]dn�a�╔yZ"N�jͺ�"ih�s�Gn�<j�╚

I would like to know how to read this format into memory so that I can call pd.read_excel with it.

I've tried to use urllib and openpyxl, in this manner:

    import openpyxl as excel
    import pandas as pd
    from io import BytesIO
    import urllib
    req = urllib.request.Request(url=url, data=payload, headers=headers)
    with urllib.request.urlopen(url=req) as reponse:
        rsp = reponse.read()
    excel.load_workbook(filename=rsp)

But I am getting error 400 Bad Request, from the urllib request module.

The url looks something like this:

https://company.sharepoint.com/sites/test-department/_api/Web/GetFileByServerRelativeUrl('/sites/test-department/DepartmentDocuments/test/Book1.xlsx')/$value?binaryStringResponseBody=true

The `filename` parameter would imply an actual file on disk, not the content of the file. Have you tried writing the response to a file? — OneCricketeer, Feb 02 '21 at 21:27
I could write it to memory; the question is how do I get the file in the first place from the url — dirtyw0lf, Feb 03 '21 at 00:17
Does this help? https://stackoverflow.com/questions/44699682/how-to-save-a-file-downloaded-from-requests-to-another-directory/44699728 — OneCricketeer, Feb 03 '21 at 13:33

score 1 · Answer 1 · answered Feb 03 '21 at 16:22

I found a way to do it. the key was to seek back before passing the file to pandas.

file_ext = self.file_name.split('.')[-1]
if file_ext == 'xlsx':
    import pandas as pd
    from io import BytesIO
    xl = bytes(memoryview(response.content))
    memfile =BytesIO()
    memfile.write(xl)
    memfile.seek(0)
    df = pd.io.excel.read_excel(memfile, engine='openpyxl')
    print(df.head(10))

Read Excel file with pandas from url reponse

1 Answers1