I'm using the Microsoft Graph API to retrieve SharePoint document content from within a Python script. I search for documents with the https://graph.microsoft.com/v1.0/search/query endpoint, and then attempt to retrieve the document content via https://graph.microsoft.com/v1.0/sites/{site_id}/drives/{drive_id}/items/{item_id}/content. I want to write the content as a .pdf to blob storage for further processing.
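For reference, the call to the content endpoint looks roughly like the sketch below. The helper names (build_content_url, fetch_item_content) and the access_token parameter are placeholders for illustration, and session is assumed to be a requests.Session or anything else with a compatible get method.

```python
GRAPH_BASE = "https://graph.microsoft.com/v1.0"


def build_content_url(site_id: str, drive_id: str, item_id: str) -> str:
    """Build the content endpoint URL for a single drive item."""
    return f"{GRAPH_BASE}/sites/{site_id}/drives/{drive_id}/items/{item_id}/content"


def fetch_item_content(session, site_id, drive_id, item_id, access_token):
    """Request the item's content and return the response object.

    `session` is assumed to behave like requests.Session; `access_token`
    is a hypothetical bearer token obtained elsewhere.
    """
    response = session.get(
        build_content_url(site_id, drive_id, item_id),
        headers={"Authorization": f"Bearer {access_token}"},
        timeout=30,
    )
    # Fail early on 4xx/5xx instead of writing an error body to disk.
    response.raise_for_status()
    return response
```

With the real library this would be called as fetch_item_content(requests.Session(), site_id, drive_id, item_id, token).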
Now, when I call the content endpoint with the Python requests library, I get the .pdf back as a string, which I can retrieve with response.text. This text looks as you would expect for .pdf content (snippet):
%PDF-1.7
%����
1 0 obj
<</Type/Catalog/Pages 2 0 R/Lang(nl-NL) /StructTreeRoot 29 0 R/MarkInfo<</Marked true>>/Metadata 117 0 R/ViewerPreferences 118 0 R>>
endobj
2 0 obj
<</Type/Pages/Count 2/Kids[ 3 0 R 24 0 R] >>
endobj
3 0 obj
<</Type/Page/Parent 2 0 R/Resources<</Font<</F1 5 0 R/F2 10 0 R/F3 12 0 R/F4 17 0 R/F5 19 0 R>>/ExtGState<</GS7 7 0 R/GS8 8 0 R>>/XObject<</Image9 9 0 R>>/ProcSet[/PDF/Text/ImageB/ImageC/ImageI] >>/MediaBox[ 0 0 594.96 842.04] /Contents 4 0 R/Group<</Type/Group/S/Transparency/CS/DeviceRGB>>/Tabs/S/StructParents 0>>
endobj
4 0 obj
<</Filter/FlateDecode/Length 3438>>
stream
x��\mS�8�N�A��EX�$�s[T
So what I try to do is write this content to a file like this:
with open('pdffilefromsharepoint.pdf', 'w') as f:
    f.write(response.text)
This writes the file without error. However, when I open the document in a .pdf reader, I get just two empty pages with no content at all. Moreover, when I compare the raw contents of my original SharePoint file with the .pdf file written from the Graph API response, they appear to be identical: the same number of lines, and seemingly the same content line by line.
One notable difference is that the original document is just 68 KB, while the one written from the API content is 113 KB.
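To narrow down where the extra bytes might come from, the two ways of writing a response body can be compared on sample data. This is only a minimal sketch, assuming the growth comes from binary data being decoded as text (with undecodable bytes replaced) and re-encoded on write; the sample bytes and the helper names write_as_text and write_as_bytes are made up for illustration, and the real encoding picked for response.text may differ from UTF-8.

```python
import os
import tempfile


def write_as_text(raw: bytes, path: str) -> int:
    """Mimic response.text + open(path, 'w'): decode, then re-encode on write."""
    # Assumption: undecodable bytes are replaced with U+FFFD during decoding.
    text = raw.decode("utf-8", errors="replace")
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write(text)
    return os.path.getsize(path)


def write_as_bytes(raw: bytes, path: str) -> int:
    """Mimic response.content + open(path, 'wb'): bytes are written unchanged."""
    with open(path, "wb") as f:
        f.write(raw)
    return os.path.getsize(path)


# Made-up bytes resembling a PDF header plus the start of a compressed stream.
sample = b"%PDF-1.7\n%\xb5\xb5\xb5\xb5\nstream\nx\x9c\xed\\mS\xdb8\n"

with tempfile.TemporaryDirectory() as tmp:
    text_size = write_as_text(sample, os.path.join(tmp, "text.pdf"))
    byte_size = write_as_bytes(sample, os.path.join(tmp, "bytes.pdf"))

print("original bytes:", len(sample))
print("written in text mode:", text_size)
print("written in binary mode:", byte_size)
```

If the file written in text mode comes out larger while the binary one matches the original length, that would point at the text round trip rather than at the API. The binary equivalent with requests would be response.content written to a file opened with 'wb'.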
Has anyone tried to achieve something similar? Do I need a special package to write this content back to a .pdf from Python?