Why is my Python script to download .xlsx files from Sharepoint failing only for some URLs?

Question

Using the Python library Office365-REST-Python-Client, I have written the following Python function to download Excel spreadsheets from Sharepoint (based on the answer at How to read SharePoint Online (Office365) Excel files in Python with Work or School Account?):

import sys
from urlparse import urlparse  # Python 2; in Python 3 use: from urllib.parse import urlparse
from office365.runtime.auth.authentication_context import AuthenticationContext
from office365.sharepoint.client_context import ClientContext
from office365.sharepoint.file import File

# Sharepoint returns an XML error document (instead of the file bytes) when a request fails
xmlErrText = "<?xml version=\"1.0\" encoding=\"utf-8\"?><m:error"

def download(sourceURL, destPath, username, password):
    print "Download URL:  {}".format(sourceURL)

    # Split the full URL into the site root (scheme + host) and the server-relative path
    urlParts = urlparse(sourceURL)
    baseURL = urlParts.scheme + "://" + urlParts.netloc
    relativeURL = urlParts.path
    if urlParts.query:
        relativeURL = relativeURL + "?" + urlParts.query

    ctx_auth = AuthenticationContext(baseURL)
    if ctx_auth.acquire_token_for_user(username, password):
        try:
            # Load the root web as a cheap check that authentication actually worked
            ctx = ClientContext(baseURL, ctx_auth)
            web = ctx.web
            ctx.load(web)
            ctx.execute_query()
        except Exception as e:
            print "Failed to execute Sharepoint query (possibly bad username/password?): {}".format(e)
            return False
        print "Logged into Sharepoint: {0}".format(web.properties['Title'])

        # Fetch the raw file content by its server-relative URL
        response = File.open_binary(ctx, relativeURL)
        if response.content.startswith(xmlErrText):
            print "ERROR response document received.  Possibly permissions or wrong URL?  Document content follows:\n\n{}\n".format(response.content)
            return False
        with open(destPath, 'wb') as f:
            f.write(response.content)
        print "Downloaded to:  {}".format(destPath)
    else:
        print ctx_auth.get_last_error()
        return False
    return True
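For anyone checking this on Python 3 (where `urlparse` lives in `urllib.parse`), the URL-splitting step at the top of `download` can be exercised on its own. The sketch below uses a hypothetical helper, `split_sharepoint_url` (not part of Office365-REST-Python-Client), that mirrors the function's logic, and a made-up example URL:

```python
from urllib.parse import urlparse


def split_sharepoint_url(source_url):
    """Hypothetical helper mirroring download(): return (base_url, relative_url)."""
    parts = urlparse(source_url)
    # Site root: scheme plus host, e.g. https://example.sharepoint.com
    base_url = parts.scheme + "://" + parts.netloc
    # Server-relative path, with any query string re-attached as download() does
    relative_url = parts.path
    if parts.query:
        relative_url += "?" + parts.query
    return base_url, relative_url


if __name__ == "__main__":
    base, rel = split_sharepoint_url(
        "https://example.sharepoint.com/sites/path/to/My%20Doc.xlsx?web=1"
    )
    print(base)  # https://example.sharepoint.com
    print(rel)   # /sites/path/to/My%20Doc.xlsx?web=1
```

Note that the path keeps its percent-encoding (`%20`), and any query string (such as `?web=1` in this made-up URL) is handed to `File.open_binary` as part of the file URL.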

This function works fine for some URLs but fails for others, printing the following "file does not exist" document content on failure (newlines and whitespace added for readability):

<?xml version="1.0" encoding="utf-8"?>
<m:error xmlns:m="http://schemas.microsoft.com/ado/2007/08/dataservices/metadata">
    <m:code>
        -2130575338, Microsoft.SharePoint.SPException
    </m:code>
    <m:message xml:lang="en-US">
        The file /sites/path/to/document.xlsx does not exist.
    </m:message>
</m:error>
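Rather than comparing the first bytes of the response against `xmlErrText`, the error document can be parsed to pull out the code and message. A minimal sketch using only the standard library's `xml.etree.ElementTree`; the function name `parse_sharepoint_error` is hypothetical, and it assumes the error body has the `m:error` shape shown above:

```python
import xml.etree.ElementTree as ET

# Namespace of the <m:error> document shown above
METADATA_NS = "http://schemas.microsoft.com/ado/2007/08/dataservices/metadata"


def parse_sharepoint_error(content):
    """Return (code, message) if content is an m:error document, else None."""
    try:
        root = ET.fromstring(content)
    except ET.ParseError:
        # Not XML at all, e.g. the binary .xlsx (zip) payload of a successful download
        return None
    if root.tag != "{%s}error" % METADATA_NS:
        return None
    # findtext returns the element's text; strip() removes surrounding whitespace/newlines
    code = root.findtext("{%s}code" % METADATA_NS, default="").strip()
    message = root.findtext("{%s}message" % METADATA_NS, default="").strip()
    return code, message
```

A `None` result means the response did not look like a Sharepoint error document, so it can be written to disk as before.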

I know that the username and password are correct. Indeed, changing the password results in a completely different error.

I have found that this error can occur either when the document doesn't exist or when there are insufficient permissions to access it.

However, using the same username/password, I can download the document with the same URL in a web browser.

Note that this same function consistently works for some .xlsx URLs in the same Sharepoint repository, but consistently fails for other .xlsx URLs in that same repository.

My only guess is that there are some more fine-grained permissions that need to be managed, but I'm completely ignorant of these, if they exist.

Can anybody help me understand why the failure is occurring and get the script working for all the required files, which I can already download in a web browser?

Additional Notes From Comments Below

The failures are consistent for some URLs and the successes are consistent for other URLs, i.e., for a given URL the result is always the same; it does not come and go.
The files have not moved or been deleted. I can download them using browsers/PCs which have never accessed those files previously.
The source of the URLs is Sharepoint itself: a search in Sharepoint includes those files in the results list, with a URL below each file, and that is the URL I'm using for each file. (For some files the script works and for others it does not; for all files the browser works with the same URL.)
The URLs are all correctly encoded. In particular, spaces are encoded with %20.
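Since the URLs keep `%20` for spaces, it may help to compare the encoded path the script currently sends with its decoded form. Which form the client library expects is not established here; the sketch below, with a hypothetical helper `decoded_server_relative_path` and a made-up URL, only shows how to obtain the decoded path with the standard library's `urllib.parse.unquote` so the two can be tried side by side:

```python
from urllib.parse import unquote, urlparse


def decoded_server_relative_path(source_url):
    """Hypothetical helper: the URL's path with percent-escapes decoded, query dropped."""
    # urlparse leaves the path percent-encoded, e.g. "My%20Doc.xlsx"
    encoded_path = urlparse(source_url).path
    # unquote turns %20 back into a space (and, e.g., %27 into an apostrophe)
    return unquote(encoded_path)


if __name__ == "__main__":
    url = "https://example.sharepoint.com/sites/path/to/My%20Doc.xlsx?web=1"
    print(urlparse(url).path)                 # /sites/path/to/My%20Doc.xlsx
    print(decoded_server_relative_path(url))  # /sites/path/to/My Doc.xlsx
```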

I would think that the difference is in the URLs themselves. Do the URLs that are failing have spaces in them by any chance? — dgoverde, Apr 25 '19 at 23:40
Do the same files consistently fail or does it change which files succeed and which ones fail? — MyNameIsCaleb, Apr 26 '19 at 19:57
Did you Google the error code included in the error message (-2130575338, or 0x81020016 in hexadecimal)? It seems to correspond with a `ClientErrorCodes.ListItemDeleted` error (https://learn.microsoft.com/en-us/previous-versions/office/sharepoint-server/ee545265%28v%3doffice.15%29). Maybe the item was deleted and is still cached in your browser? There seem to be other causes for this error too, but I'm not so familiar with SharePoint, so I cannot filter the relevant web pages for this case. I just hope this puts you in the right direction :-) — wovano, Apr 26 '19 at 21:02
I suspect @wovano is on the right track. Perhaps try retrieving the file's properties or versioning info and see if that works? Perhaps the files were deleted and re-added or restored from the recycle bin, but the deletion flag hasn't been reset or is still in a cache? — Simon Hibbs, Apr 29 '19 at 11:06
Looking at the file's history, it appears that it has never been deleted, but it has been moved. I can't download it from its old location, even in the browser. I can download it from the new location in the browser, but not with the script. I'm no Sharepoint expert, and I'm unsure what else to look for. — Son of a Beach, Apr 30 '19 at 05:02
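The hexadecimal form quoted in wovano's comment comes from reading the signed 32-bit error code as unsigned. A small sketch (the helper name `to_unsigned_hex` is hypothetical) that reproduces the conversion, which is handy when looking up other codes returned in the `<m:code>` element:

```python
def to_unsigned_hex(code):
    """Render a signed 32-bit error code as the unsigned hex value used in docs."""
    # Masking with 0xFFFFFFFF maps a negative value onto its two's-complement bit pattern
    return "0x{:08X}".format(code & 0xFFFFFFFF)


if __name__ == "__main__":
    print(to_unsigned_hex(-2130575338))  # 0x81020016
```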
