11

quick one.

I have XLSX file located on sharepoint drive and cannot open it using openpyxl in python, it works well if it is stored on my local drive.

I tried this.

from openpyxl import load_workbook
wb = load_workbook('https://content.potatocompany.com/workspaces/PotatoTeam/Shared Documents/XYZ errors/XYZ Errors_Confirm.xlsx')

Throws up this exception:

C:\Anaconda\lib\site-packages\openpyxl\reader\excel.py in load_workbook(filename, use_iterators, keep_vba, guess_types, data_only)
    123     except (BadZipfile, RuntimeError, IOError, ValueError):
    124         e = exc_info()[1]
--> 125         raise InvalidFileException(unicode(e))
    126     wb = Workbook(guess_types=guess_types, data_only=data_only)
    127 

InvalidFileException: [Errno 22] invalid mode ('rb') or filename: 'https://...

Am I missing something? I need to read the content of one of the sheets in python.


EDIT:

Using crussell's advice, I receive 401 UNAUTHORIZED:

import requests
import urllib
from openpyxl import load_workbook
from requests.auth import HTTPBasicAuth

file = "https://content.potatocompany.com/workspaces/PotatoTeam/Shared Documents/XYZ errors/XYZ Errors_Confirm.xlsx"

username = 'PotatoUser'
password = 'PotatoPassword'

resp=requests.get(file, auth=HTTPBasicAuth(username, password))
print(resp.content)

Seems like sharepoint and requests are not compatible, with both Digest Authentication and Basic Authentication http://docs.python-requests.org/en/latest/user/authentication/

Gunay Anach
  • 1,193
  • 1
  • 13
  • 19

3 Answers3

4

Instead of trying to load directly from a web-address, try using urllib.

import urllib
file = "https://content.potatocompany.com/workspaces/PotatoTeam/Shared Documents/XYZ errors/XYZ Errors_Confirm.xlsx"
urllib.urlretrieve(file,"test.xlsx")

From further research, requests may be preferred over urllib. Try this:

import requests
from requests.auth import HTTPBasicAuth
file = "https://content.potatocompany.com/workspaces/PotatoTeam/Shared Documents/XYZ errors/XYZ Errors_Confirm.xlsx"
        
username = 'myUsername'
password = 'myPassword'
        
resp=requests.get(file, auth=HTTPBasicAuth(username, password))
output = open('test.xlsx', 'wb')
output.write(resp.content)
output.close()

To get requests installed:

pip install requests
crussell
  • 179
  • 4
  • One forward slash throws this exception: URLError: – Gunay Anach Dec 10 '15 at 14:20
  • Thanks crussell, this seems to be right direction for me, now need to fight with authentication request while accessing the file. IOError: ('http error', 401, 'Unauthorized' ... – Gunay Anach Dec 10 '15 at 15:29
  • Is the file encrpyted? – crussell Dec 10 '15 at 15:40
  • The web server seems to be expecting some authentication credentials, do you have a username and password for the website? – crussell Dec 10 '15 at 15:46
  • Yes i do have credentials and the file is not encrypted – Gunay Anach Dec 10 '15 at 15:58
  • 1
    Thank you for your suggestions crussell, It seems like there is authentication issues which cannot be handled by **requests**. Will keep looking for another method. – Gunay Anach Dec 11 '15 at 12:11
  • 1
    Instead of basic auth since this is internal sharepoint, I used this instead - from requests_negotiate_sspi import HttpNegotiateAuth resp=requests.get(file, auth=HttpNegotiateAuth()) – ThinkCode Jun 28 '20 at 23:01
0

You probably first need to download it first rather than opening it directly. The following approach should work:

import urllib2
from openpyxl import load_workbook
import StringIO

data = urllib2.urlopen("https://content.potatocompany.com/workspaces/PotatoTeam/Shared Documents/XYZ errors/XYZ Errors_Confirm.xlsx")
xlsx = data.read()
wb = load_workbook(StringIO.StringIO(xlsx))

Python's StringIO could be used to make the downloaded data appear as a file interface.

Martin Evans
  • 45,791
  • 17
  • 81
  • 97
0

If the SP is internal, it might work by removing "https:" in the name you put in the load_workbook().

from openpyxl import load_workbook
file = '//content.potatocompany.com/workspaces/PotatoTeam/Shared Documents/XYZ errors/XYZ Errors_Confirm.xlsx'
wb = load_workbook(file)

without authentication if your account at work is directly connected to SP. Otherwise, at my work, we use NTML authentification, that you can do by using HttpNtlmAuth from the library request_ntml.

Let me know if it works or if you are still interested by this issue, I could give you example with request_ntml

Ben.T
  • 29,160
  • 6
  • 32
  • 54
  • I would be interested in seeing an example with request_ntml, please. – Calab Nov 15 '19 at 16:22
  • 1
    I have no issues opening the spreadsheet from SharePoint in a web browser, but trying to do so from a python script is giving me a 403 forbidden error. What am I missing? `import requests_ntlm import requests file = "https://ourcompany.sharepoint.com/abcd/5UJJPA3D/FILE" u=r"domain\username" p="password" resp = requests.get(file, auth=requests_ntlm.HttpNtlmAuth(u,p)) print(resp.content) b'403 FORBIDDEN'` – Calab Nov 15 '19 at 19:43
  • @Calab Hi, unfortunately, I'm not at the same work anymore and I had to leave my scripts so I can't answer your question. sorry – Ben.T Nov 18 '19 at 20:29