
I am using Microsoft SharePoint. I have a URL, and from that URL I need to retrieve all of the data behind it (photos, videos, folders, subfolders, files, posts, etc.) and store it in a database (SQL Server). I am using Python.

Could anyone suggest how to do this? I am a beginner at accessing SharePoint and working with this sort of thing.

sai
  • Welcome to Stack Overflow! Can you please explain what you have tried and which methods you started with? For a question to attract a proper answer, you need to describe your own efforts as well. – Karthick Mohanraj Jan 30 '20 at 05:27
  • I took the URL and, using the Microsoft Graph API, tried to get the data behind it, but I am not able to get all of it. When I open the URL in a browser I can see the information I need, but I have no idea how to fetch that data and store it in my database. – sai Jan 30 '20 at 06:14

4 Answers


Here is starter code for connecting to SharePoint from Python (it uses the Office365-REST-Python-Client package) and accessing the list of files and folders, as well as the contents of individual files. You can build on it to suit your needs.

Please note that this method works for public SharePoint sites that are reachable over the internet. I haven't tested this code against organisation-restricted SharePoint sites hosted on a company intranet.

You will have to modify the link to the SharePoint file a bit, because the URL copied from the web browser's address bar cannot be used directly to access the file from Python.
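
As a rough illustration of that URL adjustment, the sketch below splits a browser URL into the site URL and the server-relative path that the starter script further down expects. The helper name `split_sharepoint_url` and the `contoso` address are made up for illustration. It assumes the plain `/sites/<name>/...` or `/teams/<name>/...` layout; sharing links (for example ones containing `/:x:/r/`) are not handled.

```python
from urllib.parse import urlsplit


def split_sharepoint_url(browser_url):
    """Return (site_url, server_relative_path) for a plain SharePoint URL.

    Hypothetical helper: assumes the path starts with /sites/<name> or
    /teams/<name>; anything else raises ValueError.
    """
    parts = urlsplit(browser_url)
    segments = [s for s in parts.path.split("/") if s]
    if len(segments) < 2 or segments[0] not in ("sites", "teams"):
        raise ValueError("Unrecognised SharePoint URL layout: " + browser_url)
    site_url = f"{parts.scheme}://{parts.netloc}/{segments[0]}/{segments[1]}"
    # The query string (e.g. ?web=1) is dropped; percent-encoding is kept as-is.
    return site_url, parts.path


site_url, relative_path = split_sharepoint_url(
    "https://contoso.sharepoint.com/sites/Team/Shared%20Documents/Reports/q1.xlsx?web=1"
)
print(site_url)       # https://contoso.sharepoint.com/sites/Team
print(relative_path)  # /sites/Team/Shared%20Documents/Reports/q1.xlsx
```

The first value corresponds to `url_shrpt` and the second to `file_url_shrpt` in the script that follows.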


from office365.runtime.auth.authentication_context import AuthenticationContext
from office365.sharepoint.client_context import ClientContext
from office365.sharepoint.files.file import File 

####inputs########
# This will be the URL that points to your sharepoint site. 
# Make sure you change only the parts of the link that start with "Your"
url_shrpt = 'https://YourOrganisation.sharepoint.com/sites/YourSharepointSiteName'
username_shrpt = 'YourUsername'
password_shrpt = 'YourPassword'
folder_url_shrpt = '/sites/YourSharepointSiteName/Shared%20Documents/YourSharepointFolderName/'

#######################



###Authentication###For authenticating into your sharepoint site###
ctx_auth = AuthenticationContext(url_shrpt)
if not ctx_auth.acquire_token_for_user(username_shrpt, password_shrpt):
    # Stop here: without a token, ctx is never created and every call below would fail
    raise SystemExit(ctx_auth.get_last_error())

ctx = ClientContext(url_shrpt, ctx_auth)
web = ctx.web
ctx.load(web)
ctx.execute_query()
print('Authenticated into sharepoint as: ', web.properties['Title'])
############################
  
  
  
  
####Function for listing the file names of a folder in sharepoint###
###To list the sub-folder names instead, change "items = folder.files" to "items = folder.folders" in the function below
def print_folder_contents(ctx, folder_url):
    try:
        folder = ctx.web.get_folder_by_server_relative_url(folder_url)
        items = folder.files  # Replace files with folders to list sub-folders
        ctx.load(items)
        ctx.execute_query()

        # Collect the "Name" property of every returned item
        return [item.properties["Name"] for item in items]

    except Exception as e:
        print('Problem printing out library contents: ', e)
        return []
######################################################
  
  
# Call the function by giving your folder URL as input  
filelist_shrpt=print_folder_contents(ctx,folder_url_shrpt) 

#Print the list of files present in the folder
print(filelist_shrpt)

Now that we can retrieve and print the list of files in a particular SharePoint folder, the code below fetches the contents of one file, given its name and path in SharePoint, and saves it to local disk.

#Specify the URL of the sharepoint file. Remember to change only the parts of the link that start with "Your"
file_url_shrpt = '/sites/YourSharepointSiteName/Shared%20Documents/YourSharepointFolderName/YourSharepointFileName'

#Load the sharepoint file content to "response" variable
response = File.open_binary(ctx, file_url_shrpt)

#Save the file to your offline path
with open("Your_Offline_File_Path", 'wb') as output_file:  
    output_file.write(response.content)

You can refer to the following links for connecting to SQL Server and storing the contents in tables: "Connecting to Microsoft SQL server using Python" and

https://datatofish.com/how-to-connect-python-to-sql-server-using-pyodbc/
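
To make the storage step concrete, here is a minimal sketch of inserting downloaded files into a table through Python's DB-API. It runs against an in-memory `sqlite3` database purely as a stand-in so that it can be executed anywhere; with SQL Server you would pass a `pyodbc` connection instead, which accepts the same `?` placeholders. The table name `sharepoint_files`, its columns, and the helper `store_files` are assumptions made for this example, and the table is expected to exist already.

```python
import sqlite3


def store_files(conn, rows):
    """Insert (name, server_relative_url, content_bytes) tuples.

    `conn` is any DB-API connection using '?' placeholders; the
    sharepoint_files table is a hypothetical schema for this sketch.
    """
    cur = conn.cursor()
    cur.executemany(
        "INSERT INTO sharepoint_files (name, url, content) VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()
    cur.close()


# Stand-in database; for SQL Server, replace with pyodbc.connect(<connection string>)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sharepoint_files (name TEXT, url TEXT, content BLOB)")

store_files(conn, [
    ("report.txt", "/sites/YourSharepointSiteName/Shared%20Documents/report.txt", b"hello"),
])
print(conn.execute("SELECT name, length(content) FROM sharepoint_files").fetchall())
```

In the starter script, `response.content` returned by `File.open_binary` would supply the bytes for the `content` column.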

cjustin
Karthick Mohanraj
  • Thank you so much for the information, but in my SharePoint I have documents under one URL, plus a few sub-sites and a few other things. When I open those sites, the content is not in the form of folders; it is in the form of posts/discussions. Could you say anything about how to get that data? – sai Jan 30 '20 at 09:35
  • If you have only the URL of your SharePoint documents, you will have to extract the following parameters from it: "YourOrganisation", "YourSharepointSiteName", "YourSharepointFolderName" and "YourSharepointFileName". All of them are embedded in the SharePoint link itself, so parse the URL, extract these parameters, and then run the script above. A quick look at your SharePoint link should give you all these details. – Karthick Mohanraj Jan 30 '20 at 09:59
  • Please help me extract data that is in the form of dialogs/segments (laid out like boxes). It is similar to a Quora page (https://www.quora.com/topic/Fitness). I can't share my SharePoint data or details with you, so I have attached a link to a page that looks similar to mine. Could you say how to get that data? – sai Jan 30 '20 at 10:15
  • Dear @sai, there is no single solution for extracting both files and posts from a SharePoint link. They are two separate tasks and need to be handled differently. For file extraction, the solution I gave you would work fine. For extracting post contents, you will have to use web scraping, for example with Python's BeautifulSoup package, which has good tools for pulling content out of a web page. You can take a look at https://www.dataquest.io/blog/web-scraping-beautifulsoup/ – Karthick Mohanraj Jan 30 '20 at 10:30
  • Thank you so much, for suggesting and providing information. – sai Jan 30 '20 at 12:08
  • Cheers! Please don't forget to upvote and Mark an answer as accepted if you find any answer helpful on stackoverflow. – Karthick Mohanraj Jan 30 '20 at 12:11
  • While web scraping that URL, I can't access it; I get a **403 FORBIDDEN** error. Do you have any idea how to access a SharePoint URL? – sai Jan 31 '20 at 05:25
  • @sai Since I do not have an overview of your SharePoint setup, I can only guess what might be wrong. The problem might be that you are not authenticating into SharePoint first. Only after authentication succeeds will you be able to scrape its contents. You can refer to this link for help with authentication: https://stackoverflow.com/questions/20945822/how-to-access-a-sharepoint-site-via-the-rest-api-in-python – Karthick Mohanraj Jan 31 '20 at 06:06
  • I get the following message when running this: Authenticated into sharepoint as: My Team Problem printing out library contents: (None, None, "400 Client Error: Bad Request for url: /sites/YourSharepointSiteName/Shared%20Documents/YourSharepointFolderName/") Any reason for that? – Michael Norman Oct 28 '21 at 02:00
  • How do I read Excel files from SharePoint? Does anyone know? – matt.aurelio Jun 14 '22 at 22:19

You might want to consider using Pysharepoint. It provides an easy interface for uploading and downloading files to and from SharePoint in Python.

import pysharepoint as ps

sharepoint_base_url = "https://<abc>.sharepoint.com/"
username = "username"
password = "password"

site = ps.SPInterface(sharepoint_base_url, username, password)

source_path = "Shared Documents/Shared/<Location>"
sink_path = "/full_sink_path/"
filename = "filename.ext"
sharepoint_site = "https://<abc>.sharepoint.com/sites/<site_name>"

site.download_file_sharepoint(source_path, sink_path, filename, sharepoint_site)
site.upload_file_sharepoint(source_path, sink_path, filename, sharepoint_site)
tuomastik

A simpler solution would be to create a shortcut to the SharePoint folder in your OneDrive. The files behind that shortcut can then be read from the local OneDrive path with the usual pd.read_excel, pd.read_csv, etc.

For example:

import pandas as pd
df = pd.read_excel(r'C:\Users\badgenumber\OneDrive - company\Team folder\Ticketing System\ Inquiries\Inquiry tracker.xlsx')
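
With the library available as a local folder, the folders, subfolders and files can also be enumerated with the standard library alone. The sketch below walks a directory tree and collects one row of metadata per file, which could then be written to a database. The function name `list_synced_files` is an assumption for this example, and the demo uses a temporary directory in place of a real OneDrive path.

```python
import os
import tempfile
from datetime import datetime
from pathlib import Path


def list_synced_files(root):
    """Return a sorted list of (relative_path, size_bytes, modified) per file under root."""
    root = Path(root)
    rows = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            stat = path.stat()
            rows.append((
                path.relative_to(root).as_posix(),
                stat.st_size,
                datetime.fromtimestamp(stat.st_mtime),
            ))
    return sorted(rows)


# Demo on a throwaway folder standing in for the synced OneDrive directory
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "Team folder").mkdir()
    (Path(tmp) / "Team folder" / "tracker.csv").write_text("id,status\n1,open\n")
    for row in list_synced_files(tmp):
        print(row)
```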
doneforaiur

Have you looked at Office365-REST-Python-Client?

https://github.com/vgrem/Office365-REST-Python-Client

For examples, see the following link:

https://github.com/vgrem/Office365-REST-Python-Client/tree/master/examples/sharepoint/files

grietbroek