I'm trying to download multiple Word documents from a website into a folder that I can iterate through. They are hosted in a SharePoint list, and I've already parsed the HTML to compile a list of all the links to these Word documents. Clicking one of these links prompts you to open or save a Word document, and the title of the document appears at the end of the link. By splitting the URL strings, I've built a list of document names that lines up with my list of URLs.

My goal is to write a loop that goes through all the URLs and downloads each Word document into a folder.

EDIT: taking @DeepSpace's and @aneroid's suggestions into consideration (and trying my best to implement them), my code:
    import shutil

    import requests
    from requests_ntlm import HttpNtlmAuth

    def download_word_docs(doc_url, doc_name):
        # 'domain\\user' and 'password' are placeholders; they have to be
        # strings ('pass' is a Python keyword and can't be used as a name)
        r = requests.get(doc_url, auth=HttpNtlmAuth('domain\\user', 'password'), stream=True)
        with open(doc_name, 'wb') as f:
            shutil.copyfileobj(r.raw, f)  # where's it copying the fileobj to?
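From what I understand, because doc_name is a bare filename, open(doc_name, 'wb') writes into whatever directory the script was launched from (the current working directory). A small stdlib-only sketch of making the destination explicit instead; the folder name and URL below are placeholders I made up:

    import os

    def doc_name_from_url(doc_url):
        # The Word doc's title is the last segment of the download link
        return doc_url.rstrip('/').rsplit('/', 1)[-1]

    def build_dest_path(dest_folder, doc_url):
        # Anchor the output file to an explicit folder instead of the CWD
        return os.path.join(dest_folder, doc_name_from_url(doc_url))

    print(build_dest_path('downloads', 'http://host/sites/list/Report.docx'))

os.path.join handles the path separator for me, so the same code should work whether I run it on Windows or elsewhere.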
I think this is different from downloading an image, because my request goes to a download link rather than directly to a physical .jpg file... I may be wrong, but it's a tricky situation.
I'm still trying to get my program to download (or create a copy of) a .docx into a folder at a path I can specify. Currently it runs in an admin command prompt (I'm on Windows) without error, but I don't know where the file is being copied to. My hope is that once I get one download working, I can figure out how to loop it over my list of URLs. Thanks, @DeepSpace and @aneroid, for your help so far.
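For reference, here's my current sketch of the loop over all the URLs, under the assumption that doc_urls and doc_names are the two lists I already parsed and that the credentials are placeholders:

    import os
    import shutil

    def plan_downloads(doc_urls, doc_names, dest_folder):
        # Pair each link with the full path its file should be saved to
        return [(url, os.path.join(dest_folder, name))
                for url, name in zip(doc_urls, doc_names)]

    def download_all(doc_urls, doc_names, dest_folder):
        # Third-party imports kept inside the function so the path-planning
        # helper above works even without requests/requests_ntlm installed
        import requests
        from requests_ntlm import HttpNtlmAuth

        os.makedirs(dest_folder, exist_ok=True)  # create the folder if missing
        auth = HttpNtlmAuth('domain\\user', 'password')  # placeholder credentials
        for doc_url, dest_path in plan_downloads(doc_urls, doc_names, dest_folder):
            r = requests.get(doc_url, auth=auth, stream=True)
            r.raise_for_status()  # stop on 401/404 instead of saving an error page
            with open(dest_path, 'wb') as f:
                shutil.copyfileobj(r.raw, f)  # stream the body straight to disk

I haven't verified this against the actual SharePoint list yet, but stream=True plus shutil.copyfileobj should keep large .docx files out of memory.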