I have a lot of URLs pointing to .docx and .pdf files. I want to run a Python script that downloads them from the URLs and saves them in a folder. Here is what I've done for a single file; I'll put it in a for loop later:

import requests

response = requests.get('http://wbesite.com/Motivation-Letter.docx')
with open("my_file.docx", 'wb') as f:
    f.write(response.content)

However, the my_file.docx that it saves is only 266 bytes and is corrupt, even though the URL is fine.
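A file that small is often an HTML error or redirect page saved under a .docx name rather than the real document. One way to check, sketched here as a hypothetical helper (`looks_like_document` is not part of requests), is to compare the first bytes against the known file signatures: .docx files are ZIP archives and start with `PK\x03\x04`, while PDFs start with `%PDF-`.

```python
# Hypothetical helper: checks the leading "magic" bytes of downloaded content.
# .docx files are ZIP containers (signature b"PK\x03\x04"); PDFs start with b"%PDF-".
SIGNATURES = {
    ".docx": b"PK\x03\x04",
    ".pdf": b"%PDF-",
}


def looks_like_document(content: bytes, filename: str) -> bool:
    """Return True if content starts with the signature expected for filename's extension."""
    for ext, magic in SIGNATURES.items():
        if filename.lower().endswith(ext):
            return content.startswith(magic)
    # Unknown extension: there is no signature to compare against.
    return False


# Possible use with the code above (assumes `response` came from requests.get):
# if not looks_like_document(response.content, "my_file.docx"):
#     print(response.status_code, response.headers.get("Content-Type"))
#     print(response.content[:200])  # an HTML page here explains the tiny file
```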

UPDATE:

I added the code below and it works, but I want to save the file in a new folder.

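import os
import shutil
import requests

def download_file(url, folder_name):
    local_filename = url.split('/')[-1]  # last URL segment used as the file name
    # The leading "/" makes this an absolute path under the root directory
    path = os.path.join("/{}/{}".format(folder_name, local_filename))
    with requests.get(url, stream=True) as r:
        with open(path, 'wb') as f:
            shutil.copyfileobj(r.raw, f)  # stream the raw response body to disk

    return local_filename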
Edited by Ivan Vinogradov; asked by Chaudhry Talha

2 Answers

Try using the stream option:

import os
import requests


def download(url: str, dest_folder: str):
    if not os.path.exists(dest_folder):
        os.makedirs(dest_folder)  # create folder if it does not exist

    filename = url.split('/')[-1].replace(" ", "_")  # be careful with file names
    file_path = os.path.join(dest_folder, filename)

    r = requests.get(url, stream=True)
    if r.ok:
        print("saving to", os.path.abspath(file_path))
        with open(file_path, 'wb') as f:
            for chunk in r.iter_content(chunk_size=1024 * 8):
                if chunk:
                    f.write(chunk)
                    f.flush()
                    os.fsync(f.fileno())
    else:  # HTTP status code 4XX/5XX
        print("Download failed: status code {}\n{}".format(r.status_code, r.text))


download("http://website.com/Motivation-Letter.docx", dest_folder="mydir")
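The answer's `url.split('/')[-1]` keeps any query string (for example `?download=1`) and percent-escapes such as `%20` in the file name. A hedged alternative using only the standard library's `urllib.parse` is sketched below; `filename_from_url` and its `"download"` fallback are illustrative names, not part of the answer.

```python
import os
from urllib.parse import unquote, urlsplit


def filename_from_url(url: str, default: str = "download") -> str:
    """Derive a local file name from a URL (hypothetical helper).

    Unlike url.split('/')[-1], this ignores any ?query or #fragment,
    decodes percent-escapes such as %20, and, like the answer's code,
    replaces spaces with underscores.
    """
    path = urlsplit(url).path                 # e.g. "/files/Motivation%20Letter.docx"
    name = os.path.basename(unquote(path))    # e.g. "Motivation Letter.docx"
    name = name.replace(" ", "_")
    # A URL ending in "/" (or in "..") does not name a usable file.
    if name in ("", ".", ".."):
        return default
    return name
```

It can replace the `filename = ...` line in the answer, e.g. `filename = filename_from_url(url)`.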

Note that mydir in the example above is the name of a folder in the current working directory. If mydir does not exist, the script will create it in the current working directory and save the file in it. Your user must have permission to create directories and files in the current working directory.

You can also pass an absolute path as dest_folder, but check its permissions first.
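The difference between a relative and an absolute dest_folder (a leading "/" points at the filesystem root, which is behind the `Permission denied` error in the question's update) can be made explicit with `pathlib`. This is a sketch rather than the answer's code; `prepare_dest` is a hypothetical name.

```python
from pathlib import Path


def prepare_dest(dest_folder: str) -> Path:
    """Resolve dest_folder to an absolute path and create it if needed.

    "mydir" is relative, so it resolves inside the current working directory.
    "/mydir" is absolute and points at the filesystem root, where a normal
    user usually lacks permission to create directories.
    """
    folder = Path(dest_folder).expanduser().resolve()
    folder.mkdir(parents=True, exist_ok=True)  # no error if it already exists
    return folder
```

Printing the returned path before writing anything shows exactly where files will land.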

P.S.: Avoid asking multiple questions in one post.

Answered by Ivan Vinogradov
  • I'm using a Mac, so when I write `r"\folder_name"` in `file_path`, it creates a file named `\folder_name"` – Chaudhry Talha Jul 09 '19 at 11:09
  • So use `os.path.join('wherever', 'you', 'want', 'to', 'go')` or pathlib (https://docs.python.org/3/library/pathlib.html) to handle paths correctly. Or add your own absolute path in your OS's path style. – RvdBerg Jul 09 '19 at 11:12
  • This answer just shows an example of handling file downloads with requests. Of course, you should use the os package to deal with the file system. – Ivan Vinogradov Jul 09 '19 at 11:16
  • @IvanVinogradov In the update section of my question, when I run the code I get `No such file or directory:` – Chaudhry Talha Jul 09 '19 at 11:32
  • You need to create a new folder and save the file in it? – Ivan Vinogradov Jul 09 '19 at 11:33
  • This is what I'm doing to create a new folder: `os.path.join("/{}/{}".format(folder_name, local_filename))`, for which I get `No such file or directory:`. I've also tried `os.makedirs("/{}/{}".format(folder_name, local_filename))` and I get `Permission denied` – Chaudhry Talha Jul 09 '19 at 11:38
  • You get `permission denied` because your user is probably not allowed to create anything in the root (`/`) directory – Ivan Vinogradov Jul 09 '19 at 11:44
  • `filename` can be obtained from `re.findall("filename=(.+)",response.headers['Content-Disposition'])[0]`, which is more reliable because there may be 301/302 redirects and the initial URL might not contain the file name – Pooya Estakhri Aug 26 '21 at 09:28
  • Excellent answer! – DIRTY DAVE Aug 31 '21 at 16:17
  • This works as expected – Cool guy Sep 28 '22 at 11:25
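Building on the Content-Disposition suggestion in the comments: the regex `filename=(.+)` keeps surrounding quotes (e.g. `"Motivation-Letter.docx"` with the quote characters). A hedged alternative is the standard library's email header parser, whose `Message.get_filename()` returns the unquoted filename parameter. The helper name below is hypothetical.

```python
import os
from email.message import Message
from typing import Optional


def filename_from_content_disposition(header: Optional[str]) -> Optional[str]:
    """Extract the filename parameter from a Content-Disposition header value.

    Returns None if the header is missing or has no filename parameter.
    """
    if not header:
        return None
    msg = Message()
    msg["Content-Disposition"] = header
    name = msg.get_filename()  # unquoted filename parameter, or None
    # Keep only the last path component so a server-supplied name
    # such as "../x.pdf" cannot escape the destination folder.
    return os.path.basename(name) if name else None
```

With requests this could be called as `filename_from_content_disposition(r.headers.get("Content-Disposition"))`, falling back to the last segment of `r.url`, which is the final URL after requests has followed any redirects.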
Try:

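import urllib.request

url = "http://website.com/Motivation-Letter.docx"  # example URL
urllib.request.urlretrieve(url, "Motivation-Letter.docx")  # (url, local filename)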
Answered by ncica
  • Worth noting that urlretrieve is a legacy function from Python 2 and might be deprecated at some point. It hasn't been thus far, but the documentation warns that it might. – Tammi Sep 17 '21 at 06:09
  • I wonder why such a commonly used and useful method would be deprecated without a suitable replacement. – Joerg S Mar 10 '22 at 18:54
  • @JoergS a question for someone much smarter than me, but I totally agree with you :) – ncica Mar 11 '22 at 13:57
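Since the comments note that `urlretrieve` belongs to the legacy interface, one option that stays in the standard library is `urllib.request.urlopen`, whose file-like response `shutil.copyfileobj` can stream to disk in chunks. The sketch also loops over several URLs, as the question intends; `download_all`, the 30-second timeout, and the `"download"` fallback name are assumptions for illustration.

```python
import os
import shutil
import urllib.request
from urllib.parse import urlsplit


def download_all(urls, dest_folder):
    """Download each URL into dest_folder using urllib.request.urlopen.

    Returns a list of (url, error) pairs for the downloads that failed.
    """
    os.makedirs(dest_folder, exist_ok=True)  # create the folder if missing
    failures = []
    for url in urls:
        name = os.path.basename(urlsplit(url).path) or "download"
        path = os.path.join(dest_folder, name)
        try:
            # urlopen runs before open(), so a failed request leaves no empty file.
            with urllib.request.urlopen(url, timeout=30) as response, open(path, "wb") as f:
                shutil.copyfileobj(response, f)  # copies in chunks, not all at once
        except OSError as exc:  # URLError and HTTPError subclass OSError
            failures.append((url, exc))
    return failures
```

Collecting failures instead of stopping at the first one keeps a long list of URLs going when a single server returns an error.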