Downloading PDFs from URL with Python Requests

Question

I am trying to write a script that will download a PDF from a URL to my PC. Looking around the internet, I have found a few examples of what I am trying to accomplish. I'm very new to Python and keep getting a syntax error in my code.

import requests

url = 'https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf'
response = requests.get(url)

with open('C:\Users\User\PycharmProjects\PDFTest\FolderTest\dummy.pdf', 'wb') as f:
    f.write(response.content)

The error I receive is:

  File "C:\Users\User\PycharmProjects\PDFTest\main.py", line 7
    with open('C:\Users\User\PycharmProjects\PDFTest\FolderTest\dummy.pdf', 'wb') as f:
                                                                          ^
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape

Process finished with exit code 1

I'm running everything within a VM if that is of any consequence and on Python 3.9.1. I see that the syntax error is at the comma, but everything I see says the comma is a must. Any help would be greatly appreciated. My programming experience is limited to a few semesters of C++ in college and loads of Python tutorials and videos.

My end goal for this project (once I get this part working) is to cycle through a domain to download PDFs. They are all stored as https://www.examplesite.com/1000.pdf; ...com/1001.pdf; ...com/1002.pdf; etc. I think I can accomplish this by running the above in a for loop and increasing the number in the PDF URL with ++. Thanks for the help!

Does this answer your question? [What exactly do "u" and "r" string flags do, and what are raw string literals?](https://stackoverflow.com/questions/2081640/what-exactly-do-u-and-r-string-flags-do-and-what-are-raw-string-literals) — manveti, Feb 18 '21 at 19:23

score 0 · Answer 1 · answered Feb 18 '21 at 19:20

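This is because you are using unescaped \ characters in a string. In a normal string literal, \U starts a Unicode escape, so the \Users part of your path cannot be decoded. Use \\ instead, or put an r in front of the string to make it a raw string.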
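As a small illustration of the options above, the sketch below writes the path from the question in three ways. The variable names are only for this example, and the forward-slash form is an extra option not mentioned above. Both comparisons should print True.

```python
from pathlib import PureWindowsPath

# Hypothetical variable names; the path is the one from the question.
escaped = 'C:\\Users\\User\\PycharmProjects\\PDFTest\\FolderTest\\dummy.pdf'
raw = r'C:\Users\User\PycharmProjects\PDFTest\FolderTest\dummy.pdf'
forward = 'C:/Users/User/PycharmProjects/PDFTest/FolderTest/dummy.pdf'

# Doubled backslashes and the r prefix produce the identical string.
print(escaped == raw)

# Windows paths also accept forward slashes; PureWindowsPath treats
# both separators the same way, so the two paths compare equal.
print(PureWindowsPath(forward) == PureWindowsPath(raw))
```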

You can also use pathlib, which I find easier:

from pathlib import Path
import requests
filename = Path(r'C:\Users\User\PycharmProjects\PDFTest\FolderTest\dummy.pdf')
url = 'https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf'
response = requests.get(url)
filename.write_bytes(response.content)
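For the numbered URLs mentioned in the question, note that Python has no ++ operator; a range() in a for loop does the counting. The following is a minimal sketch that only builds the URLs. The function name numbered_pdf_urls is made up for this example, and examplesite.com is the placeholder domain from the question. Each URL could then be passed to requests.get as in the code above.

```python
def numbered_pdf_urls(base, first, last):
    """Yield base/first.pdf through base/last.pdf, inclusive."""
    # range() stops before its second argument, hence last + 1.
    for number in range(first, last + 1):
        yield f'{base}/{number}.pdf'


# Placeholder domain and numbers taken from the question.
for url in numbered_pdf_urls('https://www.examplesite.com', 1000, 1002):
    print(url)
```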

KetZoomer

Thank you very much! This seemed to do the trick. – JSimmons Feb 19 '21 at 02:01

score 0 · Answer 2 · answered Feb 18 '21 at 19:25

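Your problem occurs because, in a normal string literal, a backslash starts an escape sequence. In your path, the \U of \Users is read as the start of a \UXXXXXXXX Unicode escape, which must be followed by eight hex digits, so the interpreter cannot parse the string.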

Try:

file_path = r'drive:\path\to\file'

The r prefix tells the interpreter to read the literal as a raw string, so backslashes are kept as ordinary characters instead of starting escape sequences.

As an alternative implementation that shows download progress, tqdm provides progress bars for the terminal:

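import os
from tqdm import tqdm
import requests

def download(lnk: str, fname: str):
    # stream=True fetches the body in chunks instead of loading it all at once
    rq = requests.get(lnk, stream=True)
    rq.raise_for_status()  # avoid saving an error page as a PDF
    # Content-Length may be missing, so fall back to 0 instead of raising KeyError
    totalsize = int(rq.headers.get('content-length', 0))
    chunksize = 1024
    if totalsize:
        print(f'\t{round(totalsize * 10**-3, 2):,} kb')
    with open(fname, 'wb') as fobj:
        # total=None makes tqdm show a running byte count without a percentage
        with tqdm(total=totalsize or None, unit='B', unit_scale=True) as bar:
            for b in rq.iter_content(chunk_size=chunksize):
                fobj.write(b)
                bar.update(len(b))
    # os.startfile exists only on Windows; it opens the file in the default viewer
    if hasattr(os, 'startfile'):
        os.startfile(fname)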
