2

I keep getting a 403 error when I try to download this link using aiohttp: http://cfile2.uf.tistory.com/original/996D34465B12921B1AE97C

I want to download http://cfile2.uf.tistory.com/original/996D34465B12921B1AE97C.jpg but I am unable to. I even tried adding a Referer header, but I still get the same error.

Here is my code:

        async with aiohttp.ClientSession(headers={'Referer': 'https://tistory.com'}) as cs:
            async with cs.get('http://cfile2.uf.tistory.com/original/996D34465B12921B1AE97C.jpg') as r:
                if r.status == 200:
                    img = await r.read()
                    with open('C:/xxxx/xxxx/xxxx/xxxx/Image/' + 'test.jpg', 'wb') as f:
                        f.write(img)
                        print('Downloaded!')
Skol8

2 Answers

1

If you request http://cfile2.uf.tistory.com/original/996D34465B12921B1AE97C.jpg, the server responds with 403 Forbidden, as the response status shows. 403 Forbidden is an HTTP status code that tells the client the server understood the request but refuses to fulfill it. That is plausible here, since the server may simply not serve this resource under a URL that includes the file extension.

However, you can request http://cfile2.uf.tistory.com/original/996D34465B12921B1AE97C instead, which returns 200 OK, and write the response body to a new .jpg file yourself:

from os.path import basename, join

from requests import RequestException, get

url = 'http://cfile2.uf.tistory.com/original/996D34465B12921B1AE97C'

# Name the local file after the last URL segment and add the extension ourselves.
jpg_file = basename(url) + '.jpg'
path = join('C:/xxxx/xxxx/xxxx/xxxx/Image/', jpg_file)

try:
    # stream=True defers the body download so it can be read in chunks.
    r = get(url, stream=True)
    r.raise_for_status()

    # Write to the full destination path, not just the bare file name.
    with open(path, mode='wb') as f:
        for chunk in r.iter_content(chunk_size=1024):
            if chunk:
                f.write(chunk)

except RequestException as err:
    print(err)

The above code also downloads the image in chunks, just in case the file is very big.
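Since the question uses aiohttp, here is a rough equivalent as a sketch. It assumes the server treats the extensionless URL the same way for aiohttp as it does for requests, which I have not verified. The names `strip_extension` and `download` are made up for this example.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit


def strip_extension(url):
    """Return url with any file extension removed from its path."""
    parts = urlsplit(url)
    root, _ext = posixpath.splitext(parts.path)
    return urlunsplit(parts._replace(path=root))


async def download(url, dest_path, chunk_size=1024):
    """Stream the extensionless form of url into dest_path."""
    # Third-party dependency, imported here so that strip_extension
    # stays usable even where aiohttp is not installed.
    import aiohttp

    async with aiohttp.ClientSession() as session:
        async with session.get(strip_extension(url)) as resp:
            # Raises aiohttp.ClientResponseError for 4xx/5xx, e.g. 403.
            resp.raise_for_status()
            with open(dest_path, 'wb') as f:
                # Read the body in chunks instead of loading it all at once.
                async for chunk in resp.content.iter_chunked(chunk_size):
                    f.write(chunk)
```

You would run it with something like `asyncio.run(download(url, 'test.jpg'))`. The file write is a plain blocking call, which is usually fine for small images but not ideal inside a busy event loop.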

RoadRunner
  • Yeah, I know about this, but the issue is that some of the links are GIFs and they would end up being downloaded as .jpg. I have a Python script which downloads tistory links posted by users on a Discord server. – Skol8 Dec 17 '18 at 16:53
  • @Skol8 You could use [this](https://stackoverflow.com/questions/1412529/how-do-i-programmatically-check-whether-a-gif-image-is-animated) to check whether files are animated GIFs. I would download all images as `.gif` first and, if they are not animated, convert them to `.jpg`. – RoadRunner Dec 17 '18 at 16:57
  • Thanks, I will check it out. – Skol8 Dec 17 '18 at 17:59
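
For the GIF-versus-JPEG concern raised in these comments, another option is to pick the extension from the first bytes of the downloaded data rather than from the URL. This is only a sketch: `SIGNATURES` and `guess_extension` are hypothetical names, and the table covers just three common formats.

```python
# Leading "magic" bytes of a few common image formats.
SIGNATURES = (
    (b'\xff\xd8\xff', '.jpg'),
    (b'GIF87a', '.gif'),
    (b'GIF89a', '.gif'),
    (b'\x89PNG\r\n\x1a\n', '.png'),
)


def guess_extension(data, default='.bin'):
    """Return a file extension based on the leading bytes of data."""
    for magic, ext in SIGNATURES:
        if data.startswith(magic):
            return ext
    # Unknown format: fall back to a neutral extension.
    return default
```

With this, the script could read the body first (for example `img = await r.read()`), call `guess_extension(img)`, and only then build the output file name.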
0

You can't fetch this resource because the server restricts access to it in some way. That is why the response you receive carries the HTTP error code 403.

If you search online you can find some details:

HTTP 403 is a standard HTTP status code communicated to clients by an HTTP server to indicate that the server understood the request, but will not fulfill it for some reason related to authorization. There are a number of sub-status error codes that provide a more specific reason for responding with the 403 status code

Try looking at the sub-status code, if the server sends one, to see the specific reason. From there you may be able to find an approach that makes the request work.

Note

As @Dalvenjia said, if you remove the file extension from the URL, the request seems to work fine.

Community
Iulian