Python requests.get returns Gibberish

Question

I'm trying to scrape the following URL:

link='https://www.opensubtitles.org/en/subtitleserve/sub/6646133'

When I do

html = requests.get(link)

`html.content` contains gibberish, starting with:

b'PK\x03\x04\x14\x00\x00\x00\x08\x00z\x8c8Q\xd5H\xc5\xd7\xaf7\x00\x00\xdf\x95\x00\x00^\x00\x00\x00...

Why am I not getting clear text?

`PK` is a zip file or something like a zip. You've grabbed an archive, and that's its binary data. — Carcigenicate, Sep 24 '20 at 15:39
Using `curl`, I get no response from that link, and it redirects in the browser — OneCricketeer, Sep 24 '20 at 15:42
@buran you have the link and the get command, what is more reproducible than that? — Binyamin Even, Sep 24 '20 at 15:44
I get nice html and it looks like you are accessing something different — buran, Sep 24 '20 at 15:45
I also can't reproduce this. There are unzipping libraries though that likely take streams of binary data. — Carcigenicate, Sep 24 '20 at 15:47

Bertrand Martel · Answer 1 · 2020-09-24T20:06:21.953

You can use the zipfile module to unzip the response and then check the filenames. If you are interested in extracting the .srt files, the following will get their content:

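import io
import zipfile

import requests

# Request the download with the Referer set to the subtitle's page.
r = requests.get(
    "https://www.opensubtitles.org/en/subtitleserve/sub/6646133",
    headers={
        "referer": "https://www.opensubtitles.org/en/subtitles/6646133/america-s-got-talent-audition-1-en"
    },
)
r.raise_for_status()  # stop early on an HTTP error status

# r.content holds the raw bytes of the archive; open it in memory.
z = zipfile.ZipFile(io.BytesIO(r.content))
filenames = z.namelist()
print(filenames)

# Read each .srt member; z.read() returns bytes, not str.
srt_files = [t for t in filenames if t.endswith(".srt")]
for t in srt_files:
    content = z.read(t)
    print(content)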

