Retrieving an image gives 403 error while it works with browser

Question

Hi, I'm trying to build a manga downloader app, so I'm scraping several sites. However, I have a problem once I get the image URL: I can see the image in my browser (Chrome) and download it from there, but I can't do the same with any popular scripting library.

Here is what I've tried:

String imgSrc = "https://cdn.mangaeden.com/mangasimg/aa/aa75d306397d1d11d07d66746dae78a36dc78672ae9e97a08cb7abb4.jpg";
Connection.Response resultImageResponse = Jsoup.connect(imgSrc)
        .userAgent("Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/535.21 (KHTML, like Gecko) Chrome/19.0.1042.0 Safari/535.21")
        .referrer("none")
        .execute();

// output here
OutputStreamWriter out = new OutputStreamWriter(new FileOutputStream(new java.io.File(String.valueOf(imgPath))));
out.write(resultImageResponse.body());          // resultImageResponse.body() is where the image's contents are.
out.close();

I've also tried this:

URL imgUrl = new URL(imgSrc);
Files.copy(imgUrl.openStream(), imgPath);

Lastly, since I was sure the link works, I tried downloading the image with Python, but I get a 403 error there too:

import requests
url = "https://cdn.mangaeden.com/mangasimg/d0/d08f07d762acda8a1f004677ab2414b9766a616e20bd92de4e2e44f1.jpg"
res = requests.get(url)

While googling I found "Unable to get image url in Mangaeden API Angular 6", which seems really close to my problem. However, I don't understand whether I'm setting the referrer wrong or whether that approach doesn't work at all...

Do you have any tips? Thank you!

`curl.exe "https://cdn.mangaeden.com/mangasimg/d0/d08f07d762acda8a1f004677ab2414b9766a616e20bd92de4e2e44f1.jpg"` gives error code: 1020 (access denied by cloudflare), so probably some caching or cookie token protection in place — MortenB, Dec 27 '21 at 20:30
Pasting the URL directly into the browser gives a 403 as well (both using Chrome and using Postman). — BrokenBenchmark, Dec 27 '21 at 20:32
Well, I think it's normal that Postman/curl don't work: they behave exactly like the requests library when the configuration is the same. My question is: why can the browser display the image? Does it have a different configuration? @BrokenBenchmark — Stefano, Dec 27 '21 at 20:35
Sorry, I should have clarified that I used both Chrome and Postman. — BrokenBenchmark, Dec 27 '21 at 20:36
Oh... That was unexpected, so why am I seeing this image? I've tried opening the link with different browsers and devices, and it works perfectly. For example, I sent the same link to my phone and then clicked it. — Stefano, Dec 27 '21 at 20:40

HedgeHog · Accepted Answer · 2021-12-27T21:18:10.630

How to fix?

Add some headers to your request so that it looks like it comes from a browser. With a browser-like User-Agent the server responds with a 200 and you can save the file.

Note: This also works in Postman. Just overwrite the hidden User-Agent header and you will get the image as the response.

Example (python)

import requests
# A browser-like User-Agent; without it the CDN answers with a 403.
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.114 Safari/537.36'
}
url = "https://cdn.mangaeden.com/mangasimg/d0/d08f07d762acda8a1f004677ab2414b9766a616e20bd92de4e2e44f1.jpg"
res = requests.get(url, headers=headers)
res.raise_for_status()  # fail on a 4xx/5xx instead of saving an error page
with open("image.jpg", 'wb') as f:
    f.write(res.content)
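
If you would rather not depend on requests, the same idea can probably be expressed with the standard library alone. This is only a sketch: `build_request` and `save_image` are hypothetical names, the 30-second timeout is arbitrary, and it assumes the CDN only looks at the User-Agent header, as in the example above.

```python
import urllib.request

# Same browser-like User-Agent string as in the requests example above.
BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/91.0.4472.114 Safari/537.36"
)


def build_request(url):
    """Return a urllib Request that carries a browser-like User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": BROWSER_UA})


def save_image(url, path):
    """Download url to path; urlopen raises HTTPError on a 403 or 503."""
    with urllib.request.urlopen(build_request(url), timeout=30) as res:
        data = res.read()
    with open(path, "wb") as f:
        f.write(data)
    return len(data)  # number of bytes written
```

Calling `save_image(url, "image.jpg")` should then behave like the requests version, except that an error status raises an exception rather than returning a response object.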

score 0 · Answer 2 · answered Jan 03 '22 at 10:03

Someone wrote this answer but later deleted it, so I'm copying it here in case it is useful.

AFAIK, jsoup only handles HTML and other text or XML documents by default; to fetch anything else, such as an image, you have to call ignoreContentType(true) on the connection.

If you open up Developer Tools on your browser, you can get the exact request the browser has made. With Chrome, it's something like this.

The minimal cURL request would in your case be:

curl 'https://cdn.mangaeden.com/mangasimg/aa/aa75d306397d1d11d07d66746dae78a36dc78672ae9e97a08cb7abb4.jpg' \
  -H 'user-agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/535.21 (KHTML, like Gecko) Chrome/19.0.1042.0 Safari/535.21' \
  --output image.jpg

You can refer to HedgeHog's answer for a sample Python solution; here's how to achieve the same in Java using the new HTTP Client:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse.BodyHandlers;
import java.nio.file.Paths;

public class ImageDownload {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        // Send a browser-like user-agent so the CDN does not answer with a 403.
        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create("https://cdn.mangaeden.com/mangasimg/aa/aa75d306397d1d11d07d66746dae78a36dc78672ae9e97a08cb7abb4.jpg"))
            .header("user-agent", "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/535.21 (KHTML, like Gecko) Chrome/19.0.1042.0 Safari/535.21")
            .build();
        // Write the response body straight to image.jpg.
        client.send(request, BodyHandlers.ofFile(Paths.get("image.jpg")));
    }
}

I adopted this solution in my Java code. One last point: if the image downloads but you can't open it, the request probably returned a 503 error, in which case you just have to perform the request again. You can recognize broken images because the image reader will say something like

Not a JPEG file: starts with 0x3c 0x68

The bytes 0x3c 0x68 are the characters "<h", which means the file is an HTML error page instead of the image.
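
A check along these lines could catch such broken files before they reach the image reader. It is only a sketch in Python under a few assumptions: `looks_like_jpeg` and `fetch_with_retry` are hypothetical helpers, `fetch` is any function you supply that returns the response body as bytes, and the retry count and delay are arbitrary.

```python
import time

JPEG_MAGIC = b"\xff\xd8\xff"  # JPEG files start with these three bytes


def looks_like_jpeg(data):
    """True if data starts with the JPEG signature rather than e.g. b'<h'."""
    return data.startswith(JPEG_MAGIC)


def fetch_with_retry(fetch, url, attempts=3, delay=1.0):
    """Call fetch(url) until the body looks like a JPEG or attempts run out."""
    for attempt in range(attempts):
        data = fetch(url)
        if looks_like_jpeg(data):
            return data
        if attempt < attempts - 1:
            time.sleep(delay)  # short pause before asking again
    raise ValueError(f"no valid JPEG from {url} after {attempts} attempts")
```

With requests, `fetch` could be something like `lambda u: requests.get(u, headers=headers).content`, so a 503 error page is retried instead of being written to disk.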
