-2

I am trying to download an image using the requests module in Python. The download works, but when I try to open the image I get "Fatal error reading PNG image file: Not a PNG file". Here is my error screenshot. The code I used to download it is:

import requests

img_url = "http://dimik.pub/wp-content/uploads/2020/02/javaWeb.jpg"

r = requests.get(img_url)

with open("java_book.png","wb") as f:
    f.write(r.content)

I run my code in a terminal with python3 s.py (s.py is the name of the file). Is something wrong with my code, or is it something in my operating system (Ubuntu 20.04 LTS)?

Plabon Kumer
  • 2
    The link returns 404. – Alif Jahan Jul 16 '20 at 04:41
  • The link works fine here. [Here](https://pasteboard.co/JhRHcar.png) is the screenshot. – Plabon Kumer Jul 16 '20 at 04:44
  • I'm getting a webpage from that link. Are you sure you have the specific link to that specific image? – ewokx Jul 16 '20 at 04:49
  • Yes, I am sure. I copied the link from my code and pasted it into another tab, and I got my image. – Plabon Kumer Jul 16 '20 at 04:53
  • That link is an invalid link to a non-existent file. As provided, it gives you a webpage in which the picture you're trying to get is just one element. You need to get the exact link to that image. – ewokx Jul 16 '20 at 04:55
  • 1
    Something weird with that URL. It worked once and then stopped working. – mpen Jul 16 '20 at 05:55
  • 1
    @mpen Agreed, because if I use another URL it works fine with the same code. [Here](https://paste.ubuntu.com/p/Fq9YYKN9tj/) is my code. Actually, I am learning web crawling and I made a program to collect all the images from a specific webpage. It works, but when I try to open the images, they all show an error. [Here](https://paste.ubuntu.com/p/Sc9kjFr4jR/) it is. – Plabon Kumer Jul 16 '20 at 06:47

5 Answers

4
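import requests

response = requests.get("https://devnote.in/wp-content/uploads/2020/04/devnote.png")

# Print the raw body to see what the server actually returned
print(response.content)

# Write the bytes to disk; the with block closes the file automatically
with open("sample_image.png", "wb") as file:
    file.write(response.content)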

The server behind https://devnote.in/wp-content/uploads/2020/04/devnote.png runs mod_security, so this request returns an error page like: <html><head><title>Not Acceptable!</title></head><body><h1>Not Acceptable!</h1><p>An appropriate representation of the requested resource could not be found on this server. This error was generated by Mod_Security.</p></body></html>.

Disable mod_security using .htaccess on an Apache server

If you control the server, mod_security can be disabled with the help of .htaccess:

<IfModule mod_security.c>
  SecFilterEngine Off
  SecFilterScanPOST Off
</IfModule>
Fefar Ravi
2

It's because you tried to save javaWeb.jpg (a JPG file) as java_book.png (a PNG file).

Pyzard
  • It's not. If I change this, it shows a new error: `error interpreting jpeg image file (not a jpeg file starts with 0x3c 0x21)`. [Here](https://pasteboard.co/JhSjS6K.png) is the screenshot. – Plabon Kumer Jul 16 '20 at 06:20
  • Because there's `HTML` for some reason in `r.content`. I just tried it out. – Pyzard Jul 16 '20 at 17:54
  • [This](https://docs.google.com/document/d/1S9VOQ2ufTs0E1o7i5tn6iRGyJm9-G9722BX7fNsfWqI/edit?usp=sharing) is what I get for `r.content`. – Pyzard Jul 16 '20 at 17:58
  • 1
    There's nothing wrong with your operating system. First you got an error because the image was a JPEG file. And now it's because there's HTML in `r.content`. – Pyzard Jul 16 '20 at 18:08
  • Oh, I understand now. Is there any way to fix this? – Plabon Kumer Jul 17 '20 at 07:06
  • This might sound confusing, but you can try: Create a `flask` app; Create an `app.route` at `/book/` and have it return the Java book image; Save the image with `requests`: `requests.get('http://127.0.0.1:5000/book')`. – Pyzard Jul 18 '20 at 03:46
  • BTW, `starts with 0x3c 0x21` means `starts with < !` because, see for yourself: `chr(0x3c), chr(0x21)`. – Pyzard Jul 18 '20 at 04:00
  • And that's true because `HTML` documents start with `<!DOCTYPE html>`. – Pyzard Jul 18 '20 at 04:01
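The `0x3c 0x21` observation in the comments above can be turned into a quick check before writing anything to disk. The following is a minimal sketch, not part of any answer: `sniff_image_type` and `looks_like_html` are hypothetical helper names, and the signature table covers only PNG, JPEG, and GIF.

```python
# Well-known leading "magic" bytes for a few image formats.
# This table is deliberately small; extend it as needed.
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "png",
    b"\xff\xd8\xff": "jpg",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
}


def sniff_image_type(data: bytes):
    """Return an extension for a recognised image signature, else None."""
    for magic, ext in SIGNATURES.items():
        if data.startswith(magic):
            return ext
    return None


def looks_like_html(data: bytes) -> bool:
    """True if the payload starts like markup, e.g. b'<!DOCTYPE html>'."""
    return data.lstrip().startswith(b"<")
```

With the question's code, this could be called as `sniff_image_type(r.content)` before opening the output file. A result of `None` together with `looks_like_html(r.content)` being true would suggest the server sent a page rather than the image.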
1

To see what we are working with, I tried to replicate the issue. Here is what I found.

1.) The file you are attempting to open is the ENTIRE HTML document. I can support this because !DOCTYPE html appears at the very beginning of the file your code wrote in 'wb' (write binary) mode.

Opening the saved file in Notepad shows what we are working with.
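One way to catch this situation in code is to look at the HTTP status and the Content-Type header before saving. This is a minimal sketch under assumptions: `ensure_image_response` is a hypothetical helper, and it relies on the server reporting an honest Content-Type, which is not guaranteed.

```python
def ensure_image_response(status_code: int, content_type: str) -> None:
    """Raise ValueError unless the response looks like a successful image download."""
    if status_code != 200:
        raise ValueError(f"request failed with HTTP status {status_code}")
    # Drop parameters such as "; charset=UTF-8" and normalise case.
    media_type = content_type.split(";")[0].strip().lower()
    if not media_type.startswith("image/"):
        raise ValueError(f"expected an image, got {media_type or 'no Content-Type'}")
```

With the `requests` response `r` from the question, it could be called as `ensure_image_response(r.status_code, r.headers.get("Content-Type", ""))` right before `f.write(r.content)`.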

<---------------------------------------------- WE ARE AT AN IMPASSE

From here we have a few options to solve our problem.

a.) We could simply download the image from the web page manually and place it in a local folder, directory, or wherever you want it. This is by far the easiest option, because the file is then available to open later without much extra work. I'm on a Windows machine, but Ubuntu should have no problem doing this either (unless you are running Ubuntu without a GUI; that can be fixed with startx, if supported).

b.) If you have to pull directly from the site itself, you could try something like this using BeautifulSoup from this answer here. Honestly, I've never really used the latter option, since downloading and moving the file is much more effective.
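For option b, the idea is to parse the page and collect the `src` of each `img` tag. The sketch below swaps BeautifulSoup for the standard library's `html.parser` so that it has no third-party dependency; it is not the code from the linked answer. `ImgSrcCollector` and `image_urls` are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImgSrcCollector(HTMLParser):
    """Collect absolute URLs from the src attribute of every <img> tag."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                # Resolve relative paths against the page URL.
                self.urls.append(urljoin(self.base_url, src))


def image_urls(html: str, base_url: str):
    """Return the image URLs found in an HTML string, in document order."""
    parser = ImgSrcCollector(base_url)
    parser.feed(html)
    return parser.urls
```

With `requests`, the page text would come from `requests.get(page_url).text`, and each returned URL could then be downloaded separately.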

0

You just need to save the image as a JPG.

import requests

img_url = "http://dimik.pub/wp-content/uploads/2020/02/javaWeb.jpg"

r = requests.get(img_url)

with open("java_book.jpg","wb") as f:
    f.write(r.content)
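To avoid hard-coding a mismatched extension, the output name can be derived from the URL itself. This is a minimal sketch with a hypothetical helper, `filename_for`; it trusts the URL's extension and does not inspect the downloaded bytes, and the `.bin` fallback is an arbitrary choice.

```python
import os
from urllib.parse import urlparse


def filename_for(url: str, stem: str) -> str:
    """Build a local file name that reuses the extension found in the URL path."""
    path = urlparse(url).path          # excludes any query string or fragment
    ext = os.path.splitext(path)[1]    # e.g. ".jpg", or "" if there is none
    return stem + (ext or ".bin")
```

For the URL in the question, `filename_for(img_url, "java_book")` gives `java_book.jpg`, which can be passed to `open(..., "wb")`.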
Pyzard
0

Yeah, it's a full HTML document:

java_book.jpg

Pyzard