8

I was reading the documentation on the requests lib and it seems to be tremendously outdated or something.

I was going step by step, trying all the examples shown there and encountered a problem as I tried running the following piece:

import requests
from PIL import Image
from StringIO import StringIO

response = requests.get('http://www.github.com')
i = Image.open(StringIO(response.content))

That piece is from the official documentation. The first error that I got was the ImportError: no module named StringIO

Okay, then I found out that the module no longer exists, and that to import StringIO one has to write from io import StringIO. I did that and ran the code again, and this time it failed with TypeError: initial_value must be str or None, not bytes. What on earth did I do wrong? I don't follow. All I did was run the code from the official docs. I'm clueless.

EDITED: And yeah...to use PIL one has to install Pillow.

Kermit
Albert
  • This might help [http://stackoverflow.com/questions/31064981/python3-error-initial-value-must-be-str-or-none] – Prashant Mar 17 '16 at 16:06
  • There are some major differences between Python2.x and Python3.x (particularly concerning string handling). This is the cause of *some* of your issues. Where you're getting `response.get...` from and why you might think it's the same as the `requests` module, I don't know. – jDo Mar 17 '16 at 16:07
  • @jDo oops, sorry that's just a typo. Indeed it should be requests.get(....). – Albert Mar 17 '16 at 16:08
  • @jDo I tried running it with different versions of python, but always with the same outcome. – Albert Mar 17 '16 at 16:10
  • @Albert Do you have a link to the example you posted here? – jDo Mar 17 '16 at 16:18
  • Why are you trying to open HTML with Image.open()? – pholtz Mar 17 '16 at 16:22
  • @jDo yap, I do. Here it is https://media.readthedocs.org/pdf/requests/master/requests.pdf – Albert Mar 17 '16 at 17:42
  • @pholtz well, as far as I understand, that complex function is supposed to create an image from binary data returned by the request. And that's what is written in the tutorial. I still don't know how it all works; the first thing I wanted to do was just to see it run, but I couldn't even do that. – Albert Mar 17 '16 at 17:45
  • Ok with that in mind this question makes much more sense. You just need to plug in a link that will return image data, like this one: https://github.com/fluidicon.png – pholtz Mar 17 '16 at 17:57
  • The only occurrence of "www.github.com" I could find in that pdf is this `>>> r = requests.get('http://github.com')` Did you expect the URL "www.github.com" to return an image? The example you seem to be working with looks different from what you've posted here: `b'[{"repository":{"open_issues":0,"url":"https://github.com/...` The three dots following "github" indicate that you should write out a full path pointing to an image on the domain `github.com` – jDo Mar 17 '16 at 18:04

2 Answers

13

From what you describe, you're running Python 3 (the standalone StringIO module was removed there; the class now lives in the io module), while the example was written for Python 2.

So for your issue:

"TypeError: initial_value must be str or None, not bytes".

What that means is that in:

response = requests.get('http://www.github.com')

response.content is a bytes object, while io.StringIO accepts only str or None as its initial value. Given that your request worked and you can access response.content, bytes is exactly what the error message is complaining about.
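To see the error in isolation, without any network access, here is a minimal sketch. The bytes literal is only a stand-in for response.content, not real response data:

```python
from io import BytesIO, StringIO

# Stand-in for response.content, which is a bytes object on Python 3.
payload = b"<!DOCTYPE html>"

# io.StringIO accepts only str (or None) as its initial value, so bytes are rejected.
try:
    StringIO(payload)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)

# io.BytesIO takes the very same payload without complaint.
buffer = BytesIO(payload)
print(buffer.read() == payload)
```

The first print shows the TypeError message from the question; the second confirms that BytesIO hands the bytes back unchanged.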

requests gives you the body in response.content exactly as it arrived over the socket, as plain binary (i.e. not interpreted); response.text is the decoded str version. To use the raw bytes with string functions you first have to decode them.

In python3 str is the old unicode from python2, and bytes is close to the old str of python2. So you would need to convert the bytes into a string to feed StringIO:

i = Image.open(StringIO(response.content.decode('utf-8')))

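for example. But then I'd expect Image.open() to yell at you that it does not know wtf it is supposed to do with a text buffer; all it really wants is a stream of bytes! (With real image data the decode step itself would most likely fail first, since image bytes are rarely valid UTF-8.)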
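A small sketch of why decoding is a dead end for image data, under the assumption that the payload is a PNG. The PNG file signature is used here as a stand-in for real image content, and the HTML snippet is made up for illustration:

```python
# Bytes that really are UTF-8 text decode back to str without trouble.
html_bytes = "<h1>GitHub</h1>".encode("utf-8")
text = html_bytes.decode("utf-8")

# The 8-byte signature that starts every PNG file.
png_signature = b"\x89PNG\r\n\x1a\n"

# Image data is not text: the byte 0x89 cannot start a UTF-8 sequence,
# so decode() raises before Image.open() would ever be reached.
try:
    png_signature.decode("utf-8")
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True

print(text, decode_failed)
```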

But because Image.open() actually expects a stream of bytes, not a text stream, what you should do is use BytesIO instead of StringIO:

from io import BytesIO
i = Image.open(BytesIO(response.content))

Finally, thanks for including an example, but it cannot work as posted: the URL points to an HTML page, not an image.
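One way to catch that mistake early is to look at the Content-Type header before handing the body to Pillow. The following is only a sketch: is_image_response and fetch_image are hypothetical helper names, not part of requests or Pillow, and the 10-second timeout is an arbitrary choice.

```python
from io import BytesIO


def is_image_response(content_type):
    """Return True if a Content-Type header value names an image media type."""
    if not content_type:
        return False
    # Drop parameters such as "; charset=utf-8" and compare case-insensitively.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type.startswith("image/")


def fetch_image(url):
    """Download url and open it with Pillow (hypothetical helper).

    requests and Pillow are imported inside the function so that the header
    check above can be tried without either package installed.
    """
    import requests
    from PIL import Image

    response = requests.get(url, timeout=10)
    response.raise_for_status()  # fail early on 4xx/5xx responses
    content_type = response.headers.get("Content-Type")
    if not is_image_response(content_type):
        raise ValueError("expected an image, got %r" % (content_type,))
    # response.content is bytes, so it goes into BytesIO, not StringIO.
    return Image.open(BytesIO(response.content))


# Example call (needs network access, requests and Pillow), using the URL
# suggested in the comments on the question:
#     image = fetch_image("https://github.com/fluidicon.png")
```

With a check like this, pointing the code at an HTML page would fail with a readable ValueError instead of an obscure error from Image.open().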

HTH

zmo
  • haha thanks! Pretty clear explanation. I will now try to do what you've suggested. – Albert Mar 17 '16 at 17:48
  • the strange thing is actually that here https://media.readthedocs.org/pdf/requests/master/requests.pdf they are doing about the same thing, they are also giving a link to an HTML page. But it looks like it's supposed to work somehow...I'm pretty curious! – Albert Mar 17 '16 at 17:51
  • I'm pretty sure that tutorial was written with Python 2 in mind, whereas you're working with Python 3. That does not mean the tutorial is irrelevant, but you need to convert the examples first. You might want to run your scripts through the `2to3` tool, which automatically converts most of the problematic Python 2 code to Python 3. – zmo Mar 17 '16 at 17:53
0

It's a good idea to actually fetch an image from the internet if one wants to parse images :D (as opposed to fetching the index page at github.com)

import requests
from PIL import Image
from io import BytesIO  # available on Python 2.6+ and Python 3

url = "https://upload.wikimedia.org/wikipedia/commons/thumb/4/46/Venn0110.svg/576px-Venn0110.svg.png"
response = requests.get(url)
# response.content is bytes, so wrap it in BytesIO rather than StringIO
i = Image.open(BytesIO(response.content))

The example you're trying to use looks different from what you've posted here:

3.3.4 Binary Response Content
You can also access the response body as bytes, for non-text requests:
>>> r.content
b'[{"repository":{"open_issues":0,"url":"https://github.com/...
The gzip and deflate transfer-encodings are automatically decoded for you.
For example, to create an image from binary data returned by a request, you can use the following code:
>>> from PIL import Image
>>> from StringIO import StringIO
>>> i = Image.open(StringIO(r.content))

https://github.com/... <-- these three dots (ellipses) indicate that the URL has been shortened in the example.

source: Requests Documentation Release 2.9.1

jDo
  • haha you're right... I can't believe I asked that. Ooh, I should not ask questions when I'm not sober :D – Albert Mar 18 '16 at 10:55