83

While porting code from Python 2 to Python 3, I get this error when reading from a URL:

TypeError: initial_value must be str or None, not bytes.

import gzip
import json
import urllib.request
from io import StringIO
from urllib.parse import urlencode
from urllib.request import Request


service_url = 'https://babelfy.io/v1/disambiguate'
text = 'BabelNet is both a multilingual encyclopedic dictionary and a semantic network'
lang = 'EN'
Key  = 'KEY'

params = {
    'text': text,
    'key': Key,
    'lang': lang,
}

url = service_url + '?' + urlencode(params)
request = Request(url)
request.add_header('Accept-encoding', 'gzip')
response = urllib.request.urlopen(request)
if response.info().get('Content-Encoding') == 'gzip':
    buf = StringIO(response.read())
    f = gzip.GzipFile(fileobj=buf)
    data = json.loads(f.read())

The exception is thrown at this line:

buf = StringIO(response.read())

Under Python 2, the same code works fine.

smci
AMisra
  • Can you please provide full traceback? – Anand S Kumar Jun 26 '15 at 04:35
  • It just gives this error and halts. TypeError: initial_value must be unicode or None, not str – AMisra Jun 26 '15 at 04:52
  • can you include the value of your variable `url`? when I try it with `url = 'http://www.google.com'` the code runs fine for me – maxymoo Jun 26 '15 at 05:18
  • To anyone coming here because `pandas.read_csv()` trips over a wrongly inferred encoding on ASCII input: use `encoding='utf8'` or `'latin1'` to force it. – smci Mar 26 '20 at 21:14

4 Answers

148

`response.read()` returns an instance of `bytes`, while `StringIO` is an in-memory stream for text only. Use `BytesIO` instead.

From What's New in Python 3.0, "Text Vs. Data Instead Of Unicode Vs. 8-bit":

The StringIO and cStringIO modules are gone. Instead, import the io module and use io.StringIO or io.BytesIO for text and data respectively.
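
As a minimal sketch of the `BytesIO` approach: the snippet below stands in for `response.read()` with a made-up gzip-compressed JSON payload (the payload and the UTF-8 encoding are assumptions for illustration, not what the Babelfy service actually returns).

```python
import gzip
import io
import json

# Stand-in for response.read(): a gzip-compressed JSON body held as bytes.
# The payload is invented for illustration only.
payload = gzip.compress(json.dumps({"status": "ok"}).encode("utf-8"))

# BytesIO accepts bytes, so GzipFile can read the compressed stream from it.
buf = io.BytesIO(payload)
with gzip.GzipFile(fileobj=buf) as f:
    # Decompressed data is still bytes; decode before parsing (UTF-8 assumed).
    data = json.loads(f.read().decode("utf-8"))

print(data)
```

In the question's code, the same change would amount to replacing `StringIO(response.read())` with `BytesIO(response.read())`.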

tynn
29

This looks like another Python 3 `bytes` vs. `str` problem. Your response is of type `bytes`, which in Python 3 is a different type from `str`. To use `StringIO`, you first need to turn it into a string, for example with `response.read().decode('utf-8')`. Alternatively, you can use `BytesIO`, as another answer suggests, but if you expect the content to be text, the preferred way is to decode it into a `str` first.
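
A small sketch of the decode-first route, assuming the body is plain, uncompressed UTF-8 text (the sample bytes are made up; a gzip-compressed body would have to be decompressed before decoding):

```python
import io

# Stand-in for response.read() on an uncompressed text response.
raw = "first line\nsecond line\n".encode("utf-8")

# bytes -> str; the UTF-8 encoding is an assumption about the server.
text = raw.decode("utf-8")

# StringIO accepts str, so the text can now be read as a file-like object.
buf = io.StringIO(text)
lines = [line.rstrip("\n") for line in buf]

print(lines)
```

This is the form that helps with libraries that expect a text stream rather than raw bytes.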

gabhijit
  • I think this is the best answer for modules that can only read from StringIO and BytesIO, such as `Bio.SeqIO`, as in `records = SeqIO.parse(StringIO(r.read().decode('utf-8')), "fasta")` – Brian Wiley Jun 15 '22 at 06:21
3

Consider using six.StringIO instead of io.StringIO.

Max Bileschi
0

If you are migrating code from Python 2 to Python 3 and it depends on an old version of suds, use "suds-py3" on Python 3.
