I'm making a simple API call to a URL, saving the response as a Python dictionary `d`, and printing `d`.

import requests

r = requests.get(url)
d = r.json()
print(d)

When I execute the script via cmd.exe on Windows 10 everything works:

> python script.py
{'pagination': {'page': 1, 'pages': 2, 'per_page': 50, 'items': 58, 'urls': {'last': ...

But why does it throw an error when I run it via Git Bash? Can you help me understand the error?

$ git --version
git version 2.33.0.windows.2

$ python script.py
Traceback (most recent call last):
  File "C:\Users\...\Projects\discogs-data\script.py", line 6, in <module>
    print(d)
  File "C:\Users\...\AppData\Local\Programs\Python\Python39\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2153' in position 1544: character maps to <undefined>

I assume that Python tries to decode d using the cp1252 encoding before printing it. But why does it have to decode d in the first place, and why does it work with cmd.exe but not with Git Bash?

Moritz Wolff
  • Can you print `sys.stdout.encoding` in both cases? – Abdul Niyas P M Sep 22 '21 at 14:38
  • It does not try to *decode* anything, but it has to *encode* it to be able to print it. The offending character is Unicode U+2153 VULGAR FRACTION ONE THIRD (`⅓`). It cannot be displayed in the Git Bash terminal window, which only accepts cp1252 characters. You could use `print(str(d).encode('cp1252', errors='replace'))` – Serge Ballesta Sep 22 '21 at 14:54
  • @AbdulNiyasPM utf-8 for cmd and cp1252 for bash. – Moritz Wolff Sep 22 '21 at 16:37
  • @SergeBallesta adding `sys.stdout.reconfigure(encoding='utf-8')` at the start of the file also worked for me. – Moritz Wolff Sep 22 '21 at 16:39

1 Answer

I think cmd.exe and Git Bash use different encodings for standard output. You can confirm this with the sys module:

>>> import sys
>>> print(sys.stdout.encoding)

According to your traceback, Python uses the cp1252 encoding for output under Git Bash, and cp1252 cannot encode every character in d. If you still want to display the data in the Git Bash terminal, you can encode the string with 'cp1252' and set errors='replace', so that any non-encodable character is rendered as ?

>>> a = '\u2153' # from your error message.
>>> a.encode('cp1252', errors='replace')
b'?'
>>> a.encode('cp1252', errors='replace').decode('cp1252')
'?'
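
Since d in the question is a dictionary rather than a string, the same idea could be applied to its text representation. The following is only a minimal sketch: the helper name `printable` and the sample dictionary are made up for illustration and stand in for the real API response.

```python
import sys


def printable(obj, encoding=None):
    """Return str(obj) with characters the encoding cannot represent replaced by '?'."""
    # Fall back to the encoding of the current stdout (or UTF-8 if it is unknown).
    encoding = encoding or sys.stdout.encoding or "utf-8"
    # Encode with errors="replace", then decode again to get a printable str.
    return str(obj).encode(encoding, errors="replace").decode(encoding)


# Hypothetical stand-in for the API response; '\u2153' is the character from the traceback.
d = {"size": "\u2153 cup"}
print(printable(d, "cp1252"))  # {'size': '? cup'}
```

If you would rather keep the original characters than replace them, the other route is to make stdout use UTF-8, for example with the `sys.stdout.reconfigure(encoding='utf-8')` call mentioned in the comments or by setting the `PYTHONIOENCODING` environment variable to `utf-8` before running the script.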
Abdul Niyas P M