Python 3 fails at pdb "b main" with UnicodeDecodeError?

Question

The only similar question to this I've found is Django UnicodeDecodeError when using pdb - unfortunately, the solution there does not apply to this case.

Consider the following code, test.py:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# encoding: utf-8

def subtract(ina, inb):
  myresult = ina - inb
  return myresult

def main():
  y2 = 10
  y1 = 7
  # calculate (y₂-y₁)
  print("Calculating difference between y2: {} and y1: {}".format(y2, y1))
  result = subtract(y2, y1)
  print("The result is: {}".format(result))

if __name__ == '__main__':
  main()

Using Python3 from Anaconda3 on Windows 10:

(base) C:\tmp>conda --version
conda 4.7.12

(base) C:\tmp>python --version
Python 3.7.3

... I can run this program without a problem:

(base) C:\tmp>python test.py
Calculating difference between y2: 10 and y1: 7
The result is: 3

However, if I want to debug/step through this program using pdb, it fails as soon as I type b main to set a breakpoint on the main function:

(base) C:\tmp>python -m pdb test.py
> c:\tmp\test.py(6)<module>()
-> def subtract(ina, inb):
(Pdb) b main
Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\lib\pdb.py", line 648, in do_break
    lineno = int(arg)
ValueError: invalid literal for int() with base 10: 'main'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\lib\pdb.py", line 659, in do_break
    code = func.__code__
AttributeError: 'str' object has no attribute '__code__'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\lib\pdb.py", line 1701, in main
    pdb._runscript(mainpyfile)
  File "C:\ProgramData\Anaconda3\lib\pdb.py", line 1570, in _runscript
    self.run(statement)
  File "C:\ProgramData\Anaconda3\lib\bdb.py", line 585, in run
    exec(cmd, globals, locals)
  File "<string>", line 1, in <module>
  File "c:\tmp\test.py", line 6, in <module>
    def subtract(ina, inb):
  File "c:\tmp\test.py", line 6, in <module>
    def subtract(ina, inb):
  File "C:\ProgramData\Anaconda3\lib\bdb.py", line 88, in trace_dispatch
    return self.dispatch_line(frame)
  File "C:\ProgramData\Anaconda3\lib\bdb.py", line 112, in dispatch_line
    self.user_line(frame)
  File "C:\ProgramData\Anaconda3\lib\pdb.py", line 261, in user_line
    self.interaction(frame, None)
  File "C:\ProgramData\Anaconda3\lib\pdb.py", line 352, in interaction
    self._cmdloop()
  File "C:\ProgramData\Anaconda3\lib\pdb.py", line 321, in _cmdloop
    self.cmdloop()
  File "C:\ProgramData\Anaconda3\lib\cmd.py", line 138, in cmdloop
    stop = self.onecmd(line)
  File "C:\ProgramData\Anaconda3\lib\pdb.py", line 418, in onecmd
    return cmd.Cmd.onecmd(self, line)
  File "C:\ProgramData\Anaconda3\lib\cmd.py", line 217, in onecmd
    return func(arg)
  File "C:\ProgramData\Anaconda3\lib\pdb.py", line 667, in do_break
    (ok, filename, ln) = self.lineinfo(arg)
  File "C:\ProgramData\Anaconda3\lib\pdb.py", line 740, in lineinfo
    answer = find_function(item, fname)
  File "C:\ProgramData\Anaconda3\lib\pdb.py", line 100, in find_function
    for lineno, line in enumerate(fp, start=1):
  File "C:\ProgramData\Anaconda3\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 199: character maps to <undefined>
Uncaught exception. Entering post mortem debugging
Running 'cont' or 'step' will restart the program
> c:\programdata\anaconda3\lib\encodings\cp1252.py(23)decode()
-> return codecs.charmap_decode(input,self.errors,decoding_table)[0]
(Pdb) q
Post mortem debugger finished. The test.py will be restarted
> c:\tmp\test.py(6)<module>()
-> def subtract(ina, inb):
(Pdb) q

(base) C:\tmp>

The problem is the comment line: # calculate (y₂-y₁); if it is deleted, then pdb starts fine:

(base) C:\tmp>python -m pdb test.py
> c:\tmp\test.py(6)<module>()
-> def subtract(ina, inb):
(Pdb) b main
Breakpoint 1 at c:\tmp\test.py:10
(Pdb) q

(base) C:\tmp>

I'm slightly surprised by this - wasn't Python3 supposed to be "utf-8 by default"?

Obviously, this is a trivial case where I can easily erase the single comment line that causes the trouble. However, I have a large script with non-ASCII characters all over the place, both in comments and in prints I'd actually want to step through, and it is not really viable to go in and manually change all those instances to ASCII equivalents.

So, is there a way to cheat Python3's pdb, so it works - even if there are utf-8 characters present in the source code (regardless if in comments, or in actual commands)?

Is your file __really__ utf-8 encoded ? The fact that it's now the "default" only means you don't have to specify it _if the file is effectively in utf8_ - if it's anything else, well... — bruno desthuilliers, Dec 16 '19 at 10:31
NB I assume you fully understand the difference between utf8 and unicode... — bruno desthuilliers, Dec 16 '19 at 10:33
Thanks @brunodesthuilliers - I understand Unicode as a table that maps integers to characters, and utf-8 as encoding that describes how those integers are to be encoded in the text file. I just checked my file in Notepad++, Encoding tab says UTF-8 (also, I wouldn't have been able to copy paste `# calculate (y₂-y₁)` if that wasn't the case, I guess I would have had something like `# calculate (yâ‚‚-yâ‚)`) — sdbbs, Dec 16 '19 at 10:36
"I understand Unicode as a table that maps integers to characters" => well, not quite. Unicode uses "code points", and encoding (utf8 or any other) map those code points (or a subset of...) to bytes or sequences of bytes. UTF8 is an encoding that supports the full unicode, but quite a few other encodings can support subscripts (your "y₂" things) so don't assume that just because you can use non-ascii characters you're using UTF8. This being said if notepad++ tells you your file is indeed utf8 encoded, then the next thing to look at is your environment, as explained in snakecharmerb's answer. — bruno desthuilliers, Dec 16 '19 at 10:45

snakecharmerb · Accepted Answer · 2019-12-17T08:24:55.133

Python 3 is UTF-8 by default, but the environment in which it is operating is not - it has a default encoding of cp1252.

You can set the PYTHONIOENCODING environment variable to UTF-8 to override the default encoding, or change the environment to use UTF-8.

Edit

I analysed this too hastily. The above solutions apply to unicode errors raised when reading from stdin or writing to stdout, but the problem here is that pdb opens a file for reading without specifying an encoding:

def find_function(funcname, filename):
    cre = re.compile(r'def\s+%s\s*[(]' % re.escape(funcname))
    try:
        fp = open(filename)
    except OSError:
        return None

If no encoding is specified, then according to the io docs Python defaults to the result of locale.getpreferredencoding(False) - presumably cp1252 in this case, which matches the encodings\cp1252.py frame at the bottom of the traceback.
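To make the failure concrete, here is a minimal sketch that reproduces it outside pdb. It assumes a throwaway temporary file standing in for test.py and forces cp1252 explicitly so the behaviour can be seen on any platform. The last step uses tokenize.open(), a standard-library helper that reads the coding declaration before decoding; it is shown only as one way to read source files safely, not as what pdb does in the version from the question.

```python
import locale
import os
import tempfile
import tokenize

# Source text with the subscript characters from the question,
# written with escapes so this example itself stays pure ASCII.
source = (
    "# -*- coding: utf-8 -*-\n"
    "# calculate (y\u2082-y\u2081)\n"
    "def main():\n"
    "    pass\n"
)

# Save it as UTF-8 bytes, the way the editor saved test.py.
with tempfile.NamedTemporaryFile("wb", suffix=".py", delete=False) as tmp:
    tmp.write(source.encode("utf-8"))
    path = tmp.name

try:
    # The encoding a bare open(filename) falls back to on this machine.
    print("locale default:", locale.getpreferredencoding(False))

    # Force cp1252 to mimic the Windows environment from the question.
    try:
        with open(path, encoding="cp1252") as fp:
            fp.read()
        cp1252_ok = True
    except UnicodeDecodeError as exc:
        cp1252_ok = False
        print("cp1252 failed:", exc)

    # tokenize.open() detects the encoding from the coding declaration
    # (falling back to UTF-8), so the same file reads cleanly.
    with tokenize.open(path) as fp:
        text = fp.read()
    print("tokenize.open read OK:", text == source)
finally:
    os.remove(path)
```

The subscript one (U+2081) is encoded in UTF-8 as the bytes E2 82 81, and 0x81 has no mapping in cp1252, which is the byte the traceback complains about.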

One solution might be to set the console locale before running the debugger.

It may also be possible to set the PYTHONUTF8 environment variable to 1. Amongst other things, this will cause

open(), io.open(), and codecs.open() use the UTF-8 encoding by default.
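
As a rough way to check what PYTHONUTF8 changes, the sketch below starts a child interpreter with the variable set and prints the values it reports. The probe string and variable names are only illustrative, and it assumes Python 3.7 or later, where UTF-8 mode and sys.flags.utf8_mode exist.

```python
import os
import subprocess
import sys

# One-liner for the child interpreter: report whether UTF-8 mode is on
# and which encoding a bare open() would pick.
probe = (
    "import sys, locale; "
    "print(sys.flags.utf8_mode, locale.getpreferredencoding(False))"
)

# Copy of the current environment with PYTHONUTF8=1 added; the parent
# process is left untouched.
env = dict(os.environ, PYTHONUTF8="1")

result = subprocess.run(
    [sys.executable, "-c", probe],
    env=env,
    capture_output=True,
    text=True,
    check=True,
)
mode, encoding = result.stdout.split()
print("utf8_mode:", mode, "| default open() encoding:", encoding)
```

On the Windows command line the equivalent would presumably be `set PYTHONUTF8=1` followed by `python -m pdb test.py`, or `python -X utf8 -m pdb test.py`; this has not been checked against the Anaconda build from the question.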

edited Dec 17 '19 at 08:24

answered Dec 16 '19 at 10:34

Thanks @snakecharmerb - I've tried `chcp 65001`, `set PYTHONIOENCODING=UTF-8`, `SET PYTHONLEGACYWINDOWSIOENCODING=1` before running `python -mpdb ...` - none of this worked. `win-unicode-console` seems to be installed in Anaconda, but if I do `python -mrun -mpdb test.py`, then the script just runs, never falls down into the debugger. Only thing that seems to work, is to write inside the script `import pdb`, and add in `main()` as first line: `pdb.set_trace()`, which I find a bit more tedious than just calling `python -mpdb ...`. – sdbbs Dec 16 '19 at 11:01
@sdbbs - sorry, I didn't analyse this thoroughly enough before answering - I've done some more research on this, see the edit to the answer. – snakecharmerb Dec 17 '19 at 08:25
Many, many thanks @snakecharmerb - the analysis looks great, a lot of things there I had no idea about (and couldn't have guessed either)! – sdbbs Dec 17 '19 at 10:15
