UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 8: ordinal not in range(128)

Question

I'm getting this error on this line:

logger.debug(u'__call__ with full_name={}, email={}'.format(full_name, email))

Why?

The value of the full_name variable is Gonçalves.

it's because logger takes in presumably only utf-8 characters and therefore you can't log 'ç' — Maxxik CZ, Dec 05 '19 at 08:00
Possible duplicate of [UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 20: ordinal not in range(128)](https://stackoverflow.com/questions/9942594/unicodeencodeerror-ascii-codec-cant-encode-character-u-xa0-in-position-20) — Legorooj, Dec 05 '19 at 08:19
It's more like [UnicodeDecodeError when logging an Exception in Python](https://stackoverflow.com/questions/28626984/unicodedecodeerror-when-logging-an-exception-in-python). logger can handle unicode but the console may not be able to. — FiddleStix, Dec 05 '19 at 09:59
@FiddleStix the proposed solution to that question is to use unicode strings, which I have done. — quant, Dec 05 '19 at 10:39
@Legorooj I don't understand how this is the same. More generally, I find the document linked in the answer to be profoundly unhelpful (I got half way through, became confused, and gave up). Is there a simple explanation for what is happening here and how to fix it? — quant, Dec 05 '19 at 10:43
As @MaxxikCZ said unicode strings don't support certain characters. Are the `full_name` and `email` vars already unicode? If not, convert them beforehand. This could (probably) raise errors in other places, but they'll be easier to catch. — Legorooj, Dec 05 '19 at 10:50

score 3 · Answer 1 · answered Dec 05 '19 at 11:49

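The problem is that full_name is a str (a byte string), not a unicode object. When a str is formatted into a unicode string, Python 2 implicitly decodes it with the ASCII codec, and that fails on the non-ASCII byte 0xc3.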

# -*- coding: UTF-8 -*-
import logging

logging.basicConfig()
logger = logging.getLogger()
logger.warning('testing')

# unicode.format(str) raises an error
name = 'Gonçalves'
print type(name)
print name
try:
    message = u'{}'.format(name)
except UnicodeDecodeError as e:
    print e

# but logger(unicode) is fine
logging.warning(u'Gonçalves')

# so unicode.format(str.decode()) doesn't raise
name = 'Gonçalves'
print type(name)
print name
message = u'{}'.format(name.decode('utf-8'))
logging.warning(message)


# and neither does unicode.format(unicode)
name = u'Gonçalves'
print type(name)
print name
message = u'{}'.format(name)
logging.warning(message)
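
The mechanism can be reproduced without the logger. The following is a minimal sketch, not taken from the original answer: the variable names are illustrative, and the byte string is written with explicit escapes so that the snippet does not depend on the source file encoding. It makes explicit the ASCII decode that Python 2 performs implicitly when a str is formatted into a unicode string.

```python
# -*- coding: utf-8 -*-
# 'Gonçalves' encoded as UTF-8: 'ç' becomes the two bytes 0xc3 0xa7.
raw_name = b'Gon\xc3\xa7alves'

# Explicit version of the decode that Python 2 performs implicitly
# in u'{}'.format(raw_name).
try:
    raw_name.decode('ascii')
    error = None
except UnicodeDecodeError as exc:
    error = exc

# The byte that the ASCII codec rejected (the first byte of 'ç').
failing_byte = raw_name[error.start:error.start + 1]

# Decoding explicitly as UTF-8 succeeds, and formatting is then safe.
decoded_name = raw_name.decode('utf-8')
message = u'full_name={}'.format(decoded_name)
```

The position reported in the error message is the index of the first byte that falls outside the ASCII range, which is why the traceback in the question points into the middle of the name.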

score 2 · Accepted Answer · answered Dec 05 '19 at 11:30

This should fix your problem:

full_name, email = [unicode(x, 'utf-8') for x in [full_name, email]]

logger.debug(u'__call__ with full_name={}, email={}'.format(full_name, email))

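The problem was that full_name is a byte string (str) and Python 2 implicitly decodes byte strings with its default encoding, ASCII, whenever they are mixed with unicode strings. ASCII covers only 128 characters, so the bytes of 'ç' cannot be decoded. Decoding explicitly as UTF-8 before formatting avoids the implicit ASCII decode.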
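
One caveat: in Python 2, `unicode(x, 'utf-8')` raises a `TypeError` when `x` is already a unicode object, so it is safer to convert only byte strings. A minimal sketch, assuming the inputs are UTF-8 encoded; `to_text` is a hypothetical helper name and the sample values are made up for illustration:

```python
def to_text(value, encoding='utf-8'):
    """Return value as text, decoding it only if it is a byte string.

    to_text is a hypothetical helper; UTF-8 is an assumption about the input.
    """
    if isinstance(value, bytes):  # bytes is an alias of str on Python 2
        return value.decode(encoding)
    return value


full_name = to_text(b'Gon\xc3\xa7alves')   # byte string: decoded
email = to_text(u'user@example.com')       # already text: returned unchanged
message = u'__call__ with full_name={}, email={}'.format(full_name, email)
```

Because values that are already text pass through unchanged, the helper can be applied to every argument without checking its type first.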

Disclaimer: this could be wrong on specifics; I code in py3 only and learned all this in about 5 minutes.

Legorooj

This seems to have fixed my issue. Thanks. – quant Dec 08 '19 at 21:50
@quant glad to help - I'll admit the doc linked in the possible duplicate doesn't make much sense - it's taken me 4 years to get the hang of reading confusing documentation. – Legorooj Dec 08 '19 at 21:52
@quant also in py3 strings and unicode are the same, and they use utf-8 by default. – Legorooj Dec 08 '19 at 21:53

Orsiris de Jong · Answer 3 · 2022-04-15T13:30:58.067

I'm unburying this old thread to propose a solution: a context filter added to the logger, which makes sure that every string passed to the logger is a unicode string when using Python 2.x.

TL;DR: see the end of the post for a ready-to-use solution.

# -*- coding: utf-8 -*-
import logging
import sys


# First, let's create a failsafe function that converts a string to unicode
def safe_string_convert(string):
    """
    Decodes byte strings for hacky UTF-8 logging in Python 2.7
    """
    try:
        return string.decode('utf8')
    except UnicodeDecodeError:
        try:
            return string.decode('unicode-escape')
        except Exception:
            try:
                return string.decode('latin1')
            except Exception:
                return b"String cannot be decoded. Passing it as binary blob" + bytes(string)


# Create a logger context filter class that will fix encoding for Python 2
class ContextFilterWorstLevel(logging.Filter):
    """
    This class decodes strings passed to the logger
    It also allows changing the default logging output or recording events
    """

    def __init__(self):
        self._worst_level = logging.INFO
        # super() must name this class, not logging.Filter; this form works on Python 2 and 3
        super(ContextFilterWorstLevel, self).__init__()

    def filter(self, record):
        # type: (logging.LogRecord) -> bool
        """
        A filter can change the default log output
        This one records the worst log level seen and fixes the message encoding
        """
        # Examples
        # record.msg = f'{record.msg}'.encode('ascii', errors='backslashreplace')
        # When using this filter, something can be added to logging.Formatter like '%(something)s'
        # record.something = 'value'
        if record.levelno > self._worst_level:
            self._worst_level = record.levelno
        # Python 2.7 compat fix: only byte strings (str in Python 2) need decoding
        if sys.version_info[0] < 3 and isinstance(record.msg, str):
            record.msg = safe_string_convert(record.msg)
        return True


#####
# Now let's create a new logger and try it
#####

log_filter = ContextFilterWorstLevel()
logger = logging.getLogger()

# Remove earlier handlers if any exist
while logger.handlers:
    logger.handlers.pop()

# Add a handler and lower the level so the INFO test message is actually emitted
logger.addHandler(logging.StreamHandler())
logger.setLevel(logging.INFO)

# Add context filter
logger.addFilter(log_filter)

# Test
logger.info('Café non unicode string')
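
To check that a filter of this kind does what it claims, the formatted output can be captured in memory instead of being read off the console. The sketch below is self-contained and uses hypothetical names (`DecodeBytesFilter`, the logger name `demo`). It is a simplified stand-in for the class above: it decodes byte-string messages as UTF-8 only, without the fallback chain.

```python
import io
import logging


class DecodeBytesFilter(logging.Filter):
    """Hypothetical filter: decode byte-string messages to text as UTF-8."""

    def filter(self, record):
        if isinstance(record.msg, bytes):  # bytes is an alias of str on Python 2
            record.msg = record.msg.decode('utf-8', 'replace')
        return True


# Capture the formatted records in memory.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter(u'%(levelname)s:%(message)s'))

demo_logger = logging.getLogger('demo')
demo_logger.setLevel(logging.INFO)
demo_logger.propagate = False  # keep the test message away from the root logger
demo_logger.addHandler(handler)
demo_logger.addFilter(DecodeBytesFilter())

# 'Café' as UTF-8 bytes, written with escapes to avoid source-encoding issues.
demo_logger.info(b'Caf\xc3\xa9 logged from a byte string')
captured = stream.getvalue()
```

Note that a filter attached to a logger only sees records logged directly through that logger. To cover records propagated from child loggers, the filter would have to be attached to the handler instead.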

Ready-to-use solution: the ofunctions.logger_utils package. Install with pip install ofunctions.logger_utils

Usage:

from ofunctions.logger_utils import logger_get_logger

logger = logger_get_logger(log_file='somepath')
logger.info('Café non unicode')

Hope this makes life easier for Python 2.x backporters.
