
Let's say I have two filepaths:

/my/file/path.mov
/mé/fileé/pathé.mov

If I do something like:

{hashlib.md5(path).hexdigest() for path in paths}

Then I'll sometimes get the following error:

UnicodeEncodeError: 'ascii' codec can't encode character u'\xc1' in position 217: ordinal not in range(128)

My quickfix was something along the lines of:

{hashlib.md5(path).hexdigest() for path in paths if path.isascii()}

But what would be a better way to deal with this?

David542
  • 1
    Must be Python 2 (the `u'\xc1'` gives it away). It defaults encoding to `ascii` if you don't do it yourself. Python 3 *requires* encoding to a byte string if you start with a Unicode string because you can only hash bytes. – Mark Tolonen Aug 20 '21 at 21:16
  • 1
    Does this answer your question? [UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 20: ordinal not in range(128)](https://stackoverflow.com/questions/9942594/unicodeencodeerror-ascii-codec-cant-encode-character-u-xa0-in-position-20) – msanford Aug 20 '21 at 21:17

2 Answers

1

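You need to encode the path yourself. `hashlib.md5` hashes bytes, so a Unicode string has to be converted to bytes first, and if you don't do that explicitly Python 2 falls back to `ascii`. UTF-8 can represent every Unicode character, so it is a safe general choice: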

hashlib.md5(path.encode("utf-8"))
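
As a minimal sketch, the set comprehension from the question could be written with the encoding step included. The `paths` list here simply reuses the two example paths from the question, written with escapes so the snippet does not depend on the source file's encoding:

```python
import hashlib

# The two example paths from the question; \xe9 is "é".
paths = [u"/my/file/path.mov", u"/m\xe9/file\xe9/path\xe9.mov"]

# Encode each path to UTF-8 bytes before hashing, since MD5 operates on bytes.
digests = {hashlib.md5(path.encode("utf-8")).hexdigest() for path in paths}
```

No path has to be filtered out this way, so the non-ASCII one is hashed as well.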
Silvio Mayolo
1

The encoding is missing: you have to specify it yourself, i.e. `utf-` followed by the number of the encoding you want to use (for example `utf-8`).

Normally it should be fine like this:

hashlib.md5(path.encode("utf-8"))
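
One caveat, sketched under the assumption that `paths` may mix byte strings and Unicode strings (which can happen on Python 2): calling `.encode` on a byte string that already contains non-ASCII bytes makes Python 2 try an implicit ASCII decode first, which fails in a similar way. A hypothetical helper, here called `md5_of_path` (the name and the `encoding` parameter are illustrative, not part of `hashlib`), could encode only when the value is not already bytes:

```python
import hashlib


def md5_of_path(path, encoding="utf-8"):
    """Return the MD5 hex digest of a path given as text or as bytes.

    Hypothetical helper for illustration; not part of hashlib.
    """
    if not isinstance(path, bytes):
        # Text (unicode on Python 2, str on Python 3): encode explicitly.
        path = path.encode(encoding)
    # At this point path is a byte string and can be hashed directly.
    return hashlib.md5(path).hexdigest()
```

With this, `{md5_of_path(path) for path in paths}` should work whether each entry is text or already-encoded bytes, as long as the byte strings use the same encoding you pass in.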
Piero