What's the difference between open('filepath', 'rb') and open(rb'filepath')? The second gives me the following Unicode-related error when I try to hash the contents:

TypeError: Unicode-objects must be encoded before hashing

Here's my sample code

import hashlib

# Same file, opened two different ways
f1 = open('findMaxConsecutiveOnes.py', 'rb')
f2 = open(rb'findMaxConsecutiveOnes.py')

hasher = hashlib.sha512()
hasher.update(f1.read())
hasher.update(f2.read())  # error comes up here

I am on Python 3.6.0

    One passes a second parameter to `open`; the other creates a raw bytes literal and passes it to `open` as the first parameter. – Christian Dean Sep 28 '17 at 01:12

1 Answer

open('findMaxConsecutiveOnes.py', 'rb') opens the file findMaxConsecutiveOnes.py for reading (r) in binary mode (b).

In rb'findMaxConsecutiveOnes.py', the r prefix marks the literal as raw (which has no effect here, because the literal contains no backslashes), while the b prefix marks it as a bytes literal, so the resulting object is a bytes object, not a str object.
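
As a quick illustration of what the two prefixes do, here is a minimal sketch you can paste into an interpreter. The variable names are only for illustration.

```python
raw_bytes = rb'findMaxConsecutiveOnes.py'
plain_bytes = b'findMaxConsecutiveOnes.py'
text = 'findMaxConsecutiveOnes.py'

# The b prefix decides the type: both literals are bytes, not str.
print(type(raw_bytes))
print(raw_bytes == plain_bytes)   # True: r changes nothing without backslashes
print(raw_bytes == text)          # False: bytes and str never compare equal

# The r prefix only matters once a backslash appears in the literal.
newline_len = len(b'\n')      # one byte: the newline character
raw_len = len(rb'\n')         # two bytes: a backslash and the letter n
print(newline_len, raw_len)
```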

open() doesn't care about this: it accepts the file path as either a str or a bytes object, so the same file is opened either way. However, because the second parameter (the mode) is omitted, the file is opened with the default mode, which is 'r' (not 'rb'), and is therefore opened for reading in text mode.

So these three expressions are effectively equivalent:

  • open(rb'findMaxConsecutiveOnes.py')
  • open('findMaxConsecutiveOnes.py')
  • open('findMaxConsecutiveOnes.py', 'r')
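
To check the equivalence for yourself, a rough sketch along these lines should work. It writes a throwaway file (the name example.py and its contents are made up for the demonstration), opens it in each of the three ways, and compares the result with an explicit 'rb' open.

```python
import os
import tempfile

# Hypothetical scratch file, created only for this demonstration.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'example.py')
    with open(path, 'w') as f:
        f.write('print("hi")\n')

    modes = []
    read_types = []
    # A bytes path, a str path, and a str path with an explicit 'r' mode.
    for args in [(os.fsencode(path),), (path,), (path, 'r')]:
        with open(*args) as f:
            modes.append(f.mode)
            read_types.append(type(f.read()))

    # For contrast: the same file opened in binary mode.
    with open(path, 'rb') as f:
        binary_mode = f.mode
        binary_type = type(f.read())

print(modes)                      # the mode is 'r' in all three cases
print(read_types)                 # every read returned str
print(binary_mode, binary_type)   # 'rb' and bytes
```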

A file opened in text mode returns Unicode str objects from read operations, while a file opened in binary mode returns bytes objects. The hashlib hashers can only hash bytes, because the underlying hash algorithms are defined over bytes and know nothing about Unicode strings. Passing the str returned by a text-mode read to update() is the source of the error.
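
The same failure can be reproduced without any file at all. This is a small sketch, and the exact wording of the TypeError message may differ between Python versions.

```python
import hashlib

hasher = hashlib.sha512()
error_message = None

try:
    hasher.update('some text')        # str: rejected by hashlib
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

# Encoding the str to bytes first makes it hashable.
hasher.update('some text'.encode('utf-8'))
digest = hasher.hexdigest()
print(len(digest))                    # SHA-512 gives 128 hex characters
```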

Opening the file in binary mode (the first example you give) avoids this problem by not attempting to decode characters at all. It just reads raw bytes, and hashlib is happy. (It also doesn't waste any effort decoding characters that would only have to be re-encoded before hashing.)
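
Putting that together, one possible way to hash a file is sketched below. The helper name sha512_of_file and the chunk size are my own choices, not anything from the question, and the scratch file exists only so the example is runnable. Reading in chunks is optional, but it keeps memory use flat for large files.

```python
import hashlib
import os
import tempfile


def sha512_of_file(path, chunk_size=65536):
    """Return the SHA-512 hex digest of a file, read in binary mode."""
    hasher = hashlib.sha512()
    with open(path, 'rb') as f:
        # f.read() returns b'' at end of file, which stops the iteration.
        for chunk in iter(lambda: f.read(chunk_size), b''):
            hasher.update(chunk)
    return hasher.hexdigest()


# Hypothetical scratch file containing bytes that are not valid UTF-8,
# which binary mode handles without any decoding step.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'sample.bin')
    data = b'\xff\xfe not valid UTF-8 \x00' * 1000
    with open(path, 'wb') as f:
        f.write(data)
    file_digest = sha512_of_file(path)

print(file_digest == hashlib.sha512(data).hexdigest())
```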

cdhowie