
In the SQLAlchemy source code I see the following:

    val = cursor.fetchone()[0]
    if util.py3k and isinstance(val, bytes):
        val = val.decode()

Why is the value decoded only on Python 3 and not on Python 2?

Rudziankoŭ
  • At first glance, `val` is immediately passed to a function that assumes a `str` value. The same code in Python 2, being more lax about the difference between `str` and `unicode`, may not care which type `val` is. – chepner Aug 12 '19 at 15:11
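To see the strictness the comment refers to, here is a small Python 3 sketch that mixes the two types. The variable names are illustrative only. Python 3 never converts between bytes and str implicitly: comparison quietly returns False, and concatenation is rejected.

```python
# Python 3 keeps bytes and str strictly apart.
text = "abcd"
data = b"abcd"

print(text == data)  # False: no implicit conversion when comparing

try:
    text + data  # mixing the two types is refused outright
except TypeError as exc:
    print("TypeError:", exc)

print(text == data.decode())  # True once the bytes are explicitly decoded
```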

2 Answers


In Python 3, "normal" strings (str) are Unicode, whereas in Python 2 str is a byte string, typically holding (Extended) ASCII (or ANSI) encoded text. According to [Python 3.Docs]: Unicode HOWTO - The String Type:

Since Python 3.0, the language’s str type contains Unicode characters, meaning any string created using "unicode rocks!", 'unicode rocks!', or the triple-quoted string syntax is stored as Unicode.

Example:

  • Python 3:

    >>> import sys
    >>> sys.version
    '3.7.3 (v3.7.3:ef4ec6ed12, Mar 25 2019, 22:22:05) [MSC v.1916 64 bit (AMD64)]'
    >>>
    >>> b = b"abcd"
    >>> s = "abcd"
    >>> u = u"abcd"
    >>>
    >>> type(b), type(s), type(u)
    (<class 'bytes'>, <class 'str'>, <class 'str'>)
    >>>
    >>> b.decode()
    'abcd'
    >>> s.decode()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AttributeError: 'str' object has no attribute 'decode'
    >>> u.decode()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AttributeError: 'str' object has no attribute 'decode'
    
  • Python 2:

    >>> import sys
    >>> sys.version
    '2.7.10 (default, Mar  8 2016, 15:02:46) [MSC v.1600 64 bit (AMD64)]'
    >>>
    >>> b = b"abcd"
    >>> s = "abcd"
    >>> u = u"abcd"
    >>>
    >>> type(b), type(s), type(u)
    (<type 'str'>, <type 'str'>, <type 'unicode'>)
    >>>
    >>> b.decode()
    u'abcd'
    >>> s.decode()
    u'abcd'
    >>> u.decode()
    u'abcd'
    

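val is then passed to _parse_server_version, which expects a str. Because bytes and str are distinct types in Python 3, the conversion is needed there. In Python 2, bytes is just an alias of str, so a byte string returned by the driver is already a str and nothing has to be done.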
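A minimal sketch of why the guard matters, assuming a version parser built on a str regular expression. parse_server_version here is a hypothetical stand-in, not SQLAlchemy's actual _parse_server_version. The sample version string is also made up, and PY3 only plays the role of util.py3k. On Python 3, a str pattern cannot be applied to bytes, so the undecoded value raises TypeError.

```python
import re
import sys

# Hypothetical stand-in for a dialect's _parse_server_version: it uses a
# str regex, so on Python 3 it only accepts str input.
def parse_server_version(val):
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", val)
    if m is None:
        raise ValueError("unrecognized version string: %r" % (val,))
    return tuple(int(x) for x in m.groups() if x is not None)

# Hypothetical value standing in for cursor.fetchone()[0]; some DB-API
# drivers return bytes here on Python 3.
raw = b"5.7.26-log"

PY3 = sys.version_info[0] >= 3  # plays the role of util.py3k

val = raw
if PY3 and isinstance(val, bytes):
    val = val.decode()  # bare decode() uses UTF-8 on Python 3

print(parse_server_version(val))  # (5, 7, 26)
```

Calling parse_server_version(raw) directly on Python 3 fails with "TypeError: cannot use a string pattern on a bytes-like object". On Python 2 the same call would work, because raw would already be a str.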

You could also check [SO]: Passing utf-16 string to a Windows function (@CristiFati's answer).

CristiFati

You can find detailed documentation of the frustrations around string encoding here.

In short, SQLAlchemy contains a legacy API that returns this data as bytes, so that statement is a simple way to convert the byte string to Unicode (str) on Python 3.
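The same idea can be generalized into a small normalization helper. to_text is hypothetical and not part of SQLAlchemy. It assumes UTF-8 by default, which matches what a bare .decode() does on Python 3, but lets the caller pass the driver's real charset.

```python
def to_text(value, encoding="utf-8", errors="strict"):
    """Return value as str if it is bytes-like; pass anything else through."""
    if isinstance(value, (bytes, bytearray, memoryview)):
        # bytes(...) normalizes bytearray/memoryview before decoding
        return bytes(value).decode(encoding, errors)
    return value

# Hypothetical row, as a driver that returns raw bytes might produce it.
row = (b"caf\xc3\xa9", 42, "already text", None)
print([to_text(v) for v in row])  # ['café', 42, 'already text', None]
```

Passing the encoding explicitly is safer than relying on the default when the database connection is not configured for UTF-8.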

mootmoot