I have an application, compiled with PyInstaller, that uses a SQLite database. Everything works fine until a user with special characters in their name runs the software. Even simple code like this:

import sqlite3
path = "C:\\Users\\Jøen\\test.db"

db = sqlite3.connect(path)

Results in a traceback:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
sqlite3.OperationalError: unable to open database file

I have tried all kinds of combinations including using chardet to detect the encoding and then converting to UTF-8 but that didn't work either. All of my usual Python encoding/decoding tricks are failing me at this point.

Has anyone successfully opened a SQLite DB in Python that has special characters in a path?

If any of you have international or special characters in your user path, here is some test code that may help reproduce the problem:

import os
import sqlite3
path = os.path.expanduser("~")
sqlite3.connect(path + "\\test.db")
Martijn Pieters

2 Answers

I see two issues:

  • \t is a tab character, and \U is the start of an 8-hex-digit Unicode character escape.
  • You'd need to encode to the platform filesystem encoding, sys.getfilesystemencoding(), which on Windows is usually UTF-16 (little endian) or MBCS (Multi-byte character set, really meaning any of our supported multi-byte encodings, including UTF-16), but not UTF-8. Or just pass in a Unicode string and let Python worry about this for you.
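
To make the first point concrete, here is a minimal sketch in Python 3 syntax. The path is a made-up example, and compile() is used only so that the failing literal can be shown without breaking the script itself.

```python
# Python 3 syntax; the path below is a made-up example.
plain = "C:\temp\test.db"      # each \t is read as a TAB character
raw = r"C:\temp\test.db"       # raw literal: backslashes are kept as-is
doubled = "C:\\temp\\test.db"  # doubled backslashes: same value as the raw literal

print(repr(plain))
print(raw == doubled)

# "\U" must be followed by 8 hex digits, so a non-raw literal containing
# "\Users" does not even compile.
try:
    compile(r'path = "C:\Users\test.db"', "<example>", "exec")
    users_literal_ok = True
except SyntaxError:
    users_literal_ok = False

print(users_literal_ok)
```

The raw and doubled forms describe the same path, while the plain literal silently contains two tab characters.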

On Python 2, use a unicode string literal. On Windows forward slashes are also acceptable as separators, or you could double the backslashes to properly escape them:

path = u"C:/Users/Jøen/test.db"
path = u"C:\\Users\\Jøen\\test.db"

Either form avoids \t being read as a tab and produces a Unicode string for Python then to encode to the correct filesystem encoding. A raw unicode literal (ur"C:\Users\Jøen\test.db") is not a way out here: Python 2 still processes \u and \U escapes inside ur"..." literals, so the \U in \Users is rejected as a truncated escape.

On Python 3, drop the u and still do not encode; a raw string literal works there because \U is left alone:

path = r"C:\Users\Jøen\test.db"
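
As a quick way to check the Python 3 behaviour, the following sketch opens a database under a directory whose name contains a non-ASCII character. The temporary directory is only a hypothetical stand-in for a real user profile, and this has not been verified against a PyInstaller build.

```python
import os
import sqlite3
import tempfile

# Hypothetical stand-in for a user profile directory with a non-ASCII name.
base_dir = tempfile.mkdtemp()
profile_dir = os.path.join(base_dir, "Jøen")
os.makedirs(profile_dir)

# The path stays a plain str; it is never encoded by hand.
db_path = os.path.join(profile_dir, "test.db")

conn = sqlite3.connect(db_path)
conn.execute("CREATE TABLE IF NOT EXISTS t (id INTEGER PRIMARY KEY)")
conn.commit()
conn.close()

print(os.path.exists(db_path))
```

If this fails on a given machine, the problem is more likely in how the path was obtained than in sqlite3 itself.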

When building a path from the home directory, use Unicode strings everywhere and use os.path.join() to assemble the path. Unfortunately, os.path.expanduser() is not Unicode-aware on Python 2 (see bug 18171), so using it requires decoding with sys.getfilesystemencoding(), and this can actually fail (see Problems with umlauts in python appdata environvent variable as to why). You could of course try anyway:

path = os.path.expanduser("~").decode(sys.getfilesystemencoding())
sqlite3.connect(os.path.join(path, u"test.db"))

Retrieving the Unicode value of the environment variable directly would instead ensure you get an uncorrupted value; building on Problems with umlauts in python appdata environvent variable, that could look like:

import ctypes
import os
import sqlite3

def getEnvironmentVariable(name):
    name = unicode(name)  # make sure string argument is unicode (Python 2)
    # With no buffer, the call returns the required size including the NUL
    n = ctypes.windll.kernel32.GetEnvironmentVariableW(name, None, 0)
    if n == 0:
        return None  # variable is not set
    buf = ctypes.create_unicode_buffer(u'\0' * n)
    ctypes.windll.kernel32.GetEnvironmentVariableW(name, buf, n)
    return buf.value

# os.environ is only used to test for presence; the value is read via the wide API
if 'HOME' in os.environ:
    userhome = getEnvironmentVariable('HOME')
elif 'USERPROFILE' in os.environ:
    userhome = getEnvironmentVariable('USERPROFILE')

sqlite3.connect(os.path.join(userhome, u"test.db"))
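
For comparison, a minimal sketch of the same path construction on Python 3, where os.path.expanduser() returns a str and no manual decoding is involved. The helper name home_db_path is hypothetical and only used for illustration.

```python
import os


def home_db_path(filename="test.db"):
    """Return a database path under the user's home directory (Python 3)."""
    # expanduser() returns str on Python 3, so there is nothing to decode.
    home = os.path.expanduser("~")
    return os.path.join(home, filename)


print(repr(home_db_path()))
```

The result can be passed straight to sqlite3.connect().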
Martijn Pieters
  • Sorry again mistyping from my VM. The path is double encoded with slashes. Using `ur` ends up just throwing an ascii encoding problem. – user6026624 Mar 06 '16 at 22:08
  • `\U` would also cause problems using python3 – Padraic Cunningham Mar 06 '16 at 22:09
  • Using the `u` with `expanduser` throws an encoding error: `>>> path = os.path.expanduser(u"~") Traceback (most recent call last): File "", line 1, in File "C:\Python27\lib\ntpath.py", line 311, in expanduser return userhome + path[i:] UnicodeDecodeError: 'ascii' codec can't decode byte 0xf8 in position 10: ordinal not in range(128) >>>` – user6026624 Mar 06 '16 at 22:32
  • Low on time, but that may be a bug in older 2.7 versions. Will investigate. – Martijn Pieters Mar 06 '16 at 22:54
  • Bah, humbug. It's considered [not a bug](http://bugs.python.org/issue18171) in Python 2. I'll update to using `sys.getdefaultencoding()` since the data comes from an environment variable. – Martijn Pieters Mar 06 '16 at 23:03
  • @user6026624 there; Windows is a difficult beast to deal with sometimes. Untested for now because I have no access to a Windows machine at the moment. – Martijn Pieters Mar 06 '16 at 23:25

The approach I found that actually works without having to deal with encoding (which I never did find a solution to) is to use the answer from here:

How to get Windows short file name in python?

Based on my testing, the short name always appears to have the special characters removed. I realize this is a kludge, but I could not find another way.
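
A minimal sketch of that workaround using ctypes and the Win32 GetShortPathNameW function follows. It is not necessarily the code from the linked answer; the function name get_short_path_name is an assumption, the non-Windows branch exists only so the snippet runs elsewhere, and the file or directory must already exist for the Windows call to succeed.

```python
import ctypes
import sys


def get_short_path_name(long_name):
    """Return the 8.3 short form of an existing path on Windows."""
    if sys.platform != "win32":
        # Short (8.3) names are a Windows concept; return the path unchanged.
        return long_name

    from ctypes import wintypes

    func = ctypes.windll.kernel32.GetShortPathNameW
    func.argtypes = [wintypes.LPCWSTR, wintypes.LPWSTR, wintypes.DWORD]
    func.restype = wintypes.DWORD

    # With no buffer, the call returns the required size including the NUL.
    size = func(long_name, None, 0)
    if size == 0:
        raise ctypes.WinError()

    buf = ctypes.create_unicode_buffer(size)
    if func(long_name, buf, size) == 0:
        raise ctypes.WinError()
    return buf.value
```

Short names can be disabled per volume, in which case the long name comes back unchanged, so this should be treated as a fallback and not a guarantee.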
