
I've been trying to import some files automatically, but since I'm on Windows I got the unicode error (because of the "C:\Users\..." path). I looked for ways to correct this error and found some hints (using r"MyString" or u"MyString" for raw and unicode strings), and I was directed to this page (https://docs.python.org/3/howto/unicode.html).

But since my problem involves a GUI that imports the files automatically, I haven't figured out how to do it.

Here are the hints I found:

 file = file.replace('\\', '//')

 file = r"MyFilePath" 

 file = u"MyFilePath" 

 file = os.path.abspath("MyFilePath") 

 file = "MyFilePath".decode('latin1')
 # doesn't work in Python 3: a str has no attribute 'decode'

The r or u prefix looks promising, but I don't know how to tell Python to put my path after the r or the u.

Or is there a way to tell Python:

file = StopThinkingWithUnicode("MyFilePath")

I've also seen this link (Deal with unicode usernames in python mkdtemp), but it doesn't work either (I adapted the print() calls, since that code was written for Python 2.7 and I'm on 3.5).

I forgot to post the traceback, so here it is:

  MyFilePath = "C:\Users\MyUser\Desktop\Projet\05_Statistiques\Data\MyFileName.xlsx"
  File "<ipython-input-13-d8c2e72a6d3f>", line 1
  MyFilePath = "C:\Users\MyUser\Desktop\Projet\05_Statistiques\Data\MyFileName.xlsx"
            ^
  SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape

Could someone help me with some hints or a link? Thanks for your help.

PS: I've tried putting this on the first line of the script:

 # -*- coding: latin-1 -*- 

(I have *.xl , *.csv, *.sas7bdat, *.txt files)

Steven S.

1 Answer


That's a very frequent issue with Windows paths. I suspect that people stumble upon it and work around it by putting the "annoying" lowercase letters that match escape sequences (\n, \t, \b, \a, \v, \x ...) in upper case. That works, except for \U (the Unicode escape sequence) and \N (the named Unicode character escape).

The real solution is to use the raw prefix so that backslashes are treated literally:

MyFilePath = r"C:\Users\MyUser\Desktop\Projet\05_Statistiques\Data\MyFileName.xlsx"
             ^
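
As a minimal sketch of why the prefix matters, the snippet below compares three ways of writing the same path and then shows a path that is silently corrupted without the prefix. The `C:\new\table.csv` path is hypothetical and only there for illustration; `PureWindowsPath` is used so the comparison behaves the same on any platform.

```python
from pathlib import PureWindowsPath

# Raw string: backslashes are kept literally, so "\U" is not parsed as an escape.
raw_path = r"C:\Users\MyUser\Desktop\Projet\05_Statistiques\Data\MyFileName.xlsx"

# Regular string literal with every backslash doubled.
escaped_path = "C:\\Users\\MyUser\\Desktop\\Projet\\05_Statistiques\\Data\\MyFileName.xlsx"

# Forward slashes need no escaping; PureWindowsPath renders them as backslashes.
slash_path = str(
    PureWindowsPath("C:/Users/MyUser/Desktop/Projet/05_Statistiques/Data/MyFileName.xlsx")
)

# All three spellings produce the same string.
print(raw_path == escaped_path == slash_path)

# Hypothetical path: without the r prefix, "\n" becomes a newline and "\t" a tab,
# so the string is corrupted without any error being raised.
broken = "C:\new\table.csv"
intact = r"C:\new\table.csv"
print(len(broken), len(intact))
```

One caveat: a raw string literal cannot end with a single backslash, so a trailing separator has to be written another way (for example by doubling it in a regular literal).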

EDIT: my theory about "bug avoidance by uppercase" is confirmed. Check the path in this question: Largest number of rows in a csv python can handle?

Jean-François Fabre
  • Do you know a way to sort of concat r + "MyFilePath" (MyFilePath is actually chosen by the user with the explorer, not typed by me as I did here, sorry if that wasn't clear)? I've tried using ''.join(["r" + MyFilePath]), but because of the Unicode error MyFilePath isn't handled – Steven S. Jan 26 '17 at 15:24
  • Never mind, I found that tkinter can return the path of the file and automatically converts the "\" to "/" ( http://stackoverflow.com/questions/3579568/choosing-a-file-in-python-with-simple-dialog if someone is looking for it). Thanks @jean-françois Fabre – Steven S. Jan 26 '17 at 15:44
  • @StevenS. Raw strings are *only* for creating string constants in source code. If your user is entering a filename from a GUI the string will be correct. Your problem is probably with joining strings. Show a **small** example of what you are *actually* trying to do instead of vague hard-coded examples that don't represent what the user is doing. – Mark Tolonen Jan 26 '17 at 15:46
  • @StevenS., there is no reason to replace backslash with slash in a Windows path, and it can potentially cause problems because there are cases in which a path *must* use backslash. Thus to the contrary you should use `os.path.normpath` to *ensure* that a path only uses backslash. Also, your aversion to Unicode on Windows seems masochistic. ANSI/OEM codepages are a deprecated legacy from DOS-based Windows -- last released as Windows ME circa 2000. Windows XP and later are based on NT, a Unicode platform. – Eryk Sun Jan 26 '17 at 21:57
  • Note that the suggestion to use raw strings isn't an option for Unicode string literals in Python 2. Its parsing is broken, and there are no plans to ever fix it, e.g. `ur'C:\Users'` is a `SyntaxError`. Solution: upgrade to Python 3. – Eryk Sun Jan 26 '17 at 22:02
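
The points made in the comments can be sketched as follows: a path that arrives at runtime (from a file dialog, for instance) never goes through escape processing, so no r prefix is needed, and `normpath` can turn forward slashes back into backslashes. This is only an illustration under stated assumptions: `normalize_selected_path` and the `selected` value are hypothetical, and `ntpath` (the Windows flavour of `os.path`) is imported directly so the sketch gives the same result on any platform. On Windows, `os.path.normpath` is the same function.

```python
import ntpath  # Windows implementation of os.path, importable on any platform


def normalize_selected_path(selected: str) -> str:
    """Hypothetical helper: normalize a path received at runtime."""
    # Escape sequences are only processed in string literals in source code,
    # never in values obtained at runtime, so no r prefix is involved here.
    return ntpath.normpath(selected)


# Stand-in for what a file dialog might return (forward slashes).
selected = "C:/Users/MyUser/Desktop/Projet/05_Statistiques/Data/MyFileName.xlsx"

normalized = normalize_selected_path(selected)
print(normalized)
```

The raw prefix only comes back into play when such a path is hard-coded in the source, for example in a test.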