
What character set is é from? In Windows Notepad, a text file containing this character saves fine as ANSI. Insert something like and you'll get an error. é seems to work fine in an ASCII terminal in PuTTY (are CP437 and IBM437 the same?), whereas does not.

I can see that is Unicode, not ASCII. But what is é? It doesn't give the errors I get with Unicode in Notepad, but Python was throwing `SyntaxError: Non-ASCII character '\xc3' in file on line , but no encoding declared;` until I added a "magic comment", as suggested by "Python NLTK: SyntaxError: Non-ASCII character '\xc3' in file (Sentiment Analysis -NLP)".

I added the "magic comment" and no longer get that error, but `os.path.isfile()` says that a file with é in its name doesn't exist. Ironically, é appears in the name of Marc-André Lemburg, the author of the PEP the error links to.

EDIT: If I print the path of the file, the accented e shows up as ├⌐ but I can copy and paste é into the command prompt.
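
A minimal sketch of where `├⌐` comes from, assuming Python 3 (where `str` is Unicode text and `bytes` is raw bytes). The variable names are illustrative and not taken from the script in the question. It shows that é is the Unicode code point U+00E9, that ASCII has no byte for it, and that its two UTF-8 bytes read back as CP437 give exactly the garbled pair seen on the console:

```python
# -*- coding: utf-8 -*-
import codecs
import unicodedata

e_acute = "\u00e9"  # the character é

print(unicodedata.name(e_acute))  # LATIN SMALL LETTER E WITH ACUTE
print(hex(ord(e_acute)))          # 0xe9, outside ASCII's 0x00-0x7f range

# The same character as bytes in three different encodings.
utf8_bytes = e_acute.encode("utf-8")     # b'\xc3\xa9'
cp1252_bytes = e_acute.encode("cp1252")  # b'\xe9' (what Notepad calls ANSI on Western systems)
cp437_bytes = e_acute.encode("cp437")    # b'\x82'

# ASCII cannot encode it at all.
try:
    e_acute.encode("ascii")
    in_ascii = True
except UnicodeEncodeError:
    in_ascii = False

# Interpreting the UTF-8 bytes as CP437 yields the two characters "├⌐".
mojibake = utf8_bytes.decode("cp437")

# Python treats IBM437 as an alias of cp437.
same_codec = codecs.lookup("IBM437").name == codecs.lookup("cp437").name
```

So é is a Unicode character that also happens to have a single-byte slot in legacy code pages such as Windows-1252 and CP437, which is probably why it seems to sit "between" ASCII and Unicode.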

EDIT2: See below

Private    > cat scratch.py   ### LOL cat scratch :3
# coding=utf-8
file_name = r"Filéname"
file_name = unicode(file_name)
Private    > python scratch.py
Traceback (most recent call last):
  File "scratch.py", line 3, in <module>
    file_name = unicode(file_name)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 3: ordinal not in range(128)
Private    >

EDIT3:

Private    > PS1="Private    > " ; echo code below ; cat scratch.py ; echo =======  ; echo output below ; python scratch.py
code below
# -*- coding: utf-8 -*-

file_name = r"Filéname"
file_name = unicode(file_name, encoding="utf-8")

# I have code here to determine a path depending on the hostname of the
# machine, the folder paths contain no Unicode characters, for my debug
# version of the script, I will hardcode the redacted hostname.
hostname = "One"
if hostname == "One":
    folder = "C:/path/folder_one"
elif hostname == "Two":
    folder = "C:/path/folder_two"
else:
    folder = "C:/path/folder_three"

path = "%s/%s" % (folder, file_name)
path = unicode(path, encoding="utf-8")


print path
=======
output below
Traceback (most recent call last):
  File "scratch.py", line 18, in <module>
    path = unicode(path, encoding="utf-8")
TypeError: decoding Unicode is not supported
Private    >
user324747
  • Please concentrate on the actual problem you are facing. Like this it is hard to find the question you are asking. – Klaus D. May 12 '20 at 00:12
  • I am trying to check whether a file exists (and then move, delete, or copy it), and the file name contains an accented e. I am trying to get around that, but I am also curious whether the accented e is ASCII, Unicode, or something else. It seems to be somewhere in between, like a "special" ASCII character. I imagine this is due to its use in Latin alphabets, mainly French, but also English and others. – user324747 May 12 '20 at 00:17
  • [this website](https://unicode-table.com/en/) says it's UNICODE. – Johnny May 12 '20 at 00:21
  • @Johnny isn't everything? Does encoding affect if it's unicode or not? – ilsloaoycd May 12 '20 at 00:23
  • @Klaus I can see what you mean about focusing on the actual problem (I even upvoted your comment), and I have seen that said before. But I have also seen questions that are simpler Q&A. I'm trying raw strings to see if that helps. Further reading for me: https://stackoverflow.com/questions/643694/what-is-the-difference-between-utf-8-and-unicode – user324747 May 12 '20 at 00:31
  • By the looks of things you're using Python 2. I'm pretty sure the encoding line at the top should be `# -*- coding: utf-8 -*-`. You will save yourself a lot of headaches by switching to Python 3. Python 2 on a Windows filesystem with Unicode characters can be a nightmare. – Andrew May 12 '20 at 01:09

1 Answer


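You need to tell `unicode()` which encoding the string is in; here it is UTF-8, not ASCII. Without an `encoding` argument, `unicode()` falls back to the default string encoding, which is ASCII in Python 2, hence the `UnicodeDecodeError`. The conventional file header is `# -*- coding: utf-8 -*-` (see Encoding Declarations), although `# coding=utf-8` is also recognized.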

# -*- coding: utf-8 -*-
file_name = r"Filéname"
file_name = unicode(file_name, encoding="utf-8")

Help on class unicode in module __builtin__:

class unicode(basestring)
 |  unicode(object='') -> unicode object
 |  unicode(string[, encoding[, errors]]) -> unicode object
 |
 |  Create a new Unicode object from the given encoded string.
 |  encoding defaults to the current default string encoding.
 |  errors can be 'strict', 'replace' or 'ignore' and defaults to 'strict'.
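
For illustration, here is a rough Python 3 equivalent of the two `unicode()` calls, assuming the source file is saved as UTF-8 so that the Python 2 literal holds the bytes written out below. In Python 3, `raw.decode(encoding)` plays the role of `unicode(raw, encoding)`; the names `raw`, `error`, and `text` are made up for this sketch:

```python
# -*- coding: utf-8 -*-

# The bytes a Python 2 byte string holds for "Filéname" in a UTF-8 source file.
raw = b"Fil\xc3\xa9name"

# Decoding as ASCII fails on the first byte of é, at index 3.
error = None
try:
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    error = exc

# Naming the real encoding succeeds.
text = raw.decode("utf-8")

print(error.start)  # 3
print(len(raw))     # 9 bytes
print(len(text))    # 8 characters
```

The failing position matches the traceback in EDIT2 (`byte 0xc3 in position 3`), which suggests the default ASCII codec was being applied there.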

And as I mentioned in my previous comment, you will save yourself a lot of headaches by switching to Python 3. Python 2 on a Windows filesystem with Unicode characters can be a nightmare.
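
As a rough sketch of what the Python 3 route looks like, assuming Python 3 and a filesystem that accepts non-ASCII names: the temporary folder below is only a stand-in for the real `C:/path/...` folders, and the file is created just so that there is something to find.

```python
# -*- coding: utf-8 -*-
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    # In Python 3 a plain string literal is already Unicode text.
    path = os.path.join(folder, "Filéname")

    # Create the file so the existence check has something to find.
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("example")

    exists = os.path.isfile(path)
    missing = os.path.isfile(os.path.join(folder, "Filename"))

print(exists)   # True
print(missing)  # False
```

No `unicode()` call or manual decoding is involved here, because the path stays text from the literal all the way to the `os.path` call.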

Andrew
  • Thanks, Andrew. You seem to be pointing me in the right direction. When I print `file_name` in my Python script it now displays `é` instead of `├⌐`, but I get a new `TypeError: decoding Unicode is not supported` error when defining the full path to the file (see EDIT3). – user324747 May 12 '20 at 17:54
  • I read that learning Python 2 and Python 3 at the same time just creates more confusion. But maybe I have since learned enough Python 2 to proceed to learning Python 3 (six will help). I noticed Python 2 isn't included in modern Linux distros, only Python 3 is. So I installed Python 3 on one of my machines (this messed up the file associations for my Python 2 scripts; is there an alternate extension for Python 3 files to differentiate them?) – user324747 May 12 '20 at 17:57
  • To control which version is used when you open a python file from explorer set the first line in your python file to `#! python2` for python 2, and `#! python3` for python3. [documentation](https://docs.python.org/3/using/windows.html#from-a-script) – Andrew May 12 '20 at 18:43
  • The default encoding in Python 2 isn't `utf-8`, and `file_name` is already a `unicode` object on the line `path = "%s/%s" % (folder, file_name)`, so the resulting `path` is `unicode` too and cannot be decoded a second time. This would work: `path = "%s/%s" % (folder, file_name.encode("utf-8"))` – Andrew May 12 '20 at 18:55
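
One reading of the `TypeError` in EDIT3, sketched in Python 3 terms: in Python 2, `"%s/%s" % (folder, file_name)` returns a `unicode` object whenever one of the arguments is `unicode`, so `path` is already text and the second `unicode(path, encoding="utf-8")` has nothing left to decode. The Python 3 counterpart of that call is `str(path, encoding="utf-8")`, which fails in the same way. The flag `decoded_twice` is a made-up name for this sketch:

```python
# -*- coding: utf-8 -*-

folder = "C:/path/folder_one"

# Decode once, at the point where bytes enter the program.
file_name = b"Fil\xc3\xa9name".decode("utf-8")

# Formatting text with text gives text; no further decoding is needed.
path = "%s/%s" % (folder, file_name)

# Trying to decode text a second time raises TypeError.
try:
    str(path, encoding="utf-8")
    decoded_twice = True
except TypeError:
    decoded_twice = False

print(decoded_twice)  # False
```

Under that reading, simply dropping the second `unicode()` call in the Python 2 script would be an alternative to re-encoding `file_name`.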