Python "SyntaxError: Non-ASCII character '\xe2' in file"

Question

I am writing some Python code and am getting the error message in the title. From searching, it seems to have to do with the character set.

Here is the line that causes the error

hc = HealthCheck("instance_health", interval=15, target808="HTTP:8080/index.html")

I cannot figure out which character is not in the ASCII set. Furthermore, searching for "\xe2" does not give any more information about what character it represents. Which character in that line is causing the issue?

I have also seen a few fixes for this issue, but I am not sure which to use. Could someone clarify what the issue is (Python doesn't interpret Unicode unless told to do so?) and how I would clear it up properly?

EDIT: Here are all the lines near the one that errors

def createLoadBalancer():
    conn = ELBConnection(creds.awsAccessKey, creds.awsSecretKey)
    hc = HealthCheck("instance_health", interval=15, target808="HTTP:8080/index.html")
    lb = conn.create_load_balancer('my_lb', ['us-east-1a', 'us-east-1b'],[(80, 8080, 'http'), (443, 8443, 'tcp')])
    lb.configure_health_check(hc)
    return lb

There's no problem in what you posted; look in nearby lines. — kindall, Feb 07 '14 at 22:58
Did you try Mutant's suggestion? Do you have "smart quotes" (the curved and/or angled kind) anywhere in the file? — John Y, Feb 07 '14 at 23:50
Yes Mutants worked, along with using notepad or something else to save the file as ASCII, then using it as you were. — KDecker, Feb 08 '14 at 00:24
One example which might cause it is an EN DASH (`–` - `\xe2\x80\x93`) — Martin Thoma, Apr 18 '15 at 17:08
FWIW, I copied text from a Google doc into a comment string in my Python file. I carried over an apostrophe, as in `we’re`, where the character `’` was the non-ASCII problem. Hope my $0.02 helps someone down the road; also look for the various dash types and quote substitutions used in rich text documents. I was stumped for a few minutes on this one. — Marc, Sep 21 '15 at 18:51
Another way for Linux/Unix users is the command `cat -v filename.py`, which shows non-ASCII characters. These generally get added when one copies code statements from other editors such as Notepad or Word. — Shrikant, Jan 08 '21 at 13:34

Chris Redford · Answer 1 · 2015-04-07T22:18:45.200

325

If you are just trying to use UTF-8 characters or don't care if they are in your code, add this line to the top of your .py file

# -*- coding: utf-8 -*-

edited Apr 07 '15 at 22:18

answered Jun 14 '14 at 16:28

For me it is not working. The error below always shows: SyntaxError: Non-ASCII character '\xe2' in file /home/aslam/projects/deva_26nov/mylibrary/email_constants.py on line 393, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details – Aslam Khan May 21 '16 at 08:29
Is there a reason this isn't a chosen answer? – cph Jan 16 '18 at 22:37
@cph I wrote it 4 months after the question was asked :) – Chris Redford Jan 17 '18 at 18:34
@cph because, while this is very helpful, the chosen answer answers the question of "what character is not in the ANSI ASCII set?" Both are fine answers though and the first one usually wins in that case. – Arthur Dent Jul 18 '18 at 17:06
This answer works for me and I think it's the right one. – jrp Mar 04 '21 at 17:23
In my case I had zero-width characters in my urls that I copied. https://www.soscisurvey.de/tools/view-chars.php was a great tool to find them all. – knownasilya Sep 15 '21 at 17:31

score 160 · Accepted Answer · answered Feb 07 '14 at 23:11

You've got a stray byte floating around. You can find it by running

with open("x.py") as fp:
    for i, line in enumerate(fp):
        if "\xe2" in line:
            print i, repr(line)

where you should replace "x.py" by the name of your program. You'll see the line number and the offending line(s). For example, after inserting that byte arbitrarily, I got:

4 "\xe2        lb = conn.create_load_balancer('my_lb', ['us-east-1a', 'us-east-1b'],[(80, 8080, 'http'), (443, 8443, 'tcp')])\n"
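On Python 3 the snippet above needs adjusting, because `open()` decodes text by default and `print` is a function. The following is a minimal sketch of the same idea, assuming the file is read as raw bytes; the function name `find_non_ascii` and the throwaway sample file are illustrative and not part of the original answer. It reports 1-based line numbers, unlike the 0-based `enumerate` above.

```python
import os
import tempfile


def find_non_ascii(path):
    """Return (line_number, column, line_bytes) for the first non-ASCII byte on each line."""
    hits = []
    with open(path, "rb") as fp:  # binary mode: no decoding happens
        for lineno, line in enumerate(fp, start=1):
            for col, byte in enumerate(line):
                if byte > 0x7F:  # anything above 127 is outside ASCII
                    hits.append((lineno, col, line))
                    break  # one report per line is enough
    return hits


# Demo on a temporary file containing an en dash (bytes e2 80 93).
with tempfile.NamedTemporaryFile("wb", suffix=".py", delete=False) as tmp:
    tmp.write(b"x = 1\ny = 16 \xe2\x80\x93 4\n")

results = find_non_ascii(tmp.name)
os.remove(tmp.name)

for lineno, col, line in results:
    print(lineno, col, repr(line))
```

To scan a real file, call `find_non_ascii("x.py")` with the name of your program instead of the temporary file.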

DSM

Thanks, this helped a lot! Still not sure what the character is/was. I ended up throwing the code in Notepad, saving it as ASCII, and then pasting. – KDecker Feb 07 '14 at 23:32
I faced this as well, which I think was due to some copy paste issue, where the character showed up as whitespace in the editor(vim). – Samveen Nov 05 '14 at 04:54
It might be needed to have python know that such characters are there for a reason and not just a stray byte. See the solution in Chris Redford's answer. – simplyharsh Nov 14 '14 at 14:13
I had the same problem: character \xe2 was part of a hyphen "–" (\xe2\x80\x93) that is slightly longer than the ASCII "-". That's because I pasted text into vim but didn't pay attention to this longer hyphen. For the full story, I produced this character with a double hyphen "--" in a wiki text (using Textile). – PlasmaBinturong Mar 16 '16 at 13:27
Mine was in an apostrophe - as in `O'Donnell` – user2490003 Jan 24 '17 at 22:36
Mine was a stylized double quote (“) as opposed to the simple double quote ("). – davneetnarang Nov 20 '17 at 05:42
yeah ... mine too was bloody pasting. But i was glad that npp has convert to "encoding" option. Saved me the trouble for searching for individual bytes. Tnx again :D you are a life saver! :D – Danilo Aug 04 '18 at 07:18
Great fix. I had the exact same issue as @PlasmaBinturong, where I got the extended hyphen from pasting into Vim. – Christopher Hunter May 28 '19 at 20:14
For folks hunting down the problem character, [this table is useful](https://www.i18nqa.com/debug/utf8-debug.html). Everything in the "UTF-8 Bytes" column which starts with `%E2` is a candidate. Generally the issue is editing code with "smart" features turned on like "smart quotes" replacing `"` with `“` (U+201C) and `”` (U+201D) or turning `--` into `—` (U+2014 em dash). All of those start with "\xe2\x80" in UTF-8. – Schwern Feb 12 '20 at 22:01
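
The claim that these characters all begin with "\xe2\x80" in UTF-8 can be checked directly. This is a small illustrative sketch in Python 3; the list `suspects` is an assumed selection of common substitutions, not an exhaustive one.

```python
import unicodedata

# Typographic characters that editors and word processors commonly
# substitute for plain ASCII punctuation (an illustrative selection).
suspects = ["\u2013", "\u2014", "\u2018", "\u2019", "\u201c", "\u201d"]

for ch in suspects:
    # Show the code point, its official Unicode name, and its UTF-8 bytes.
    print("U+%04X" % ord(ch), unicodedata.name(ch), ch.encode("utf-8"))

# True if every suspect's UTF-8 encoding starts with the bytes e2 80.
all_start_with_e2_80 = all(
    ch.encode("utf-8").startswith(b"\xe2\x80") for ch in suspects
)
print(all_start_with_e2_80)
```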

score 43 · Answer 3 · answered Apr 27 '17 at 18:02

Or you could just simply use:

# coding: utf-8

at top of .py file

Ysh

score 32 · Answer 4 · answered Feb 21 '17 at 11:41

\xe2 is the '-' character; it appears in some copied-and-pasted text, which uses a different but similar-looking '-' that causes encoding errors. Replace the '-' (from the copy-paste) with the correct '-' (from your keyboard).

André Liu

thanks a lot ! in my case it was the " ' " character – pietà Feb 08 '18 at 10:35
0xE2 is not a hyphen in any common encoding. It is part of the UTF-8 encoding of many common characters like non-ASCII hyphens and quotes, though. – tripleee Feb 02 '21 at 05:55

score 24 · Answer 5 · answered Jan 21 '15 at 09:04

Change the file character encoding: always put the line below at the top of your code.

# -*- coding: utf-8 -*-

Dadaso Zanzane

score 12 · Answer 6 · answered May 19 '15 at 23:23

I had the same error while copying and pasting a comment from the web

For me it was a single quote (') in the word

I just erased it and re-typed it.

khalid sookia

I had the same error, but when testing locally it didn't break and worked. When run on the server, it gave that encoding error. I had to replace the comment's single quote with the UTF-8 version. – shivgre Feb 04 '19 at 09:04

Bhupinder Yadav · Answer 7 · 2018-10-20T21:55:33.570

10

Adding the line # coding=utf-8 as the first line of your .py file will fix the problem.

You can read more about the problem and its fix at the link below; the article describes both well: https://www.python.org/dev/peps/pep-0263/

edited Oct 20 '18 at 21:55

answered Oct 20 '18 at 21:36

score 5 · Answer 8 · answered Oct 17 '14 at 23:00

I got this error for characters in my comments (from copying/pasting content from the web into my editor for note-taking purposes).

To resolve in Text Wrangler:

Highlight the text
Go to the Text menu
Select "Convert to ASCII"
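
Outside TextWrangler, a similar conversion can be sketched in Python 3. The mapping below is a hypothetical starting point, not what the editor's "Convert to ASCII" command actually does; extend it with whatever characters your own files contain.

```python
# Hypothetical mapping from common typographic characters to ASCII.
REPLACEMENTS = {
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
    "\u2013": "-",   # en dash
    "\u2014": "--",  # em dash
    "\u2212": "-",   # minus sign
    "\u200b": "",    # zero width space (dropped)
}
TABLE = str.maketrans(REPLACEMENTS)


def to_ascii_punctuation(text):
    """Replace the characters listed in REPLACEMENTS with ASCII equivalents."""
    return text.translate(TABLE)


sample = "print \u2018version is\u2019, 16 \u2013 4"
cleaned = to_ascii_punctuation(sample)
print(cleaned)
print(all(ord(ch) < 128 for ch in cleaned))  # is the result pure ASCII?
```

Characters not in the mapping are left untouched, so the final check is a useful way to see whether anything else non-ASCII remains.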

Kat Russo

Option has been changed to text->zap gremlins, in later versions of TextWrangler but it worked for me :-) – TheMethod Feb 11 '15 at 14:03

score 4 · Answer 9 · answered Feb 24 '16 at 23:09

Based on PEP 0263 -- Defining Python Source Code Encodings

Python will default to ASCII as standard encoding if no other
encoding hints are given.

To define a source code encoding, a magic comment must
be placed into the source files either as first or second
line in the file, such as:

      # coding=<encoding name>

or (using formats recognized by popular editors)

      #!/usr/bin/python
      # -*- coding: <encoding name> -*-

or

      #!/usr/bin/python
      # vim: set fileencoding=<encoding name> :
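
To check which declaration a given source would be read with, Python 3's standard `tokenize.detect_encoding` can be used. This is a sketch under the assumption that you are inspecting files with a Python 3 interpreter; the helper name `declared_encoding` is made up for illustration. Note that Python 3 defaults to UTF-8 when no declaration is present, whereas the PEP text above describes the Python 2 default of ASCII.

```python
import io
import tokenize


def declared_encoding(source_bytes):
    """Return the encoding Python 3 would use to decode this source."""
    encoding, _ = tokenize.detect_encoding(io.BytesIO(source_bytes).readline)
    return encoding


# Emacs-style declaration on the first line.
print(declared_encoding(b"# -*- coding: utf-8 -*-\nx = 1\n"))

# Vim-style declaration on the second line, after a shebang.
print(declared_encoding(
    b"#!/usr/bin/python\n# vim: set fileencoding=latin-1 :\nx = 1\n"
))

# No declaration at all: Python 3 falls back to its default.
print(declared_encoding(b"x = 1\n"))
```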

yet it worked from the first comment to the question, the answer contains the explanation. thanks — WebComer, Aug 22 '16 at 21:19

score 4 · Answer 10 · edited Dec 22 '18 at 13:48

I had the same issue and just added this to the top of my file (in Python 3 I didn't have the problem, but I do in Python 2):

#!/usr/local/bin/python
# coding: latin-1

edited Dec 22 '18 at 13:48

Suraj Rao

answered Dec 22 '18 at 13:38

Paul Z

This is going to be horribly wrong if your source is not *actually* Latin-1. You need to figure out the correct encoding, and then add that to the `coding:` spec. – tripleee Feb 02 '21 at 05:58

score 3 · Answer 11 · answered Mar 17 '18 at 15:11

If it helps anybody, for me that happened because I was trying to run a Django implementation in python 3.4 with my python 2.7 command

aless80

Wasn't using Django, but this still helped me. I wrote the script using python 3 and tried running it with python 2. Error went away when I ran it with the right version. Thanks! – JustBlossom Feb 03 '19 at 18:50

score 3 · Answer 12 · answered Jul 22 '20 at 10:04

In my case \xe2 was a ’ which should be replaced by '.

In general I recommend to convert UTF-8 to ASCII using e.g. https://onlineasciitools.com/convert-utf8-to-ascii

However if you want to keep UTF-8 you can use

#-*- mode: python -*-
# -*- coding: utf-8 -*-

score 2 · Answer 13 · edited Sep 29 '17 at 10:11

After about half an hour of looking through Stack Overflow, it dawned on me that using a single quote " ' " in a comment will throw the error:

SyntaxError: Non-ASCII character '\xe2' in file

After looking at the traceback I was able to locate the single quote used in my comment.

edited Sep 29 '17 at 10:11

cs95

answered Oct 30 '16 at 15:18

Mark Austin

score 1 · Answer 14 · edited Dec 27 '14 at 21:48

I had this exact issue running the simple .py code below:

import sys
print 'version is:', sys.version

DSM's code above provided the following:

1 'print \xe2\x80\x98version is\xe2\x80\x99, sys.version'

So the issue was that my text editor used SMART QUOTES, as John Y suggested. After changing the text editor settings and re-opening/saving the file, it works just fine.

edited Dec 27 '14 at 21:48

Mathias Müller

answered Dec 27 '14 at 21:29

nagrom

Chris · Answer 15 · 2016-06-18T04:49:23.960

I was trying to parse that weird Windows apostrophe, and after trying several things, here is the code snippet that works.

def convert_freaking_apostrophe(self,string):

   try:
      issuer_rename = string.decode('windows-1252')
   except:
      issuer_rename = string.decode('latin-1')
   issuer_rename = issuer_rename.replace(u'’', u"'")
   issuer_rename = issuer_rename.encode('ascii','ignore')
   try:
      os.rename(directory+"/"+issuer,directory+"/"+issuer_rename)
      print "Successfully renamed "+issuer+" to "+issuer_rename
      return issuer_rename
   except:
      pass

#HANDLING FOR FUNKY APOSTROPHE
if re.search(r"([\x90-\xff])", issuer):
   issuer = self.convert_freaking_apostrophe(issuer)

samsri · Answer 16 · 2020-11-19T05:53:33.300

1

I fixed this using PyCharm. At the bottom of the PyCharm window you can see the file encoding. I noticed that it was UTF-8, so I changed it to US-ASCII.

edited Nov 19 '20 at 05:53

answered Nov 19 '20 at 05:46

score 0 · Answer 17 · answered Jan 10 '18 at 11:34

I had the same issue but it was because I copied and pasted the string as it is. Later when I manually typed the string as it is the error vanished.

I had the error due to the - sign. When I replaced it with manually inputting a - the error was solved.

Copied string 10 + 3 * 5/(16 − 4)

Manually typed string 10 + 3 * 5/(16 - 4)

you can clearly see there is a bit of difference between both the hyphens.

I think it's because of the different formatting used by different OS or maybe just different software.

Probably you copy/pasted from some blog or similar whose software surreptitiously replaces hyphens and various quoting characters with "typographically pleasing" but incompatible glyphs. — tripleee, Feb 02 '21 at 05:56

score 0 · Answer 18 · answered Apr 12 '18 at 14:41

For me the problem was caused by the "’" symbol in the quotes. Since I had copied the code from a PDF file, it caused that error. I just replaced "’" with "'".

Vineet Bramhankar

score 0 · Answer 19 · answered Jul 21 '18 at 09:54

If you want to spot which character caused this, just assign the problematic value to a string and print it in an IPython console.

In my case

In [1]: array = [[24.9, 50.5], [11.2, 51.0]]        # Raises an error

In [2]: string = "[[24.9, 50.5], [11.2, 51.0]]"     # Manually paste the above array here

In [3]: string
Out [3]: '[[24.9, 50.5]\xe2\x80\x8b, [11.2, 51.0]]' # Here they are!
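
The bytes \xe2\x80\x8b in that output are the UTF-8 encoding of U+200B, a zero width space. In Python 3, such invisible characters can also be located programmatically. This is a rough sketch that assumes the culprits fall in the Unicode "Cf" (format) category, which covers zero width spaces and joiners but not every possible invisible character; the function name is illustrative.

```python
import unicodedata


def invisible_characters(text):
    """List (index, code point, name) for Unicode format characters."""
    return [
        (i, "U+%04X" % ord(ch), unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(text)
        if unicodedata.category(ch) == "Cf"  # "Cf" = invisible format characters
    ]


# The pasted array from the session above, with the hidden character written as an escape.
pasted = "[[24.9, 50.5]\u200b, [11.2, 51.0]]"
found = invisible_characters(pasted)
print(found)
```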

score 0 · Answer 20 · answered Aug 20 '18 at 06:10

For me, the problem was caused by typing my code into Mac Notes, then copying it from Mac Notes and pasting it into my vim session to create my file. This made my single quotes the curved type. To fix it, I opened my file in vim and replaced all the curved single quotes with the straight kind, just by removing and retyping the same character. It was Mac Notes that made the same keystroke produce the curved single quote.

score 0 · Answer 21 · answered May 15 '20 at 05:47

For a long time I was unable to find the issue, but later I realised that I had copied a line "UTC-12:00" from the web and the hyphen/dash in it was causing the problem. I just typed the "-" again and the problem was resolved.

So, sometimes the copy pasted lines also give errors. In such cases, just re-write the copy pasted code and it works. On re-writing, it would look like nothing got changed but the error will be gone.

TheDudeAbides · Answer 22 · 2020-09-11T15:03:00.187

Plenty of good solutions here.

One challenge not really addressed in any of them is how to visually identify certain hard-to-spot non-ASCII characters that resemble other plain ASCII ones. For example, en dashes can appear almost exactly like hyphens and curly quotes look a lot like straight quotes, depending on your text editor's font.

This one-liner, which should work on Mac or Linux, will strip characters not in the ASCII printable range and show you the differences side-by-side:

# assumes Bash shell; for Bourne shell (sh), rearrange as a pipe and
# give '-' as second argument to 'sdiff' instead
sdiff --suppress-common-lines script.py <(tr -cd '\11\12\15\40-\176' <script.py)

The characters \11, \12, and \15 are tab, newline, and carriage return, respectively, in octal; the remaining range is the visible ASCII characters. (hat tip)

Another tip gleaned from this SO thread uses an inverse character class consisting of anything not in the ASCII visible range, and highlights it:

grep --color '[^ -~]' script.py

This should also work fine with the macOS / BSD version of grep.
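
For those without `grep`, the same inverse character class can be sketched in Python 3. Instead of coloring the match, this version replaces each offending character with a visible marker. The function name and the sample text are assumptions made for illustration, and tab is allowed in addition to the space-to-tilde range.

```python
import re

# Same idea as grep's [^ -~], but tab is also treated as acceptable.
NON_PRINTABLE_ASCII = re.compile(r"[^\t -~]")


def mark_non_ascii(line):
    """Replace each offending character with a visible <U+XXXX> marker."""
    return NON_PRINTABLE_ASCII.sub(lambda m: "<U+%04X>" % ord(m.group()), line)


# Sample source containing a minus sign (U+2212) and curly double quotes.
source = "total = 10 + 3 * 5/(16 \u2212 4)\nname = \u201cBob\u201d\n"
marked = [mark_non_ascii(line) for line in source.splitlines()]

for lineno, line in enumerate(marked, start=1):
    print(lineno, line)
```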

score -3 · Answer 23 · answered Feb 07 '14 at 23:02

When I have a similar issue reading text files, I use:

f = open('file','rt', errors='ignore')

Cam

This is terrible advice. You should figure out the correct encoding; discarding characters you don't recognize simply corrupts the data. The question asks about the encoding in Python source code, not in input text files, anyway. – tripleee Feb 02 '21 at 05:52
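
The data loss described in the comment above can be seen in a small Python 3 sketch. The sample text is made up for illustration; the point is only the difference between ignoring undecodable bytes and decoding with the encoding the data actually uses.

```python
# UTF-8 bytes for text containing an accented letter, an en dash, and a diaeresis.
data = "caf\u00e9 \u2013 na\u00efve".encode("utf-8")

# Discarding undecodable bytes silently loses characters.
lossy = data.decode("ascii", errors="ignore")

# Decoding with the actual encoding keeps them.
faithful = data.decode("utf-8")

print(repr(lossy))
print(repr(faithful))
```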
