
I am using Python to read a text file of data line by line. One of the lines contains a degree symbol, and I want to alter that part of the string. My script uses line = line.replace("TEMP [°C]", "TempC"). Execution reaches this line, but the string is not changed at all and no error is thrown. Clearly something about my replace call means the script does not see 'TEMP [°C]' as existing in my string.

In order to insert the degree sign in my script I had to change the encoding to UTF-8 in my IDE file settings. I have included the following text at the top of my script.

#!/usr/bin/env python
# -*- coding: utf-8 -*-

How do I replace 'TEMP [°C]' with 'TempC'?

I am using Windows 7 and Python 2.7 with Komodo IDE 5.2

I have tried running the suggested code in a Python Shell in Komodo, and there the string was changed as expected.

# -*- coding: utf-8 -*-
line = "hello TEMP [°C]"
line = line.replace("TEMP [°C]", "TempC")
print(line)
hello TempC

The other suggested code, run in a Python Shell in Komodo, returned this:

line = "TEMP [°C]"
line = line.replace(u"TEMP [°C]", "TempC")
Traceback (most recent call last):
File "<console>", line 0, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xb0 in position 6: ordinal not in range(128)

None of these suggestions worked when reading my text file though.

ShadowRanger
GBG
  • Your code works just fine for me in Python 2.7 interactive mode. – Prune Mar 01 '19 at 00:02
  • Are you opening the file with plain `open`, or using `io.open` to properly/automatically decode to `unicode`? And what is the encoding of the file you're reading from? If you're using plain `open` and reading from a non-UTF-8 file, you're going to get a different `str` than the one you got here: `"TEMP [°C]"` is actually `'TEMP [\xc2\xb0C]'`, but if the file you're reading from is `latin-1`, you'd have read in `'TEMP [\xb0C]'` (note the lack of the `\xc2`, which the `utf-8` representation requires). – ShadowRanger Mar 01 '19 at 00:17
  • @GBG: The edit just suggests, even more strongly, that the file's encoding is not UTF-8. Is this Windows or a UNIX-alike? If the latter, try running `file NAMEOFYOURINPUTFILE` at the command line; I'm going to guess it tells you something like `NAMEOFYOURINPUTFILE: ISO-8859 text`, not utf-8 text. – ShadowRanger Mar 01 '19 at 00:28
  • @ShadowRanger. I used the link below to determine that the file I am reading uses ANSI encoding. I have tried adding import io and opening the file with io.open, but the string does not change. https://stackoverflow.com/questions/3710374/get-encoding-of-a-file-in-windows – GBG Mar 01 '19 at 00:28

3 Answers


Based on your symptoms, your Python str literals end up as their utf-8 encodings, so when you type:

"TEMP [°C]"

you actually get:

'TEMP [\xc2\xb0C]'

Your file is some other encoding (e.g. latin-1 or cp1252), and since you're reading it via plain open, you're getting back undecoded str. But in latin-1 and cp1252 encoding, the str is 'TEMP [\xb0C]' (note lack of \xc2), so str comparison doesn't consider the two strings equivalent.
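To make the mismatch concrete, here is a minimal sketch that compares the bytes each encoding produces for the same text. The variable names are made up for illustration, and the degree sign is written as the escape \u00b0 so the snippet does not depend on how the script itself is saved.

```python
# -*- coding: utf-8 -*-
# Compare the bytes that each encoding produces for the same text.
text = u"TEMP [\u00b0C]"  # \u00b0 is the degree sign

utf8_bytes = text.encode("utf-8")     # what a UTF-8 script stores for the literal
cp1252_bytes = text.encode("cp1252")  # what a cp1252 ("ANSI") data file stores

print(repr(utf8_bytes))
print(repr(cp1252_bytes))

# A byte-level replace searches for the UTF-8 pattern in cp1252 data,
# finds no match, and returns the line unchanged without raising an error.
line_from_file = b"hello " + cp1252_bytes
unchanged = line_from_file.replace(utf8_bytes, b"TempC")
print(unchanged == line_from_file)
```

The silent no-op on the last lines is the same symptom described in the question: no exception, and no change to the string.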

The best fix is to replace your use of open with io.open, which is the Python 3 version of open and can seamlessly decode with a given encoding to produce canonical unicode. Likewise, use unicode literals instead of str literals in an encoding unknown to Python, so there is no disagreement on the correct way to represent a degree symbol (in unicode, there is one, and only one, representation):

import io

with io.open('myfile.txt', encoding='cp1252') as f:
    for line in f:
        line = line.replace(u"TEMP [°C]", u"TempC")

As you describe in your edits, your file is likely cp1252 (your editor says it's ANSI, which on a Western-locale Windows install is just a dumb way to describe cp1252), hence the chosen encoding.
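If you are not sure which of the two encodings a file uses, one rough heuristic (not something the question requires, and not foolproof) is to try UTF-8 first and fall back to cp1252 when decoding fails. The helper name decode_guessing below is hypothetical, and the sketch works on raw bytes such as those read from a file opened in 'rb' mode.

```python
def decode_guessing(raw, fallback="cp1252"):
    """Hypothetical helper: decode bytes as UTF-8, else as `fallback`."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # A lone 0xb0 byte is not valid UTF-8, so cp1252 data lands here.
        return raw.decode(fallback)

ansi_line = b"TEMP [\xb0C]"      # bytes from a cp1252 file
utf8_line = b"TEMP [\xc2\xb0C]"  # bytes from a UTF-8 file

# Both decode to the same unicode text, so one unicode pattern matches either.
same = decode_guessing(ansi_line) == decode_guessing(utf8_line)
print(same)
```

This is only a guess: some cp1252 byte sequences are also valid UTF-8, so passing the known encoding explicitly is more reliable whenever you can find it out.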

Note: If you're going to use unicode consistently throughout your program (a decent idea if you deal with non-ASCII data), you can make that the default:

from __future__ import unicode_literals
# All string literals are unicode literals unless prefixed with b, as on Python 3

from io import open  # open is now Python 3's open

# No need to qualify with `io.` for `open`, nor put `u` in front of Unicode text
with open('myfile.txt', encoding='cp1252') as f:
    for line in f:
        line = line.replace("TEMP [°C]", "TempC")

Really you should just move to Python 3, where this whole "unicode and str try to work together and often fail" thing was resolved by splitting the two types completely.
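For reference, here is a sketch of the same idea on Python 3, where the built-in open accepts encoding directly. It also writes the result back out, which the snippets above leave open. The sample file contents and the paths under a temporary directory are invented for illustration, and cp1252 is still the assumed encoding of the data.

```python
import os
import tempfile

# Create a small cp1252-encoded sample file (a stand-in for the real data).
tmp_dir = tempfile.mkdtemp()
src_path = os.path.join(tmp_dir, "sample_in.txt")
dst_path = os.path.join(tmp_dir, "sample_out.txt")
with open(src_path, "wb") as f:
    f.write(b"DATE,TEMP [\xb0C]\r\n2019-03-01,21.5\r\n")

# Decode on read, replace on text, encode on write.
# newline="" keeps the original line endings untouched in both directions.
with open(src_path, encoding="cp1252", newline="") as src, \
     open(dst_path, "w", encoding="cp1252", newline="") as dst:
    for line in src:
        dst.write(line.replace("TEMP [\u00b0C]", "TempC"))

# Read the output back as raw bytes to inspect what was written.
with open(dst_path, "rb") as f:
    result = f.read()
print(result)
```

Writing to a separate output file rather than overwriting the input keeps the original data intact if the assumed encoding turns out to be wrong.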

ShadowRanger
  • @GBG: Glad I could help. If I ever get my time machine working, I'm going to go back to 1980 and force everyone to switch to UTF-8 as the one true text encoding from the start, so we aren't stuck dealing with Windows and its locale-specific ASCII-superset one-byte-per-character encodings that do nothing but cause you pain the moment you need a single non-ASCII thing in your program. – ShadowRanger Mar 01 '19 at 00:45

You should use the u prefix for a unicode string literal:

line = line.replace(u"TEMP [°C]", "TempC")
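Note that the u prefix only helps when line is already unicode. When line is a byte str read with plain open, Python 2 first tries to decode it as ASCII, which appears to be what produced the UnicodeDecodeError shown in the question. The sketch below spells that decode out explicitly so it behaves the same on Python 2 and 3; the variable names are illustrative, and cp1252 is assumed to be the file's encoding.

```python
raw_line = b"TEMP [\xb0C]"  # bytes as read from a cp1252 file with plain open

# Python 2 performs the equivalent of this ASCII decode implicitly when a
# byte str meets a unicode argument; here it is written out by hand.
try:
    raw_line.decode("ascii")
    failed = False
except UnicodeDecodeError as exc:
    failed = True
    print(exc)

# Decoding with the file's actual encoding first lets the unicode pattern match.
fixed = raw_line.decode("cp1252").replace(u"TEMP [\u00b0C]", u"TempC")
print(fixed)
```

So the unicode literal is only half of the fix: the data has to be decoded with the right encoding before the replace.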
blhsing
  • @mrk - I have tried both approaches and neither worked. I am at a loss to understand why these do not work. – GBG Mar 01 '19 at 00:14

This code is working fine for me (Python 2.7.14). Maybe you can point out whether you did something different, so we can take it from there.

# -*- coding: utf-8 -*-

line = "hello TEMP [°C]"
line = line.replace("TEMP [°C]", "TempC")

print(line)
# hello TempC

Note: For me, no u prefix was necessary.

mrk