How to replace a Unicode character?

Question

I have a file (input.txt) with the following content:

é

I am running the commands below, but they fail to replace the Unicode character with the character "a".

Attempt 1: Prints blank.

>>> file = open("input.txt","r")
>>> print file.read().replace(u"\u00E9","a")

Attempt 2: Prints blank.

>>> file = open("input.txt","r")
>>> print file.read().decode("utf-8").replace(u"\u00E9","a").encode("utf-8")

Note: I have gone through this question, and the answer suggested there (Attempt 2) is not working. I am not sure why.

EDIT:

As ShadowRanger pointed out in the comments, my question was incomplete. My apologies for that.

Here is the complete code for Attempt 1:

>>> file = open("input.txt","r")
>>> print file.read()
>>> é
>>> print file.read().replace(u"\u00E9","a")
>>>

Here is the complete code for Attempt 2:

>>> file = open("input.txt","r")
>>> print file.read()
>>> é
>>> print file.read().decode("utf-8").replace(u"\u00E9","a").encode("utf-8")
>>>

This shouldn't output nothing unless the file contains nothing (or contains backspace/carriage return characters that undo the output as it goes). Try wrapping the whole shebang (aside from `print` itself) in `repr()`; that should guarantee output, with escapes to prevent any terminal weirdness from messing you up, so you can confirm whether there *should* be output. – ShadowRanger, Oct 26 '18 at 15:25
@ShadowRanger I got it working. I changed file read mode to binary and then Attempt 2 worked. `file = open("input.txt","rb")` — javanoob, Oct 26 '18 at 15:41
For Python 2 code, it shouldn't matter whether it's binary or text mode (unless you are using `io.open` without telling us), as there is no difference at all on most non-Windows machines, and on Windows, it only affects line endings, where the lack of line ending translation in binary mode would *cause* problems. Even if binary mode made a difference (e.g. you're using `io.open` and not telling us), it should raise an exception if it's a problem, not silently print nothing. You should really provide a [MCVE], otherwise, we're just guessing at what else might be changing your behavior. — ShadowRanger, Oct 26 '18 at 18:22
@ShadowRanger Thank you for your time and my apologies for incomplete question. Updated question now with full details. Thanks again! — javanoob, Oct 26 '18 at 19:06
With the extra details, your problem was a clear duplicate of [Why can't I call read() twice on an open file?](https://stackoverflow.com/q/3906137/364696). You can't call `read()` with no arguments twice in a row and expect to get the file contents twice; you either need to call it once and store the result for reuse, or call `.seek(0)` on the file object in between `read` calls to reset the file position. — ShadowRanger, Oct 26 '18 at 20:11
@ShadowRanger Thanks for your time. I understood what was happening and resolved it. Thanks again! — javanoob, Oct 26 '18 at 20:12
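
The behaviour described in the last comments can be reproduced with a short, self-contained sketch. It creates its own throwaway copy of the file (the temporary path and the variable names are assumptions made for illustration, not part of the original post) and uses `io.open`, which should behave the same way on Python 2 and Python 3. The values are printed through `repr()`, as suggested above, so that an empty result is visible rather than silently blank.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile

# Create a throwaway file containing U+00E9 encoded as UTF-8.
# The temporary path is an assumption made for this sketch.
path = os.path.join(tempfile.mkdtemp(), "input.txt")
with io.open(path, mode="w", encoding="utf-8") as f:
    f.write(u"\u00e9\n")

with io.open(path, mode="r", encoding="utf-8") as f:
    first = f.read()    # consumes the whole file
    second = f.read()   # the position is already at the end, so this is empty
    f.seek(0)           # rewind to the start of the file
    third = f.read()    # the contents are available again

# repr() makes the empty second read visible.
print(repr(first))
print(repr(second))
print(repr(third))

# Read once, keep the result, and reuse it for the replacement.
replaced = first.replace(u"\u00e9", u"a")
print(repr(replaced))
```

Storing the result of a single `read()` is usually the simpler option; `seek(0)` is mainly useful when the file object has to be handed to other code that expects to read from the start.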

food4mybrain · Answer 1 · 2018-10-26T18:45:17.990

3

You are opening the file in read-only mode. You won't be able to modify the contents of the file if that's what you are trying to achieve.

If you are just trying to manipulate the string read from the file, then I'd suggest you specify the file encoding in order to seamlessly manipulate unicode characters within your program.

Something like this:

PYTHON 2

# -*- coding: utf-8 -*-

from __future__ import unicode_literals
import io

with io.open("input.txt", mode="r", encoding="utf-8") as file:
    c = file.read()
    c = c.replace("é", "a")
    print c

PYTHON 3

import io

with io.open("input.txt", mode="r", encoding="utf-8") as file:
    c = file.read()
    c = c.replace("é", "a")
    print(c)

edited Oct 26 '18 at 18:45

answered Oct 26 '18 at 15:45

food4mybrain


You'd need a `u` prefix on `"é"`, and probably need a source code encoding declaration for this to work on Python 2 (along with using `io.open` instead of `open`, which you've done correctly). The OP is clearly on Python 2 based on the unparenthesized `print`s. – ShadowRanger Oct 26 '18 at 18:26
Good point @ShadowRanger, I missed the unparenthesized `print` statement. I'll update my answer accordingly. I kinda rushed into answering this one. – food4mybrain Oct 26 '18 at 18:40
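
As a rough illustration of the points raised in these comments, the sketch below uses explicit `u"..."` literals together with `io.open`, so it does not depend on `unicode_literals` and should run on both Python 2 and Python 3.3+. The helper names `replace_in_text` and `replace_in_bytes` and the temporary file are hypothetical, introduced only for this example. The first helper lets `io.open` do the decoding; the second mirrors the decode/replace/encode chain from Attempt 2, reading the bytes exactly once.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile


def replace_in_text(path, old, new, encoding="utf-8"):
    """Read the file once as text and return it with `old` replaced by `new`."""
    with io.open(path, mode="r", encoding=encoding) as f:
        return f.read().replace(old, new)


def replace_in_bytes(path, old, new, encoding="utf-8"):
    """Read raw bytes once, then decode, replace, and re-encode."""
    with io.open(path, mode="rb") as f:
        data = f.read()
    return data.decode(encoding).replace(old, new).encode(encoding)


# Throwaway input file; the path is an assumption made for this sketch.
path = os.path.join(tempfile.mkdtemp(), "input.txt")
with io.open(path, mode="wb") as f:
    f.write(b"\xc3\xa9\n")  # UTF-8 bytes for U+00E9

text_result = replace_in_text(path, u"\u00e9", u"a")    # text in, text out
bytes_result = replace_in_bytes(path, u"\u00e9", u"a")  # bytes in, bytes out

print(repr(text_result))
print(repr(bytes_result))
```

Both helpers call `read()` only once per open file, which avoids the empty second read discussed under the question.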
