How can I read the "–" character?

Question

I am working in PyCharm and reading my data from a separate file. The data contains the character '–', which looks like a hyphen but apparently isn't.

This isn't an issue as long as I paste the data directly into the code as a string, but if I read it from a file, '–' gets replaced by 'â€“'.

Here is a minimal example:

with open('data.html', 'r') as file:
    data = file.read()
print(data)

where data.html is:

example–example

prints:

exampleâ€“example

I get the same encoding issue when I open data.html with Firefox. What can I do so that this character is correctly read from the file?

score 2 · Answer 1 · answered May 02 '21 at 18:27

Try adding

encoding="utf-8"

to your open() call: open('data.html', 'r', encoding="utf-8")

reference: Hyphen changing to special character â€“
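
For context on why this helps: the character is an en dash (U+2013), which a UTF-8 file stores as three bytes. When open() is called without an encoding argument, Python falls back to a platform-dependent default. The sketch below assumes that default was cp1252 (common on Windows; the question does not state the platform) and reproduces the garbling in memory, without touching any file.

```python
# The en dash (U+2013) is stored in a UTF-8 file as three bytes.
en_dash = "\u2013"
utf8_bytes = en_dash.encode("utf-8")

# Assumption: the file was decoded as cp1252 instead of UTF-8.
# Each of the three bytes then becomes its own character.
garbled = utf8_bytes.decode("cp1252")

# Reversing the wrong decode recovers the original character.
recovered = garbled.encode("cp1252").decode("utf-8")

print(utf8_bytes)                         # the raw bytes
print([hex(ord(c)) for c in garbled])     # code points of the garbled text
print(recovered == en_dash)
```

One character turning into three is the typical sign that UTF-8 bytes were decoded with a single-byte codec, which is why passing encoding="utf-8" explicitly resolves it.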

Z Li

score 0 · Answer 2 · answered May 02 '21 at 18:32

Try writing the code like this:

with open('data.html', 'r', encoding='utf-8') as file:
    data = file.read()
print(data)
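
To check the fix end to end, here is a small self-contained sketch. It writes the example line to a temporary data.html as UTF-8 (an assumption about how the original file was saved), then reads it back twice: once with the explicit encoding and once with cp1252, which stands in for a mismatched platform default. The names text, data, and wrong are only illustrative.

```python
import os
import tempfile

# The sample line from the question, with the en dash written as an escape.
text = "example\u2013example"

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "data.html")

    # Save the file as UTF-8 (assumed to match how data.html was saved).
    with open(path, "w", encoding="utf-8") as file:
        file.write(text)

    # Reading with the matching encoding returns the text unchanged.
    with open(path, "r", encoding="utf-8") as file:
        data = file.read()

    # Reading with a mismatched single-byte codec garbles the en dash.
    with open(path, "r", encoding="cp1252") as file:
        wrong = file.read()

print(data == text)
print(wrong == text)
print(len(text), len(wrong))
```

The mismatched read comes back two characters longer, because the three bytes of the en dash are each decoded as a separate character.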

Ahmed Alhameli
