
I am working with PyCharm and I get my data from a separate file. This data contains the character '–' (an en dash, U+2013), which looks like a hyphen but isn't one.

This isn't an issue as long as I paste the data directly into my code as a string, but if I read it from a file, '–' gets replaced by 'â€“'.

Here is a minimal example:

with open('data.html', 'r') as file:
    data = file.read()
print(data)

where data.html is:

example–example

prints:

exampleâ€“example

I get the same encoding issue when I open data.html with Firefox. What can I do so that this character is correctly read from the file?
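The garbled output is consistent with a UTF-8 file being decoded with a different default encoding. The sketch below assumes that default is cp1252 (common on Windows; this is an assumption about the setup, not something stated above) and reproduces the effect without touching any file:

```python
import locale

# The en dash U+2013 is stored in a UTF-8 file as the three bytes E2 80 93.
en_dash = "\u2013"
utf8_bytes = en_dash.encode("utf-8")
print(utf8_bytes)  # b'\xe2\x80\x93'

# Decoding those bytes with cp1252 (assumed default) turns each byte into
# its own character, which yields the garbled text.
garbled = utf8_bytes.decode("cp1252")
print(garbled)  # â€“

# open() without an encoding argument falls back to this locale encoding.
print(locale.getpreferredencoding(False))
```

The last line shows which encoding open() uses on your machine when no encoding is passed.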

Anne Aunyme

2 Answers

Try adding encoding="utf-8" to your open() call. Without it, open() falls back to the platform's default encoding, which may not be UTF-8:

open('data.html', 'r', encoding="utf-8")

reference: Hyphen changing to special character –
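A minimal sketch of this answer, assuming the file really is saved as UTF-8. It writes a temporary data.html (the temporary directory only keeps the example self-contained) and reads it back, once with encoding="utf-8" and once with cp1252 for comparison:

```python
import os
import tempfile

text = "example\u2013example"  # contains an en dash (U+2013)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.html")

    # Write the file explicitly as UTF-8 so its bytes are the same on every platform.
    with open(path, "w", encoding="utf-8") as file:
        file.write(text)

    # Reading with the matching encoding keeps the en dash intact.
    with open(path, "r", encoding="utf-8") as file:
        data = file.read()

    # Reading the same bytes as cp1252 reproduces the garbled result.
    with open(path, "r", encoding="cp1252") as file:
        wrong = file.read()

print(data)   # example–example
print(wrong)  # exampleâ€“example
```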

Z Li

Try writing the code like this:

with open('data.html', 'r', encoding='utf-8') as file:
    data = file.read()
print(data)
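If you use pathlib instead of open(), the same fix applies: Path.read_text and Path.write_text accept an encoding argument too. A sketch, again assuming the file is UTF-8 (the temporary location is only for illustration):

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "data.html"  # hypothetical location for the example file
    path.write_text("example\u2013example", encoding="utf-8")

    # Path.read_text takes the same encoding argument as open().
    data = path.read_text(encoding="utf-8")

print(data)
```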