
I have a CSV file containing columns of numbers and strings, but a few rows contain characters like this: `ã¥ûzF¤i ãAz¿AMtgï¿ÆµÄwüÒ©ç¥ûµ½ ã`. When I open the file in Notepad++ and convert it to UTF-8-BOM, those characters are displayed as Japanese text. How can I achieve the same via Python code? At the moment I am opening and reading the file like this:

    with open(file_path, 'r') as output_file:
        for line in output_file:
            print(line)

Is there a different way to encode or decode these characters?

Vishruth
  • `open(file_path, "r", encoding="utf-8-sig")` – tripleee Jul 12 '22 at 08:29
  • @tripleee It is giving this: `UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8f in position 8: invalid start byte` – Vishruth Jul 12 '22 at 09:05
  • Then Notepad is lying when it says it's UTF-8 with a BOM. Try `encoding="utf-8"` (probably won't work either) and/or see my answer to the second duplicate for more in-depth troubleshooting background. In the worst case, you have a file with mixed encodings. See also the [Stack Overflow `character-encoding` tag info page](http://stackoverflow.com/tags/character-encoding/info) – tripleee Jul 12 '22 at 09:17
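Following the troubleshooting direction in the comments, a minimal sketch that tries a few candidate encodings in turn and falls back to a lossy decode if none succeeds. The candidate list here is an assumption: the `0x8f` lead byte and the Japanese text shown by Notepad++ are consistent with Shift_JIS/cp932, but you should adjust the list for your own data.

```python
# Sketch: try candidate encodings until one decodes the whole file cleanly.
# The encoding list is an assumption based on the mojibake in the question;
# cp932 is a common superset of Shift_JIS used for Japanese text on Windows.
CANDIDATE_ENCODINGS = ["utf-8-sig", "utf-8", "cp932", "euc_jp", "latin-1"]

def read_with_fallback(file_path):
    """Return (encoding_used, text) for the first encoding that decodes cleanly."""
    for enc in CANDIDATE_ENCODINGS:
        try:
            with open(file_path, "r", encoding=enc) as f:
                return enc, f.read()
        except UnicodeDecodeError:
            continue
    # Last resort: decode permissively, replacing undecodable bytes with U+FFFD.
    with open(file_path, "r", encoding="utf-8", errors="replace") as f:
        return "utf-8 (with replacements)", f.read()
```

Note that `latin-1` never raises `UnicodeDecodeError` (every byte is valid), so it acts as a catch-all and must stay last; if the file truly has mixed encodings, as the comments warn, no single-encoding read will be fully correct and the permissive fallback is the best you can do.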
