I am having an issue with text encoding that I cannot solve.
I have a string in an Excel file that I'm reading into R that looks like: Productâ„¢. With a bit of research, I learned that the â„¢ is UTF-8 that has been read incorrectly as CP-1252.
The UTF-8 byte sequence for ™ is 0xE2 0x84 0xA2. These bytes have been read as CP-1252: â (E2) „ (84) ¢ (A2).
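To make the byte mapping above concrete, here is a minimal sketch that tries to reproduce the garbling in R. It assumes the platform's iconv recognises the name "CP1252". The variable names (`tm_bytes`, `as_cp1252`, `garbled`) are made up for illustration.

```r
# The three UTF-8 bytes for the trademark sign, as listed above
tm_bytes <- as.raw(c(0xe2, 0x84, 0xa2))

# Pretend the bytes are CP-1252 text (the suspected mistake) ...
as_cp1252 <- rawToChar(tm_bytes)

# ... and convert that "CP-1252" text to UTF-8, one character per byte
garbled <- iconv(as_cp1252, from = "CP1252", to = "UTF-8")

# Inspect the result and the bytes it is now stored as
print(garbled)
print(charToRaw(garbled))
```

If the diagnosis is right, `garbled` should hold the same three characters that appear after "Product" in my data.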
How can I fix this issue? I have tried using:
iconv("Productâ„¢", "cp1252", "utf-8")
#> [1] "Productâ„¢"
But as you can see, the output is incorrect. The desired output is Product™.
Any ideas about how to fix this issue? The incorrect data is in an Excel spreadsheet, but I am trying to clean the text in R. A solution to fix the original data or a data cleaning solution in R would be great.