I have converted a .doc document to .txt, and the text now contains odd character codes that I cannot remove (from looking at other posts, I think they are hex codes, but I'm not sure).
My data set is a data frame with two columns, one identifying a speaker and the second column identifying the comments. Some strings now have weird characters. For instance, one string originally said (minus the quotes):
"Why don't we start with a basic overview?"
But when I read it in R after converting it to a .txt, it now reads:
"Why don<92>t we start with a basic overview?"
I've tried:
df$comments <- gsub("<92>", "", df$comments)
However, this doesn't change anything. Furthermore, whenever I do any other substitution within a cell (for instance, changing "start" to "begin"), the special character turns into a series of question marks surrounded by boxes.
Any help would be much appreciated!
EDIT: I read my text in like this:
df <- read_delim("file.txt", "\n", escape_double = FALSE, col_names = FALSE, trim_ws = TRUE)
It has 2 columns; the first is speaker and the second is comments.
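
A possible explanation, offered as a sketch and not a confirmed diagnosis: R prints a byte as <92> when that byte is not valid in the session's encoding, so the string holds a single byte 0x92, not the five literal characters "<92>". That would explain why gsub("<92>", ...) finds nothing to replace. Assuming the .txt file was saved as Windows-1252 (a guess based on the .doc origin, where 0x92 is the curly apostrophe), declaring that encoding and converting to UTF-8 should recover the character. The variable names below are illustrative only, and the encoding name "CP1252" is assumed to be supported by the platform's iconv.

```r
# Assumption: the source text is Windows-1252, where byte 0x92 is the
# right single quotation mark (U+2019). Build a small test string that
# contains that raw byte: "don" + 0x92 + "t".
raw_bytes <- as.raw(c(0x64, 0x6f, 0x6e, 0x92, 0x74))
x <- rawToChar(raw_bytes)

# Convert from the assumed source encoding to UTF-8.
fixed <- iconv(x, from = "CP1252", to = "UTF-8")

# Optionally replace the curly apostrophe with a plain ASCII one.
fixed <- gsub("\u2019", "'", fixed, fixed = TRUE)
```

If the assumption holds, the same conversion could be applied to the whole column with iconv(df$comments, from = "CP1252", to = "UTF-8"), or the encoding could be declared at read time through readr's locale argument, for example locale = locale(encoding = "windows-1252").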