I'm importing text from PDF files using pdf_text(). The import picks up some Unicode characters, but I can only see them with str(), not with print().
For example, print(x) displays:
"CTO area performance..."
whereas str(x) displays:
"<U+F0B7> CTO area performance..."
How can I match the Unicode character shown as "<U+F0B7>" using gsub()? Since it does not seem to be in the printed text, I'm having trouble replacing it with a dash "-". I tried:
x <- gsub("<U\\+[0-9A-Z]{4}>", "-", x)
but it didn't work.
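
Here is a minimal sketch that reproduces the situation without the PDF. It assumes the string really contains the single private-use character U+F0B7, and that "<U+F0B7>" is only how the console escapes it. The sample string is made up to mirror the output above, and the names attempt and fixed are just for illustration. If that assumption holds, it would explain why a pattern looking for the literal characters "<U+....>" finds nothing, while matching the code point itself does.

```r
# Hypothetical sample mirroring the pdf_text() output;
# "\uf0b7" is the actual character, not the text "<U+F0B7>"
x <- "\uf0b7 CTO area performance..."

# The pattern I tried searches for the literal characters "<U+F0B7>",
# which are not in the string, so x comes back unchanged
attempt <- gsub("<U\\+[0-9A-Z]{4}>", "-", x)
identical(attempt, x)

# Matching the code point itself (as a fixed string) replaces it with a dash
fixed <- gsub("\uf0b7", "-", x, fixed = TRUE)
fixed
```

If other such characters turn up in the real data, each one would need its own "\uXXXX" escape (or a character class covering them), which I have not tested here.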