I want to use R to perform some analytics on Twitter posts, such as this Tweet by Donald Trump (pulled via the Twitter API):
"Join me LIVE in South Korea\U0001f1fa\U0001f1f8\U0001f1f0\U0001f1f7\n#NationalAssembly #POTUSinAsia"
First, I would like to know if there is a regular expression that I can use to select the escaped Unicode sequences (e.g. \U0001f1f8).
Expressions that I assumed would work, such as \\[[:alnum:]]{9}, do not. I did get an interesting error message, however:
Error in grepl("\[[:alnum:]]{9}", x, perl = T) :
  invalid regular expression '[[:alnum:]]{9}'
In addition: Warning message:
In grepl("\[[:alnum:]]{9}", x, perl = T) :
  PCRE pattern compilation error
  'POSIX named classes are supported only within a class'
  at '[:alnum:]]{9}'
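For reproducibility, here is a minimal sketch of the failing call. It assumes `x` holds the tweet text exactly as shown above; the names `err`, `has_backslash`, `code_points`, and `flags` are only illustrative. It also checks whether the string contains any literal backslash at all, since R's parser may already have turned each \UXXXXXXXX escape into a single character, in which case the escapes would only be how the string is printed.

```r
# Assumption: `x` holds the tweet text exactly as written in the post.
# In a string literal, R's parser turns each \UXXXXXXXX escape into one character.
x <- "Join me LIVE in South Korea\U0001f1fa\U0001f1f8\U0001f1f0\U0001f1f7\n#NationalAssembly #POTUSinAsia"

# Reproduce the failing call, capturing the error message instead of stopping.
# In the regex, "\\[" is an escaped literal "[", so "[:alnum:]" ends up
# outside a bracket expression, which is what PCRE complains about.
err <- tryCatch(
  suppressWarnings(grepl("\\[[:alnum:]]{9}", x, perl = TRUE)),
  error = function(e) conditionMessage(e)
)
print(err)

# Check whether the string actually contains a literal backslash.
has_backslash <- grepl("\\", x, fixed = TRUE)
print(has_backslash)

# Inspect the code points above the Basic Multilingual Plane (the flag symbols).
code_points <- utf8ToInt(x)
flags <- code_points[code_points > 0xFFFF]
print(sprintf("U+%X", flags))
```

If `has_backslash` is FALSE, the \U sequences exist only in the printed representation, so a pattern looking for a backslash followed by nine alphanumeric characters would have nothing to match in `x`.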
Also, I'd like to know if there is a way to convert these escaped Unicode sequences back into the characters they represent, so that I can display them to the user on the front end of the application.