I'm trying to do an emoji analysis in R. I have stored some tweets that contain emojis.
Here is one of the tweets that I want to analyze:
> tweetn2
[1] "Programme du week-end: \xed\xa0\xbd\xed\xb2\x83\xed\xa0\xbc \xed\xbe\xb6\xed\xa0\xbc\xed\xbd\xbb\xed\xa0\xbc\xed\xbd\xbb\xed\xa0\xbc \xed\xbd\xbb\xed\xa0\xbc\xed\xbd\xbb"
To make sure the string is marked as "UTF-8":
> Encoding(tweetn2)
[1] "UTF-8"
Now, when I try to match some of these characters, it doesn't work:
> grepl("\\xed",tweetn2)
[1] FALSE
or
> grepl("xed",tweetn2)
[1] FALSE
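The `\xed` in the printout is how R displays a single raw byte (0xED) that is not part of a valid UTF-8 character; the string does not contain the characters `x`, `e`, `d` at that position, which is why `grepl("xed", ...)` fails, and the regex `"\\xed"` is matched against characters rather than that stray byte. A minimal sketch of matching at the byte level instead, assuming a small made-up string `s` that stands in for `tweetn2`:

```r
# Illustrative stand-in for tweetn2: "week-end: " followed by the six bytes
# ED A0 BD ED B2 83 (the first emoji of the tweet)
s <- rawToChar(c(charToRaw("week-end: "),
                 as.raw(c(0xED, 0xA0, 0xBD, 0xED, 0xB2, 0x83))))

# 1) Compare raw bytes directly: does the string contain a 0xED byte?
has_ed <- any(charToRaw(s) == as.raw(0xED))

# 2) Let grepl() compare byte by byte, with a fixed (non-regex) pattern
has_ed_grepl <- grepl("\xed", s, fixed = TRUE, useBytes = TRUE)

# 3) Count surrogate halves: a 0xED lead byte followed by a byte >= 0xA0
b <- as.integer(charToRaw(s))
n_halves <- sum(b[-length(b)] == 0xED & b[-1] >= 0xA0)
```

Both `has_ed` and `has_ed_grepl` should be `TRUE` here, and `n_halves` counts the two 3-byte halves that make up the one emoji.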
It seems that emoji bytes such as "\xed\xa0\xbd" are not valid UTF-8, because I get an error message when I run:
> str(tweetn2)
Error in str.default(tweetn2) : invalid multibyte string, element 1
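`Encoding()` only reports the encoding mark attached to a string; it does not check the bytes. `\xed\xa0\xbd` is a UTF-16 surrogate half (U+D83D) written out as three bytes, which strict UTF-8 forbids, so functions that decode characters, such as `str()`, reject it. `validUTF8()` (available since R 3.3.0) tests the bytes themselves. A small sketch with a made-up string `x`:

```r
# "Café " (valid UTF-8, é = C3 A9) followed by the surrogate bytes of one emoji
x <- rawToChar(as.raw(c(0x43, 0x61, 0x66, 0xC3, 0xA9, 0x20,
                        0xED, 0xA0, 0xBD, 0xED, 0xB2, 0x83)))
Encoding(x) <- "UTF-8"

Encoding(x)                # "UTF-8": only the declared mark
validUTF8(x)               # FALSE: the bytes are not well-formed UTF-8
validUTF8("Caf\u00e9")     # TRUE: accented letters alone are fine
```

Running `validUTF8()` over the whole tweet vector is one way to find which tweets carry these broken emoji bytes.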
I found a partial solution here, which uses the iconv() function with "ASCII" encoding:
http://www.r-bloggers.com/emoticons-decoder-for-social-media-sentiment-analysis-in-r/
But I want to keep using "UTF-8" for my analysis, because it handles French accented letters (à, é, è, ê, ë, û, etc.) correctly.
Do you have any idea how I can get around this?
Thanks
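Each emoji in the tweet looks like a UTF-16 surrogate pair stored as two 3-byte sequences (the CESU-8 layout): `\xed\xa0\xbd\xed\xb2\x83` is D83D + DC83, i.e. U+1F483. Instead of downgrading the whole text to ASCII, one option is to rewrite only those pairs into proper 4-byte UTF-8 and leave every other byte (including é, à, ê) untouched. A minimal sketch, assuming each high surrogate is immediately followed by its low surrogate; `fix_surrogates()` is a hypothetical helper, not a function from any package. Unpaired halves (the tweet above shows a space inside some pairs) are copied through unchanged and would still need separate handling.

```r
# Hypothetical helper: turn CESU-8-style surrogate pairs into real UTF-8.
# Assumes a high surrogate (ED A0..AF xx) is directly followed by a low
# surrogate (ED B0..BF xx); any other byte is copied through unchanged.
fix_surrogates <- function(x) {
  b <- charToRaw(x)
  v <- as.integer(b)
  n <- length(v)
  out <- raw(0)

  # Decode the 3-byte sequence starting at position j into a 16-bit value
  unit <- function(j) {
    bitwAnd(v[j], 0x0F) * 4096 + bitwAnd(v[j + 1], 0x3F) * 64 +
      bitwAnd(v[j + 2], 0x3F)
  }

  i <- 1
  while (i <= n) {
    if (i + 5 <= n &&
        v[i] == 0xED && v[i + 1] >= 0xA0 && v[i + 1] <= 0xAF &&
        v[i + 3] == 0xED && v[i + 4] >= 0xB0 && v[i + 4] <= 0xBF) {
      hi <- unit(i)
      lo <- unit(i + 3)
      # Combine the pair into one code point above U+FFFF
      cp <- 0x10000 + (hi - 0xD800) * 1024 + (lo - 0xDC00)
      out <- c(out, charToRaw(intToUtf8(cp)))  # 4-byte UTF-8 encoding
      i <- i + 6
    } else {
      out <- c(out, b[i])
      i <- i + 1
    }
  }
  res <- rawToChar(out)
  Encoding(res) <- "UTF-8"
  res
}

# Usage on a made-up fragment: "e: " + the dancer emoji as surrogate bytes
tweet <- rawToChar(as.raw(c(0x65, 0x3A, 0x20,
                            0xED, 0xA0, 0xBD, 0xED, 0xB2, 0x83)))
fixed <- fix_surrogates(tweet)
validUTF8(fixed)                               # TRUE
utf8ToInt(fixed)                               # code points 101, 58, 32, 128131
has_dancer <- grepl("\U0001F483", fixed, fixed = TRUE)

# Applying it to a character vector; valid text such as "déjà" passes through
fixed_all <- vapply(c(tweet, "d\u00e9j\u00e0"), fix_surrogates,
                    character(1), USE.NAMES = FALSE)
```

Once repaired, the strings are ordinary UTF-8, so emojis and accented letters can be matched with the usual `grepl()` calls. Growing `out` with `c()` is quadratic, which is acceptable for tweet-sized strings but worth vectorizing for large corpora.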