I'm using RStudio (3.3.1) on macOS Sierra (10.12.5) to scrape the user profile information of the followers of certain Twitter users.

My problem is that when a user's profile description is in Arabic, the text that is returned is garbled. For example, this user description:

جزائريٌّ يسري دمُ الشهداء في عروقِه ويطلُب العِلم حتّى يعلم الذين كفروا أنّ دين الإسلام هو دينُ الحق ،والحقَّ أقُول..

becomes this:

جزائريٌّ يسري دم٠الشهداء ÙÙŠ عروقÙÙ‡ ÙˆÙŠØ·Ù„ÙØ¨ العÙلم حتّى يعلم الذين ÙƒÙØ±ÙˆØ§ أنّ دين الإسلام هو دين٠الحق ،والحقَّ أقÙول.. #أنشط على ØµÙØ­Ø©

This is particularly problematic as the project I am working on is focused on Muslim users of Twitter and a lot of the data is in Arabic.

I'm guessing this is an encoding problem. An answer to a similar question suggested updating to RStudio 3.3.3, but when I tried that it made no difference, and I ran into compatibility issues with some of my packages.

Any help would be appreciated.

KJGarbutt

1 Answer

I was able to make this Twitter word cloud with Arabic text, though I can't read Arabic, so who knows how successful it was. That's based in part on:

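library(twitteR)  # provides userTimeline(); assumes setup_twitter_oauth() has already been run

# Get some tweets
trump_tweets <- userTimeline("RTarabic", n = 1000)

# Extract the text first, so that trump_text exists before it is converted
trump_text <- sapply(trump_tweets, function(x) x$getText())

# Keep the text as UTF-8; converting to 'ASCII' would return NA for Arabic strings
trump_text <- iconv(trump_text, from = "UTF-8", to = "UTF-8")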

I think making it into UTF-8 is the key, but I admit I barely know what I'm talking about here.
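
For what it's worth, the garbled output in the question looks like UTF-8 bytes being read as Windows-1252. Here is a minimal sketch of that idea in base R, assuming that really is the cause. The variable names (`original`, `garbled`, `repaired`) are made up for illustration, and the sample is just the first five letters of the description above, written with `\u` escapes so the script itself stays ASCII.

```r
# First five Arabic letters of the example description, as \u escapes
original <- "\u062c\u0632\u0627\u0626\u0631"

# Simulate the damage: treat the UTF-8 bytes as if they were Windows-1252
garbled <- iconv(original, from = "WINDOWS-1252", to = "UTF-8")

# Attempt a repair: turn the mojibake back into the bytes it came from,
# then declare those bytes to be UTF-8
repaired <- iconv(garbled, from = "UTF-8", to = "WINDOWS-1252")
Encoding(repaired) <- "UTF-8"

# Compare the underlying bytes of the repaired and original strings
identical(charToRaw(repaired), charToRaw(original))
```

This only round-trips cleanly when no bytes were lost along the way. Windows-1252 leaves a few byte values undefined, so text that has already been through a lossy step may come back as `NA` or with missing characters, and in that case the fix has to happen at the point where the data is first read.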