Cleaning accents in Twitter text

Question

I am doing text mining on Spanish tweets. My problem is that the same word appears in different forms (with and without an accent), for example: accion, acción.

I tried setting the encoding to UTF-8, but it doesn't work. These are the libraries I load:

 library(stringi)
 library(twitteR)
 library(tm)
 library(wordcloud)
 library(RColorBrewer)

Please read the info about [how to ask a good question](http://stackoverflow.com/help/how-to-ask) and how to give a [reproducible example](http://stackoverflow.com/questions/5963269). This will make it much easier for others to help you. — zx8754, Apr 29 '16 at 06:52
What you could do is create a "database" of accented characters and what they translate to. Then apply this to each tweet and "flush" out the accented characters. You can for example use `sub`. — Roman Luštrik, Apr 29 '16 at 07:39

score 0 · Accepted Answer · answered May 20 '16 at 16:27

You did not clearly specify what you want to do with the retrieved tweets (save them to a text file, keep them in a data frame, etc.). If you write them out with UTF-8 encoding, the accented letters are preserved as they are.

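 # Open a UTF-8 connection so accented characters are written unchanged
 con <- file("C:/Dir1/sub_dir1/output/output.txt", encoding = "UTF-8")
 # write() cannot handle a data frame; write.table() opens and closes the connection itself
 write.table(df, file = con, row.names = FALSE)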

However, if you want to replace the accented characters with their unaccented equivalents, the simplest way is to use iconv:

iconv( "acción", to='ASCII//TRANSLIT')
>[1] "accion"
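
What `ASCII//TRANSLIT` produces depends on the platform's iconv implementation, so the output above may not be identical everywhere. As a rough sketch of the lookup-table idea from the comments, base R's `chartr` can map each accented character to a plain one. The helper name `strip_accents` and the sample vector `tweets` are made up for illustration, and leaving `ñ` untouched is an assumption about what is wanted here.

```r
# Hypothetical helper: replace accented Spanish vowels with plain ones.
# The characters are written as \u escapes so the script does not depend
# on the encoding of the source file:
#   \u00e1 \u00e9 \u00ed \u00f3 \u00fa \u00fc  = a/e/i/o/u with acute, u with diaeresis
#   \u00c1 \u00c9 \u00cd \u00d3 \u00da \u00dc  = the uppercase versions
strip_accents <- function(x) {
  accented <- "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00c1\u00c9\u00cd\u00d3\u00da\u00dc"
  plain    <- "aeiouuAEIOUU"
  # chartr() translates character by character; both strings have 12 characters
  chartr(accented, plain, x)
}

# Sample input (assumed): "acción", "accion", "Pingüino"
tweets  <- c("acci\u00f3n", "accion", "Ping\u00fcino")
cleaned <- strip_accents(tweets)
print(cleaned)
```

With this, "acción" and "accion" collapse to the same token before the text goes into `tm`. Since stringi is already loaded in the question, `stri_trans_general(x, "Latin-ASCII")` is another option that does not rely on the system's iconv, though it also converts `ñ` to `n`.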
