
I am doing text mining on Spanish tweets. My problem is that the same word appears in different forms, with and without an accent, for example: accion, acción.

I tried setting the encoding to "UTF-8", but it doesn't help. These are the libraries I load:

 library(stringi)
 library(twitteR)
 library(tm)
 library(wordcloud)
 library(RColorBrewer)

Rodrigo_BC
  • Please read the info about [how to ask a good question](http://stackoverflow.com/help/how-to-ask) and how to give a [reproducible example](http://stackoverflow.com/questions/5963269). This will make it much easier for others to help you. – zx8754 Apr 29 '16 at 06:52
  • What you could do is create a "database" of accented characters and what they translate to, then apply it to each tweet to "flush out" the accented characters. You can, for example, use `sub`. – Roman Luštrik Apr 29 '16 at 07:39
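A minimal sketch of the lookup-table idea from the comment above, using base R's `chartr()`, which translates every occurrence character by character (note that `sub()` only replaces the first match in each string; a regex approach would need `gsub()`). The helper name `strip_accents` and the example words are hypothetical. The table deliberately leaves `ñ` alone, on the assumption that it should stay distinct because it is a separate Spanish letter (año vs. ano).

```r
# Hypothetical helper: map each accented vowel to its plain form.
# \u escapes keep the script independent of the source file's encoding.
strip_accents <- function(x) {
  accented <- "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00c1\u00c9\u00cd\u00d3\u00da\u00dc"
  plain    <- "aeiouuAEIOUU"
  # chartr() replaces every occurrence, position by position in the two tables
  chartr(accented, plain, x)
}

words <- c("acci\u00f3n", "accion", "a\u00f1o")  # "acción", "accion", "año"
strip_accents(words)
unique(strip_accents(words))  # "acción" and "accion" now collapse into one word
```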

1 Answer


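You did not specify clearly what you want to do with the retrieved tweets (save them to a text file, keep them in a data frame, etc.). If you use UTF-8 encoding, the accented letters are preserved exactly as they are, so setting UTF-8 on its own will not merge "acción" and "accion". For example, to write a data frame of tweets to a UTF-8 file: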

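 # write.table() opens the connection, writes df as UTF-8 text and closes it again
 con <- file("C:/Dir1/sub_dir1/output/output.txt", encoding = "UTF-8")
 write.table(df, file = con, sep = "\t", row.names = FALSE)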
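A small sketch to check this in practice, assuming a temporary file instead of the path above and two made-up words: the UTF-8 round trip keeps both spellings intact, which is why UTF-8 alone cannot merge them.

```r
# Round-trip check with a temporary file (not the path used above)
words <- c("acci\u00f3n", "accion")   # \u00f3 is "ó"

path <- tempfile(fileext = ".txt")
con <- file(path, open = "w", encoding = "UTF-8")
writeLines(words, con)
close(con)

back <- readLines(path, encoding = "UTF-8")
identical(back, words)   # the accented form survives unchanged
length(unique(back))     # still two distinct words
```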

However, if you want to convert these accented characters into their unaccented equivalents, the simplest way is `iconv`:

 iconv("acción", to = "ASCII//TRANSLIT")
 #> [1] "accion"
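The result of `iconv(..., to = "ASCII//TRANSLIT")` depends on the system's iconv implementation, and some implementations insert an apostrophe (e.g. `acci'on`). Since the question already loads stringi, a minimal sketch of a platform-independent alternative is ICU's `"Latin-ASCII"` transliterator via `stri_trans_general()`, applied inside the tm pipeline with `content_transformer()`. The example tweets are made up. Be aware that `"Latin-ASCII"` also turns `ñ` into `n`; if that matters, the lookup-table idea from the comments gives finer control.

```r
library(stringi)
library(tm)

# Made-up example tweets; \u00f3 is "ó"
tweets <- c("Acci\u00f3n y reacci\u00f3n", "accion sin tilde")

# ICU's "Latin-ASCII" transliterator drops the diacritics
stri_trans_general(tweets, "Latin-ASCII")

# Same step inside a tm pipeline, before lowercasing and counting terms
corpus <- VCorpus(VectorSource(tweets))
corpus <- tm_map(corpus, content_transformer(function(x) stri_trans_general(x, "Latin-ASCII")))
corpus <- tm_map(corpus, content_transformer(tolower))
tdm <- TermDocumentMatrix(corpus)
Terms(tdm)   # "accion" appears once, counted across both tweets
```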
user5249203