I'm trying to build a word corpus from my data frame, which was loaded from a JSON file. R does not display special characters like 'ř' correctly (in the original JSON data the character is visible and the encoding is UTF-8). I tried setting the encoding in the R source editor and with Encoding(x), but neither works. I would like to replace these characters with plain Latin letters, e.g. ř --> r, but using the gsub function completely destroys my data frame. Do you have any ideas how to solve this?

# The JSON file contains a name with "ř"; after loading the data I get <f8>, even though I set the encoding of the source file
data5 <- fromJSON(file = "Test1801.json")
data6 <- as.data.frame(data5)
data6 <- tolower(data6)   # this and gsub() turn the whole data frame into character values "1"
data6 <- gsub("ř", "r", data6)
Iga
  • When asking for help, you should include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. What OS are you using? How have you verified the encoding of the file? `gsub()` is meant to be used on a character vector, not a data.frame. Perhaps you should just be applying the function to a subset of columns? It's really hard to say since we have no idea what your data really looks like. – MrFlick Sep 19 '18 at 15:12
  • `fromJSON` has an `encoding` argument, and you can pass whatever the encoding of your file is there, i.e. `encoding = 'UTF-8'` – Mako212 Sep 19 '18 at 15:38
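
To illustrate MrFlick's point: `tolower()` and `gsub()` coerce their argument with `as.character()`, so passing a whole data frame collapses each column into a single string. Below is a minimal sketch of applying them to the character columns only. The data frame `df` and its columns are hypothetical stand-ins for the asker's data, and "ř" is written as the escape `\u0159` on the assumption that this keeps the script independent of the source file's encoding.

```r
# Hypothetical data frame standing in for the one built from the JSON file
# ("\u0159" is "ř", "\u00ed" is "í")
df <- data.frame(
  name  = c("Ji\u0159\u00ed", "Ond\u0159ej"),
  score = c(1.5, 2),
  stringsAsFactors = FALSE
)

# Find the character columns; numeric columns are left untouched
is_chr <- vapply(df, is.character, logical(1))

# Lower-case and replace "ř" with "r", one column at a time
df[is_chr] <- lapply(df[is_chr], function(x) {
  gsub("\u0159", "r", tolower(x), fixed = TRUE)
})

str(df)
```

Working column by column keeps `df` a data frame and preserves the types of the non-character columns.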

1 Answer

Welcome to SO. Please keep in mind that you are expected to provide a reproducible example so we can work on your problem.

I understand you're looking for a way to change the symbols to Latin letters. That can be accomplished with stringi::stri_trans_general:

library(stringi) # load the package

a <- "ř" # assign the accented character to a variable

newA <- stri_trans_general(a, "latin-ascii") # transliterate to plain ASCII Latin letters

newA
# [1] "r"

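Since the question concerns a whole data frame rather than a single string, here is a rough sketch of applying the same transliteration to every character column. The data frame `df` and its contents are hypothetical, and the accented letters are written as `\u` escapes on the assumption that this avoids depending on the script's encoding.

```r
library(stringi)

# Hypothetical data frame ("\u0159" is "ř", "\u00ed" is "í")
df <- data.frame(
  name  = c("Ji\u0159\u00ed", "Ond\u0159ej"),
  score = c(1.5, 2),
  stringsAsFactors = FALSE
)

# Transliterate only the character columns; other columns keep their type
is_chr <- vapply(df, is.character, logical(1))
df[is_chr] <- lapply(df[is_chr], stri_trans_general, id = "Latin-ASCII")

df$name
# [1] "Jiri"   "Ondrej"
```

This handles all accented Latin letters in one pass, so no separate `gsub()` call per character is needed.
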
If you find this answer helpful, please consider marking it as such by ticking on the mark below the voting.

PavoDive