I found that the text in certain cells of my data frame is garbled and would like to replace it with a string of my choosing, but R returns the following warning:

#load data from dropbox
library(foreign)
data <- read.csv("https://www.dropbox.com/s/anm8xrovxc5xtr5/comtrade2009.csv?dl=1")
unique(data$ptTitle)[75]
[1] <NA>
# This is not really an NA: in the CSV file the text is a string garbled by encoding;
# it shows "C<U+00F4>te d'Ivoire"

data$ptTitle[data$ptTitle == "<NA>"] <- "Cote d'Ivoire"
Warning message:
In `[<-.factor`(`*tmp*`, ct2009$ptTitle == "<NA>", value = c(238L,  :
  invalid factor level, NA generated

R does not allow me to replace those garbled values with a character string. Does anyone know how to overwrite them with my preferred string?
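For reference, a minimal sketch of what seems to be going on, using a toy factor rather than the Comtrade file (the values and the names `f` and `res` are made up for illustration). It assumes the column was read as a factor, which the `[<-.factor` in the warning suggests:

```r
# Toy factor standing in for data$ptTitle (made-up values).
f <- factor(c("France", NA, "Ghana"))

# "<NA>" is only how a missing factor entry is printed, not a real level,
# so comparing against the string "<NA>" never yields TRUE for that entry.
f == "<NA>"

# is.na() is the test that actually locates the missing entry.
is.na(f)

# Assigning a string that is not an existing level raises a warning and
# leaves the cell as NA. tryCatch() captures the warning text here.
res <- tryCatch({
  f[is.na(f)] <- "Cote d'Ivoire"
  "no warning"
}, warning = function(w) conditionMessage(w))
res
```

So two things appear to go wrong in the original line: the comparison does not select the missing cell, and the replacement value is not one of the factor's levels.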

Update

So I guess a better way to work around this is to pass stringsAsFactors=FALSE to read.csv when loading the CSV file. The column is then read as character rather than factor, so it is much easier to replace the missing cell values (a plain NA instead of the factor's <NA>). Sorry for any hassle this thread might have caused.
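A small sketch of that workaround, run on an inline toy CSV instead of the Dropbox file (the data and the names `csv_text` and `toy` are hypothetical). Note that from R 4.0.0 onward `stringsAsFactors = FALSE` is already the default for `read.csv`, so the argument mainly matters on older versions:

```r
# Inline CSV text standing in for the real file (made-up rows).
csv_text <- "ptTitle,value\nFrance,1\nNA,2\nGhana,3"

# Read strings as plain character vectors rather than factors.
toy <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# With a character column, any string can be assigned directly;
# is.na() selects the missing cells.
toy$ptTitle[is.na(toy$ptTitle)] <- "Cote d'Ivoire"
toy$ptTitle
```

If the column is needed as a factor later, it can be converted back with `factor(toy$ptTitle)` once the values are cleaned.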

Chris T.
  • To capture NA in R, use `is.na(data$ptTitle)` – Sotos Nov 09 '18 at 12:33
  • `<NA>` means it's a factor, and the thread marked by user zx8754 did not actually answer my question. – Chris T. Nov 09 '18 at 12:50
  • I guess a better alternative to work around this is to add `stringsAsFactors=F` when loading the data using `read.csv`, so it's easier to replace those `NA` (instead of `<NA>`). – Chris T. Nov 09 '18 at 12:55
  • There is one NA row in the data, row 94. The linked post's solution works. Try `data$ptTitle <- addNA(data$ptTitle)`, which adds NA as a new factor level; then we can change it to something else, again from the linked post: `levels(data$ptTitle) <- c(head(levels(data$ptTitle), -1), "Cote d'Ivoire")` – zx8754 Nov 09 '18 at 13:12
  • The strange thing is that after all these replacement tricks, that cell still keeps its garbled form `C<U+00F4>te d'Ivoire`. – Chris T. Nov 09 '18 at 13:20
  • Some updates: after I applied your code `data$ptTitle <- addNA(data$ptTitle)` and ran `data$ptTitle[data$ptTitle == "C<U+00F4>te d'Ivoire"] <- "Cote d'Ivoire"` again, it does work, but the line `levels(data$ptTitle) <- c(head(levels(data$ptTitle), -1), "Cote d'Ivoire")` doesn't do the trick, at least not here. Could you post your proposed solution as an answer, so I can mark it as the solution? That might help others. – Chris T. Nov 09 '18 at 13:38
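
The factor-based suggestions in the comments above can be sketched on a toy factor as follows. This is only an illustration under the assumption that the column is a factor with one missing entry; `f` and `g` are made-up names and the values are not from the Comtrade file:

```r
# Toy factor standing in for data$ptTitle (made-up values).
f <- factor(c("France", NA, "Ghana"))

# Option A: register the new level first, then fill the missing cells.
levels(f) <- c(levels(f), "Cote d'Ivoire")
f[is.na(f)] <- "Cote d'Ivoire"

# Option B: make NA an explicit level with addNA(), then rename that level.
g <- addNA(factor(c("France", NA, "Ghana")))
levels(g)[is.na(levels(g))] <- "Cote d'Ivoire"

as.character(f)
as.character(g)
```

Option B renames the NA level by position found with `is.na(levels(g))` rather than assuming it is the last level, which may be a little more robust than dropping the last element with `head(..., -1)`.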
