I am having a very hard time with accented characters in a Stata file that I have to import into R. I solved one problem over here, but another one remains.
After import, any time I use the look_for() command from the labelled package, I get this error.
remotes::install_github("sjkiss/cesdata")
library(cesdata)
data("ces19web")
library(labelled)
library(dplyr) # for %>% and select(), used below
library(purrr) # for map(), used below
look_for(ces19web, "vote")
invalid multibyte string at '<e9>bec Solidaire'
I can find one variable whose value labels include that text, but there the label displays properly, so I don't know what is going on.
val_labels(ces19web$pes19_provvote)
There are other problematic value labels as well. For example, the value labels of the 13th variable make look_for() fail.
# This works fine
ces19web %>%
  select(1:12) %>%
  look_for(., "[a-z]")
# This chokes
ces19web %>%
  select(1:13) %>%
  look_for(., "[a-z]")
# See the accented character
val_labels(ces19web[,13])
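To narrow down which labels are actually affected, a base-R check along these lines might help. It is only a sketch: it assumes the value labels live in each column's "labels" attribute (which is how haven-style labelled vectors store them), and find_bad_labels and the toy data frame are made-up names for illustration, not part of labelled or cesdata.

```r
# Hypothetical helper: for each column, return the value-label names
# that are not valid UTF-8 (e.g. raw latin1 bytes such as <e9>).
find_bad_labels <- function(df) {
  bad <- lapply(df, function(col) {
    nms <- names(attr(col, "labels", exact = TRUE))
    # Columns without value labels have nothing to check
    if (is.null(nms)) return(character(0))
    nms[!validUTF8(nms)]
  })
  # Keep only the columns that have at least one bad label name
  Filter(length, bad)
}

# Toy stand-in for ces19web (assumption: one labelled column, one plain column)
toy <- data.frame(v1 = c(1, 2, 1), v2 = c("a", "b", "c"))
attr(toy$v1, "labels") <- setNames(c(1, 2), c("Qu\xe9bec Solidaire", "Other"))

bad_labels <- find_bad_labels(toy)
names(bad_labels) # the columns with at least one badly encoded label
```

On the real data this would be called as find_bad_labels(ces19web). Labels that already display properly should not be reported, since they are presumably valid UTF-8 already.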
I have come up with this way of replacing the accented characters of the second type.
names(val_labels(ces19web$cps19_imp_iss_party))<-iconv(names(val_labels(ces19web$cps19_imp_iss_party)), from="latin1", to="UTF-8")
And this even solves the problem for look_for()
#This now works!
ces19web %>%
  select(1:13) %>%
  look_for(., "[a-z]")
But what I need is a way to loop through all of the names of all of the value labels and make this conversion for all the bungled accented characters.
This is so close, but I don't know how to save the results of this as the new names for the value labels.
ces19web %>%
  # map over all the variables and get the value labels
  map(., val_labels) %>%
  # map over each set of value labels
  map(., ~{
    # Skip if there are no value labels
    if (!is.null(.x)){
      # Otherwise convert the names as above
      names(.x) <- iconv(names(.x), from="latin1", to="UTF-8")
    }
    # Return the (possibly converted) labels, not the value of the if()
    .x
  }) -> out
#Compare the 16th variable's value labels in the original
val_labels(ces19web[,16])
#With the 16th set of value labels after the conversion function above
out[[16]]
But how do I make that conversion actually stick in the original dataset?
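One direction that might work is to write the converted names back onto each column instead of collecting them in a separate list. The following is a minimal base-R sketch, not a tested solution for ces19web. It assumes the value labels are stored in each column's "labels" attribute and that the broken names are latin1 bytes. fix_label_encoding and the toy data frame are hypothetical names. Only names that are not already valid UTF-8 are converted, since running iconv() from latin1 on an already correct UTF-8 label would garble it.

```r
# Hypothetical helper: re-encode only the label names that are not already
# valid UTF-8, then store the labels back on the column they came from.
fix_label_encoding <- function(df, from = "latin1", to = "UTF-8") {
  df[] <- lapply(df, function(col) {
    labs <- attr(col, "labels", exact = TRUE)
    # Skip columns with no value labels
    if (!is.null(labs) && !is.null(names(labs))) {
      nms <- names(labs)
      bad <- !validUTF8(nms)
      nms[bad] <- iconv(nms[bad], from = from, to = to)
      names(labs) <- nms
      attr(col, "labels") <- labs
    }
    # Always return the column so that it is kept in the data frame
    col
  })
  df
}

# Toy stand-in for ces19web (assumption: one labelled column, one plain column)
toy <- data.frame(v1 = c(1, 2, 1), v2 = c("a", "b", "c"))
attr(toy$v1, "labels") <- setNames(c(1, 2), c("Qu\xe9bec Solidaire", "Other"))

fixed <- fix_label_encoding(toy)
names(attr(fixed$v1, "labels"))
```

On the real data this would be ces19web <- fix_label_encoding(ces19web), after which look_for() could be tried again on the full dataset.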
Thank you!