I have a bunch of author names from foreign countries in a CSV file that R reads in just fine. I'm trying to clean them for upload to Mechanical Turk (which really doesn't like even a single internationalized character). In doing so I've run into a question (to be posted later), but I can't even dput them in a sensible way:
```
> dput(df[306,"primauthfirstname"])
"Gwena\xeblle M"
> test <- "Gwena\xeblle M"
<simpleError in nchar(val): invalid multibyte string 1>
```
In other words, dput works just fine, but pasting its output back in fails. Why doesn't dput output the information needed to copy and paste the value back into R (presumably all it needs to do is add the encoding attribute to a structure() statement)? How do I get it to do so?
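A minimal sketch of one workaround, assuming the CSV was Latin-1 encoded (where byte 0xEB is "ë"): pasting the \xeb literal recreates the raw byte but not its encoding declaration, so the declaration can be restored with Encoding() and the string converted to UTF-8 before calling dput. The variable names x and y are illustrative only.

```r
# The \xeb escape printed by dput() recreates the single byte 0xEB,
# but the pasted literal carries no encoding declaration.
x <- "Gwena\xeblle M"

# Assumption: the source CSV was Latin-1, where 0xEB is "e with diaeresis".
Encoding(x) <- "latin1"

# Re-encode to UTF-8: the result is marked "UTF-8" and stores the
# character as the two bytes 0xC3 0xAB.
y <- enc2utf8(x)
Encoding(y)
charToRaw(y)

# The same string written with a \u escape, which parses in any locale.
identical(y, "Gwena\u00eblle M")
```

Once the string is marked UTF-8, dput(y) in a UTF-8 session should print something that survives a copy and paste; the \u00eb form is the locale-independent way to write it by hand.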
Note that \xeb is a valid character as far as R is concerned:
```
> gsub("\xeb","", turk.df[306,"primauthfirstname"] )
[1] "Gwenalle M"
```
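Since the end goal is ASCII-only text for Mechanical Turk, base R's iconv() can do this cleanup for every offending character in one pass. A minimal sketch, again assuming the input bytes are Latin-1: sub = "" drops anything with no ASCII equivalent (matching the gsub() result above), while sub = "byte" leaves a visible marker such as <eb> for manual review. Transliteration via "ASCII//TRANSLIT" is also possible, but its output depends on the platform's iconv implementation.

```r
x <- "Gwena\xeblle M"

# Drop bytes that cannot be represented in ASCII (same result as the gsub() call).
dropped <- iconv(x, from = "latin1", to = "ASCII", sub = "")

# Replace them with a visible hex marker such as "<eb>" for manual review.
flagged <- iconv(x, from = "latin1", to = "ASCII", sub = "byte")

# Transliterate (e.g. to a plain "e"); the exact output is platform-dependent.
translit <- iconv(x, from = "latin1", to = "ASCII//TRANSLIT")

print(c(dropped, flagged, translit))
```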
But you can't match its characters individually; it's the hex code \x## or nothing:
```
> gsub("\\x","", turk.df[306,"primauthfirstname"] )
[1] "Gwena\xeblle M"
```
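One way to see why the pieces can't be matched separately: \xeb is how R prints a single byte that is not valid in the session's encoding, so the string contains no backslash, "x", "e" or "b" characters to target. A quick byte-level check (useBytes = TRUE keeps the invalid byte from triggering a multibyte error):

```r
x <- "Gwena\xeblle M"

# 11 bytes: "Gwena" (5) + the single byte 0xEB + "lle M" (5).
nchar(x, type = "bytes")

# The sixth byte is 0xEB itself, not the four characters "\", "x", "e", "b".
charToRaw(x)[6]

# No backslash anywhere in the string (byte-wise fixed match).
grepl("\\", x, fixed = TRUE, useBytes = TRUE)
```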