When I load data from a package on a Windows machine, the encoding is botched:

require(vegdata)
tax_dbf <- load.taxlist("GermanSL 1.3", detailed=TRUE)
tax_dbf[33,"BEGRUEND"]

[1] " "Einfügen einer Zwischenebene""

I can fix that:

Encoding(tax_dbf$BEGRUEND) <- "UTF-8"
tax_dbf[33,"BEGRUEND"]

[1] "Einfügen einer Zwischenebene"
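A minimal sketch of what that assignment does, assuming the column already holds UTF-8 bytes that R simply has not marked as such: `Encoding<-` only declares the encoding and leaves the bytes untouched, whereas `iconv()` actually converts them. The strings are built from raw bytes here so the example does not depend on the encoding of the script file; the variable names are illustrative only.

```r
# "ü" as UTF-8 bytes (0xC3 0xBC), without any encoding mark
x <- rawToChar(as.raw(c(0xc3, 0xbc)))

# Declare the encoding: the bytes stay the same, only the mark changes
Encoding(x) <- "UTF-8"
Encoding(x)
charToRaw(x)

# Contrast: "ü" as a single latin1 byte (0xFC) has to be converted,
# not just re-labelled, to become valid UTF-8
y <- rawToChar(as.raw(0xfc))
z <- iconv(y, from = "latin1", to = "UTF-8")
charToRaw(z)
```

If declaring "UTF-8" leaves the text looking wrong, the data were probably stored in a different encoding and need `iconv()` instead.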

However, I haven't found an easy way to declare the encoding for all character columns in the data frame, and my SO search-fu is weak today as well. This is confuddling.

Does anyone from the tidyverse have a one-liner for this?

aae

1 Answer

No need to use tidyverse. Just loop over columns that satisfy the condition:

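set.seed(1)

# Example data: two character columns and one numeric column
df <- data.frame(a = rep("Einfügen einer Zwischenebene", 5), b = runif(5), c = rep("Einfügen einer Zwischenebene", 5), stringsAsFactors = FALSE)

cols <- names(df)

for (i in seq_along(cols)) {
  # Skip every column that is not a character vector
  if (!is.character(df[[cols[i]]])) next
  # Declare (not convert) the encoding of this column
  Encoding(df[[cols[i]]]) <- "UTF-8"
}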

Resulting in:

> df
                             a         b                            c
1 Einfügen einer Zwischenebene 0.2655087 Einfügen einer Zwischenebene
2 Einfügen einer Zwischenebene 0.3721239 Einfügen einer Zwischenebene
3 Einfügen einer Zwischenebene 0.5728534 Einfügen einer Zwischenebene
4 Einfügen einer Zwischenebene 0.9082078 Einfügen einer Zwischenebene
5 Einfügen einer Zwischenebene 0.2016819 Einfügen einer Zwischenebene
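The same idea can also be written without an explicit loop. This is only a sketch in base R, assuming every character column really holds UTF-8 bytes; the data frame and the helper name `is_chr` are made up for the example, and the string is built from raw bytes so that it starts out without an encoding mark.

```r
# "Einfügen" as unmarked UTF-8 bytes
s <- rawToChar(as.raw(c(0x45, 0x69, 0x6e, 0x66, 0xc3, 0xbc, 0x67, 0x65, 0x6e)))

df <- data.frame(
  a = rep(s, 3),
  b = c(0.1, 0.2, 0.3),
  c = rep(s, 3),
  stringsAsFactors = FALSE
)

# Logical index of the character columns
is_chr <- vapply(df, is.character, logical(1))

# Apply the replacement function `Encoding<-` to each of those columns
df[is_chr] <- lapply(df[is_chr], `Encoding<-`, "UTF-8")

Encoding(df$a)
```

Numeric columns are never touched because they are excluded by `is_chr`.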

dplyr solution

dplyr::mutate_if(df, is.character, .funs = function(x){return(`Encoding<-`(x, "UTF-8"))})
JdeMello
  • Great. I failed on all counts: the '!is.character' and 'next' in the loop, and the 'mutate_if' condition. The "return(`Encoding<-`(x, "UTF-8"))" in the function is an interesting way of thinking; what do these backticks do? I've never done that. – aae Jan 16 '19 at 17:03
  • You didn't fail at all. You had the right mindset, and everybody is learning here. For a discussion of backticks, see [here](https://stackoverflow.com/questions/36220823/what-do-backticks-do-in-r). – JdeMello Jan 16 '19 at 17:23
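
To illustrate the backtick question with a short sketch: `Encoding(x) <- value` is shorthand for calling the replacement function named `Encoding<-`, and because that name is not a syntactically valid identifier it has to be quoted with backticks when it is called directly. The function `second<-` below is a hypothetical example, not part of any package.

```r
x <- rawToChar(as.raw(c(0xc3, 0xbc)))  # "ü" as unmarked UTF-8 bytes
y <- x

# Usual replacement syntax ...
Encoding(x) <- "UTF-8"
# ... and the equivalent explicit call to the replacement function
y <- `Encoding<-`(y, "UTF-8")

identical(Encoding(x), Encoding(y))

# Any function named `something<-` can be used the same way
# (hypothetical example: replace the second element of a vector)
`second<-` <- function(x, value) {
  x[2] <- value
  x
}

v <- 1:3
second(v) <- 10L
v
```

This is why the explicit form is convenient inside `lapply()` or `mutate_if()`: it is an ordinary function that returns the modified object.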