I wrote the following function:

special_char <- function(data_in) {
  data_in=gsub("à","a",data_in)
  data_in=gsub("â","a",data_in)
  data_in=gsub("é","e",data_in)
  data_in=gsub("î","i",data_in)
  data_in=gsub("ä","ae",data_in)
  data_in=gsub("ö","oe",data_in)
  data_in=gsub("ü","ue",data_in)
  data_in=gsub("imp.","impessa",data_in)
  data_in=gsub("ch.","chemin",data_in)
  data_in=gsub("av.","avenue",data_in)
  data_in=gsub("str.","strasse",data_in)

  return(data_in)
}

Then I try to apply it to my dataset using:

some_data %>% mutate_all(funs(special_char(.)))

However, the output is a mess. Does anyone see an obvious mistake in my approach?

Suppose I have the following input:

data_test <- data.frame(col1 = c("Céline", "Désiré", "Björn"))

I would expect to get the following output:

c("Celine", "Desire", "Bjoern")
Patrick Balada

1 Answer

This works for me:

some_data  %>% mutate_all(funs(special_char))

I hope this also solves the issue for you. If not, what does your data look like?
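
For reference, here is a minimal sketch of the same column-wise application in base R, without funs(), which is deprecated in recent dplyr releases. Note that special_char_sketch is a hypothetical, shortened variant of the function in the question and not the asker's code. It rests on two assumptions: the patterns are written as \u escapes so they do not depend on how the script file is encoded, and fixed = TRUE is used so that the dot in an abbreviation such as "ch." is matched literally instead of acting as a regex wildcard.

```r
# Hypothetical, shortened variant of special_char() from the question.
special_char_sketch <- function(x) {
  x <- gsub("\u00e9", "e", x, fixed = TRUE)    # e with acute accent -> e
  x <- gsub("\u00f6", "oe", x, fixed = TRUE)   # o with umlaut -> oe
  x <- gsub("ch.", "chemin", x, fixed = TRUE)  # matches the literal "ch." only
  x
}

# Sample input from the question, written with escapes instead of literal accents.
data_test <- data.frame(
  col1 = c("C\u00e9line", "D\u00e9sir\u00e9", "Bj\u00f6rn"),
  stringsAsFactors = FALSE
)

# Apply the function to every column; data_test[] keeps the data.frame structure.
data_test[] <- lapply(data_test, special_char_sketch)
print(data_test$col1)

# Without fixed = TRUE the dot matches any character, so "che" gets replaced too.
print(gsub("ch.", "chemin", "Michel"))
print(gsub("ch.", "chemin", "Michel", fixed = TRUE))
```

If the patterns in the question are meant as literal abbreviations, the regex behaviour shown in the last two lines may be worth checking independently of the encoding problem.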

Florian

  • Unfortunately not. My output looks as follows: <truncated> 1 ueoueeueaueoueeueeueoueeueiueoueeueaueoueeueeueoueeueeueoueeueaueoueeueeueoueeueiueoueeueaueoueeueeueoueeueaueoueeue... – Patrick Balada Jul 14 '17 at 12:32
  • That is strange. I ran my code with your sample input data_test and I get the expected output. Is the output you posted from running with your sample input as above, or with your original input? – Florian Jul 14 '17 at 12:34
  • With the sample input. That's really strange... Maybe there is something wrong with my encoding. – Patrick Balada Jul 14 '17 at 12:35
  • That might be the case. What does Encoding(as.character(data_test$col1[1])) return for you? For me it says UTF-8. – Florian Jul 14 '17 at 12:38
  • "latin1". How can I change it to utf8? – Patrick Balada Jul 14 '17 at 12:40
  • Try: data_test<- enc2utf8(data_test) – Florian Jul 14 '17 at 12:46
  • @ F.Maas Thanks for the advice. Unfortunately, the encoding remained unknown after using enc2utf8. I managed to change the encoding using iconv(data_test, "latin1", "utf8") – Patrick Balada Jul 15 '17 at 11:56
  • Great to know that you were able to solve the issue! Was also a new kind of issue for me so I learned from it as well ;) – Florian Jul 15 '17 at 11:58
  • @ Florian. I found the actual problem. It was caused by the function being defined in an external file that is loaded with source(). source() accepts an encoding argument, and specifying encoding = "utf8" solved my issue. Maybe it helps someone in the future. – Patrick Balada Jul 24 '17 at 13:19
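
The encoding checks discussed in the comments can be sketched on a single string as follows. This is only an illustration: the byte escape \xe9, the variable names and the file name in the last comment are assumptions, while Encoding(), iconv() and the encoding argument of source() are base R.

```r
# The name from the sample data with its accented e stored as one Latin-1 byte.
x <- "C\xe9line"
Encoding(x) <- "latin1"      # declare how the bytes should be interpreted
print(Encoding(x))

# Convert to UTF-8; iconv() marks the result with the target encoding.
y <- iconv(x, from = "latin1", to = "UTF-8")
print(Encoding(y))

# With a correctly declared encoding the replacement behaves as intended.
print(gsub("\u00e9", "e", y, fixed = TRUE))

# For a function defined in a script saved as UTF-8, the encoding can be
# declared when the file is loaded (hypothetical file name):
# source("special_char.R", encoding = "UTF-8")
```

Declaring the encoding at the point where the file is read addresses the cause described in the last comment, whereas converting the data afterwards only helps if the patterns inside the function were read correctly in the first place.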