I'm trying to replace some unexpected characters in a data frame in R. According to "Replace multiple arguments with gsub", the gsub function should work in these cases, so I tried that approach.

The values I have in the first column of the data frame are the following:

La Flèche Wallonne
Liège - Bastogne - Liège
Tour de Romandie
Giro d´Italia
Critérium du Dauphiné

The code is implemented as follows:

callChangeCharacters <- function(results){
for(i in 1:nrow(results)){
    race <- results[i,1]
    race <- gsub("é","e",race)
    race <- gsub("â","a",race)
    race <- gsub("ó","o",race)
    race <- gsub("ž","z",race)
    race <- gsub("ú","u",race)
    race <- gsub("ø","o",race)
    race <- gsub("Å›","s",race)
    race <- gsub("Å‚","l",race)
    race <- gsub("ä‚","a",race)
    race <- gsub("è","e",race)
    race <- gsub("Ã","a",race)
    race <- gsub("Å","s",race)
    race <- gsub("Ä","c",race)
    race <- gsub("´","'",race)
    results[i,1] <- race
}
return(results)
}

If I run the code inside the for loop on its own, I get the expected result:

La Fleche Wallonne
Liege - Bastogne - Liege
Tour de Romandie
Giro d'Italia
Criterium du Dauphine

However, if I call the function, the result isn't the same, and the unwanted characters aren't corrected:

> correctedDF <- callChangeCharacters(results)
> correctedDF
                                        V1
La Flèche Wallonne
Liège - Bastogne - Liège
Tour de Romandie
Giro d´Italia
Critérium du Dauphiné

The output of dput(results) is the following (this version of results is longer, but the problem is the same):

> dput(results)
structure(list(V1 = c("Santos Tour Down Under", "Paris - Nice", 
"Tirreno-Adriatico", "Milano-Sanremo", "Volta Ciclista a Catalunya", 
"E3 Prijs Vlaanderen - Harelbeke", "Gent - Wevelgem", "Ronde van Vlaanderen / Tour des Flandres", 
"Vuelta Ciclista al Pais Vasco", "Paris - Roubaix", "Amstel Gold Race", 
"La Flèche Wallonne", "Liège - Bastogne - Liège", "Tour de Romandie", 
"Giro d´Italia", "Critérium du Dauphiné", "Tour de Suisse", 
"Tour de France", "Tour de Pologne", NA, "Clasica Ciclista San Sebastian", 
"Eneco Tour", "Vuelta a España", "Vattenfall Cyclassics", "GP Ouest France - Plouay", 
"Grand Prix Cycliste de Québec", "Grand Prix Cycliste de Montréal", 
"Il Lombardia", "Tour of Beijing")), .Names = "V1", row.names = c(1L, 
1686L, 4601L, 6743L, 6943L, 9274L, 9473L, 9673L, 9880L, 11581L, 
11779L, 11978L, 12168L, 12367L, 14264L, 21957L, 24734L, 27727L, 
35542L, 37354L, 37470L, 37627L, 39885L, 47277L, 47441L, 47624L, 
47788L, 47952L, 48147L), class = "data.frame")

Any idea of why it doesn't work inside the function?

Thanks in advance.

Hibai
  • Not sure of the answer, sorry, but I got your code to work just fine, assuming your original data frame is named "results" and it's a character column. Have you tried restarting R? I've had similar issues where `grep` functions won't work. – Branden Murray Sep 26 '15 at 21:37
  • Your function works fine for me. Can you add the output of `dput(results)` to your question? – Dhawal Kapil Sep 26 '15 at 22:46
  • Thanks @Branden, but got the same result: it worked running the for loop, but didn't correct the wrong characters while executing the function. – Hibai Sep 26 '15 at 22:50
  • @DhawalKapil, actually the output I'm getting is written in the question, at least if I understand what you are asking for. – Hibai Sep 26 '15 at 22:53
  • I want you to post the `results` object that you are passing to the function. You can write it out using `dput(results)`. – Dhawal Kapil Sep 26 '15 at 22:54
  • Ok @DhawalKapil, I've just added the output in the final part of the question. – Hibai Sep 26 '15 at 23:12
  • That indicates your input is a vector, not a data frame. I don't know how you got the for loop to work because `nrow(results)` when `results` is a vector returns `NULL` and `results[1,1]` returns an error. Try changing `results` to a data frame or, alternatively, using `NROW` (or `length`) instead of `nrow` and replacing `results[i,1]` with `results[i]` in the function. – Branden Murray Sep 26 '15 at 23:20
  • That's true @Branden, I mixed up elements when I restarted R. Now I have tried with a data frame (I'm going to edit that part of the question again), and the result is the same. I'll try your alternatives anyway. – Hibai Sep 26 '15 at 23:27
  • I recreated the data frame using your `dput` output and the function worked as desired. Maybe try it on another machine or on an online interpreter (http://www.tutorialspoint.com/r_terminal_online.php). If your code works on either of those then that indicates it's probably something specific to your machine, and not something wrong with your code. – Branden Murray Sep 26 '15 at 23:55
  • @BrandenMurray I tried reinstalling RGui, but that didn't work either. Tomorrow I'll try on another machine and see if it works there. – Hibai Sep 27 '15 at 20:50

2 Answers

I had a similar issue, which occurred because I was using the source function to import my code without specifying that the encoding parameter should be "utf-8".

source("./code.R")

Upon inspecting a function I had read in, I realised that certain special characters had been changed by the source function and hence the function was not working as intended. The solution was to set the encoding parameter to "utf-8".

source("./code.R", encoding="utf-8")
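
A related option, offered as a sketch rather than a confirmed fix for the question: if the patterns inside a sourced file are what get garbled, they can be written as \uXXXX escapes so that the script contains only ASCII and does not depend on how source reads it. The names strip_accents and races are hypothetical, and only three of the question's patterns are shown.

```r
# Hypothetical helper: every pattern is a Unicode escape, so this file
# is pure ASCII and its patterns cannot be altered by a wrong encoding.
strip_accents <- function(x) {
  x <- gsub("\u00e9", "e", x, fixed = TRUE)  # e with acute accent
  x <- gsub("\u00e8", "e", x, fixed = TRUE)  # e with grave accent
  x <- gsub("\u00b4", "'", x, fixed = TRUE)  # acute accent used as an apostrophe
  x
}

# Sample values (also written with escapes, for the same reason)
races <- c("La Fl\u00e8che Wallonne",
           "Crit\u00e9rium du Dauphin\u00e9",
           "Giro d\u00b4Italia")

cleaned <- strip_accents(races)
print(cleaned)
```

This only helps when the data itself is correctly encoded. If the data frame already contains garbled text, the patterns have to match the garbled characters instead.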
Jonathon

Your code works. You should also replace ñ (see "Vuelta a España").

The gsub function is vectorized so you don't need the loop at all.

cleanup <- function(race) {
    race <- gsub("é","e",race)
    race <- gsub("â","a",race)
    race <- gsub("ó","o",race)
    race <- gsub("ž","z",race)
    race <- gsub("ú","u",race)
    race <- gsub("ø","o",race)
    race <- gsub("Å›","s",race)
    race <- gsub("Å‚","l",race)
    race <- gsub("ä‚","a",race)
    race <- gsub("è","e",race)
    race <- gsub("Ã","a",race)
    race <- gsub("Å","s",race)
    race <- gsub("Ä","c",race)
    race <- gsub("´","'",race)
    return(race)
}

results$V1 <- cleanup(results$V1)

Why do you use a data.frame if you only have one column? It would be more convenient to just keep a vector race.

If you really want a function that works on results directly, you still don't need a loop.

callChangeCharacters <- function(results) {
    results[,1] <- cleanup(results[,1])
    return(results)
}
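
The same idea extends to a data frame with several character columns. The following is a minimal sketch under assumptions: the data frame df and its columns are made up for illustration, and cleanup_sketch is a hypothetical stand-in for cleanup reduced to two patterns (written as Unicode escapes so the example does not depend on file encoding).

```r
# Hypothetical example data: two character columns, one containing NA
df <- data.frame(
  race = c("Crit\u00e9rium du Dauphin\u00e9", NA, "Tour de France"),
  city = c("Li\u00e8ge", "Qu\u00e9bec", "Paris"),
  stringsAsFactors = FALSE
)

# Stand-in for cleanup(), reduced to two patterns
cleanup_sketch <- function(x) {
  x <- gsub("\u00e9", "e", x, fixed = TRUE)  # e with acute accent
  x <- gsub("\u00e8", "e", x, fixed = TRUE)  # e with grave accent
  x
}

# gsub() handles the whole vector in one call and leaves NA as NA
cleaned_race <- cleanup_sketch(df$race)

# Apply the function to every column; assigning into df[] keeps the
# data.frame structure instead of returning a plain list
df[] <- lapply(df, cleanup_sketch)
print(df)
```

If only some columns hold text, the lapply call can be restricted to those columns by indexing on both sides of the assignment.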
asachet
  • Thanks for your fixes. This way it looks much faster. I was working with data frames because I have more columns with characters to correct, but I can get by with vectors too. Anyway, it's not working yet. I'll try on another machine, as suggested in the previous response. – Hibai Sep 27 '15 at 20:52