
I am trying to clean dirty values such as "cds 9", "cTT-9" and "lee" in a data frame by writing a function around gsub and then applying it to the data frame with lapply.

I tested the function on several single inputs such as "cds 9", "cTT-9" and "lee", and each gave the expected result: "CDS9", "CTT9" and "LEE".

data_clean <- function(arg) {
  outcome <- arg
  output1 <- gsub(pattern = " ", replacement = "", arg)
  if (output1 != arg) { outcome <- output1 }
  output2 <- gsub(pattern = "-", replacement = "", arg)
  if (output2 != arg) { outcome <- output2 }
  toupper(outcome)
}

df <- lapply(df, data_clean)

However, when I lapply the function to my data frame, it prints this:

"Error in if (output1 != arg) { : argument is of length zero"

P.S. The data frame looks like this: [image]

sboysel

1 Answer


You can apply gsub and toupper to every column with lapply, then recombine the results using do.call and cbind, which gives a character matrix:

df <- data.frame(
  A = c("A b", "a-B", "Ab"),
  B = c("c D", "c-D", "cD")
)

do.call(cbind, lapply(df, function(x) toupper(gsub("\\s|-", "", x))))
#>      A    B   
#> [1,] "AB" "CD"
#> [2,] "AB" "CD"
#> [3,] "AB" "CD"

Created on 2019-10-14 by the reprex package (v0.3.0)

The regular expression \\s|- matches either a whitespace character (\\s) or a dash (-), and gsub replaces every match with "".
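If the result should stay a data frame rather than a matrix, one option is to wrap the same gsub/toupper call in a function and assign the lapply result back into `df[]`. This is a minimal sketch: the name `data_clean` is reused from the question, the sample column is made up from the values quoted there, and `stringsAsFactors = FALSE` is set explicitly on the assumption that the columns should be plain character vectors.

```r
# Vectorized sketch: gsub() and toupper() already work element-wise,
# so no if() test is needed.
data_clean <- function(x) {
  toupper(gsub("\\s|-", "", as.character(x)))
}

# Illustrative data built from the values quoted in the question
df <- data.frame(
  A = c("cds 9", "cTT-9", "lee"),
  stringsAsFactors = FALSE
)

# Assigning into df[] replaces the columns but keeps the data.frame
df[] <- lapply(df, data_clean)
df$A
#> [1] "CDS9" "CTT9" "LEE"
```

Because both calls operate on whole vectors, the function no longer depends on a length-one condition.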

sboysel
  • Thank you very much for answering this question! It works on my data now, but I still cannot figure out why my function does not work when I lapply it to my data frame. – Zhongyuan Zhang Oct 15 '19 at 15:59
  • What do you mean by 'not work when lapply to my dataframe'? In other words, what is the object returned by your code, and what do you expect? Note that `lapply` will always return a `list`, not a `data.frame`. – sboysel Oct 15 '19 at 20:20
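The two points raised in these comments can be illustrated with a short sketch. It rests on an assumption, since the actual data is not shown: that at least one element passed to the function was zero-length, which is one situation that produces the quoted error. The data frame and the `empty` vector below are hypothetical.

```r
# Hypothetical data, not the asker's
df <- data.frame(A = c("cds 9", "cTT-9"), stringsAsFactors = FALSE)

# lapply() returns a plain list with one element per column
res <- lapply(df, toupper)
class(res)
#> [1] "list"

# With a zero-length input the comparison is zero-length as well
empty <- character(0)
cond <- gsub(" ", "", empty) != empty
length(cond)
#> [1] 0

# if() needs a condition of length one, so a zero-length condition is an
# error; msg holds the error text, which in an English locale is the
# "argument is of length zero" message quoted in the question
msg <- tryCatch(if (cond) "changed", error = function(e) conditionMessage(e))
```

Either way, `if` tests a single value, so a function that is meant to receive whole columns is usually better written without it.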