column19 <- 19
mdf[,column19] <- lapply(mdf[,column19],function(x){as.numeric(gsub(",", "", x))})

This snippet works, but it also results in duplicate values.

Roman Luštrik
freygig

1 Answer


If there is only a single column, we don't need lapply:

mdf[, column19] <- as.numeric(gsub(",", "", mdf[, column19], fixed = TRUE))
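As a minimal sketch of why no loop is needed: gsub and as.numeric are both vectorized, so they process a whole character vector in one call. The vector `x` below is hypothetical illustration data, not the OP's column.

```r
# Hypothetical values in the same "digits,digits" shape as the example data
x <- c("1,6", "2,7", "5,10")

# fixed = TRUE treats "," as a literal string rather than a regular expression
cleaned <- as.numeric(gsub(",", "", x, fixed = TRUE))

cleaned            # numeric vector: 16 27 510
is.numeric(cleaned)
```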

The OP's code did not work as expected because mdf[, column19] drops the single column to a plain vector. lapply then loops over each element of that vector and returns a list with one element per row, and that list is what gets assigned back to the single column:

column19 <- 19
mdf[,column19] <- lapply(mdf[,column19],function(x) as.numeric(gsub(",", "", x)))

Warning message:
In `[<-.data.frame`(`*tmp*`, , column19, value = list(27, 49, 510,  :
  provided 5 variables to replace 1 variables
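The mismatch behind that warning can be seen by comparing what lapply returns for each kind of input. This is a small sketch on a hypothetical one-column data frame called `toy`, which is not the OP's data.

```r
# Hypothetical toy data frame standing in for mdf
toy <- data.frame(a = c("1,6", "2,7", "3,8"), stringsAsFactors = FALSE)

strip_commas <- function(x) as.numeric(gsub(",", "", x))

# toy[, 1] is a character vector, so lapply visits each element separately
res_vec <- lapply(toy[, 1], strip_commas)
length(res_vec)   # one list element per row

# toy[1] is still a data.frame, so lapply visits the column as a whole
res_df <- lapply(toy[1], strip_commas)
length(res_df)    # one list element per column
```

Assigning `res_vec` to one column asks R to fit several "variables" into a single slot, whereas `res_df` has exactly one element for the one column being replaced.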

Instead, to keep the same lapply approach, preserve the data.frame structure with either mdf[column19] or mdf[, column19, drop = FALSE], and then loop with lapply. The result is then a list containing a single vector, which matches the one column being replaced:

mdf[column19] <- lapply(mdf[column19],function(x) as.numeric(gsub(",", "", x)))

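The underlying cause is dimension dropping: when [ is used with a column index on a data.frame and the result has a single column, the default is drop = TRUE, so the result collapses to a vector. Matrices behave the same way for a single row or column.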
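The dropping behaviour is easy to check with class(). The sketch below uses a hypothetical two-column data frame `toy` purely for illustration.

```r
# Hypothetical toy data frame, not the OP's mdf
toy <- data.frame(a = c("1,6", "2,7"), b = c("3,8", "4,9"),
                  stringsAsFactors = FALSE)

cls_dropped <- class(toy[, 1])                 # default drop = TRUE
cls_kept    <- class(toy[, 1, drop = FALSE])   # dimensions preserved
cls_list    <- class(toy[1])                   # list-style indexing never drops

cls_dropped   # "character"
cls_kept      # "data.frame"
cls_list      # "data.frame"
```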

data

set.seed(24)
mdf <- as.data.frame(matrix(sample(paste(1:5, 6:10, sep=","), 
   5*20, replace = TRUE), 5, 20), stringsAsFactors=FALSE)
akrun
  • could add `fixed=TRUE` for this one. – lmo Jul 06 '17 at 12:12
  • @Sotos I have no problems with duping correctly, but here the OP was using correct `gsub` on an incorrect loop. So, I am not sure if the dupe is correct for that – akrun Jul 06 '17 at 12:30
  • Ok. I believe it is a problem that can be solved by reviewing the dupe, so IMO it is correct. – Sotos Jul 06 '17 at 13:11