
I have a data frame similar to this:

> var1<-c("01","01","01","02","02","02","03","03","03","04","04","04")
> var2<-c("0","4","6","8","3","2","5","5","7","7","8","9")
> var3<-c("07","41","60","81","38","22","51","53","71","72","84","97")
> var4<-c("107","241","360","181","238","222","351","453","171","372","684","197")
> df<-data.frame(var1,var2,var3,var4)
> df
   var1 var2 var3 var4
1    01    0   07  107
2    01    4   41  241
3    01    6   60  360
4    02    8   81  181
5    02    3   38  238
6    02    2   22  222
7    03    5   51  351
8    03    5   53  453
9    03    7   71  171
10   04    7   72  372
11   04    8   84  684
12   04    9   97  197

I want to replace all values of the variables var2, var3 and var4 with "0" in the rows where var1 is 02 or 03. The number of digits also needs to stay the same, so that df looks like this:

   var1 var2 var3 var4
1    01    0   07  107
2    01    4   41  241
3    01    6   60  360
4    02    0   00  000
5    02    0   00  000
6    02    0   00  000
7    03    0   00  000
8    03    0   00  000
9    03    0   00  000
10   04    7   72  372
11   04    8   84  684
12   04    9   97  197

Now, I also need to make sure the command still runs even if var1 does not contain 02 or 03. Basically something like: if var1 contains 02 or 03, set the corresponding values in var2, var3 and var4 to 0, keeping the number of digits in var2, var3 and var4 (e.g. 97 becomes 00 and 197 becomes 000); if not, do nothing.
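The digit-preserving rule can be expressed in base R with strrep() and nchar(). The sketch below is illustrative only: zero_like() and zero_rows() are hypothetical helper names, and the three-row data frame is a cut-down copy of the example. Because the row condition comes from %in%, a key that matches no row gives an all-FALSE index and the data is returned unchanged.

```r
# Hypothetical helper: one "0" per character of each value
zero_like <- function(x) strrep("0", nchar(x))

# Hypothetical helper: blank out `cols` in the rows whose var1 is in `keys`
zero_rows <- function(df, cols, keys) {
  rows <- df$var1 %in% keys              # all FALSE if no key is present
  for (v in cols) {
    df[[v]][rows] <- zero_like(df[[v]][rows])
  }
  df                                     # unchanged when no row matched
}

df <- data.frame(var1 = c("01", "02", "04"),
                 var2 = c("6", "8", "9"),
                 var4 = c("360", "181", "197"),
                 stringsAsFactors = FALSE)   # keep strings, not factors

zero_like(c("7", "97", "197"))               # "0" "00" "000"
out <- zero_rows(df, c("var2", "var4"), c("02", "03"))
out$var4                                     # "360" "000" "197"
identical(zero_rows(df, c("var2", "var4"), "05"), df)  # TRUE: nothing matched
```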

Any suggestions?

Sotos
Lutz

3 Answers


One solution is to use `mutate()` and `case_when()` from dplyr:

library(dplyr)

df <- df %>%
  mutate(var2 = case_when(var1 %in% c('02','03') ~ '0',
                          TRUE ~ as.character(var2)),
         var3 = case_when(var1 %in% c('02','03') ~ '00',
                          TRUE ~ as.character(var3)),
         var4 = case_when(var1 %in% c('02','03') ~ '000',
                          TRUE ~ as.character(var4)))
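For comparison, the same fixed-width replacement can be written without dplyr using logical indexing. This is a sketch assuming the columns are character rather than factor (hence stringsAsFactors = FALSE, which matters on R versions before 4.0); with factor columns, assigning a string that is not an existing level produces NA with a warning. The four rows are a subset of the example data.

```r
df <- data.frame(var1 = c("01", "02", "03", "04"),
                 var2 = c("0", "8", "5", "9"),
                 var3 = c("07", "81", "51", "97"),
                 var4 = c("107", "181", "351", "197"),
                 stringsAsFactors = FALSE)   # character, not factor, columns

rows <- df$var1 %in% c("02", "03")  # same condition as in case_when()
# one fixed-width replacement per column, like the case_when() branches;
# rows where `rows` is FALSE keep their values (the TRUE ~ ... branch)
df$var2[rows] <- "0"
df$var3[rows] <- "00"
df$var4[rows] <- "000"
df
```

As in the dplyr version, the widths ("0", "00", "000") are hard-coded per column.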
camnesia

If you want it to automatically produce as many zeros as there are digits in each value, you can use something like this:

# define a function that replaces each value by as many zeros as it has digits
# (as.character() guards against factor columns)
val_to_zero <- function(con, val) {
  val <- as.character(val)
  ifelse(con, strrep("0", nchar(val)), val)
}
# define the condition (rows where var1 is 02 or 03)
con <- df$var1 %in% c("02", "03")
# choose which columns to change
vars <- names(df)[2:4]
# apply the function to each of those columns
df[vars] <- lapply(df[vars], function(var_i) val_to_zero(con, var_i))
# done
df

With this function you do not need to specify by hand how many zeros to use for each column, so it still works if, say, var5 is c("292992", ...).
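To check the var5 claim, here is a self-contained sketch with a hypothetical var5 column (only "292992" comes from the text; the other values are made up). It uses a per-value variant of val_to_zero() built on strrep(), so each value gets its own number of zeros; this also copes with columns whose values differ in length, such as the made-up "7" below.

```r
# Per-value variant of val_to_zero (a sketch, not necessarily the answer's exact code):
# each selected value becomes as many "0"s as it has characters
val_to_zero <- function(con, val) {
  val <- as.character(val)                  # guard against factor columns
  ifelse(con, strrep("0", nchar(val)), val)
}

df <- data.frame(var1 = c("01", "02", "03"),
                 var5 = c("292992", "123456", "7"),   # hypothetical values
                 stringsAsFactors = FALSE)

con <- df$var1 %in% c("02", "03")
df$var5 <- val_to_zero(con, df$var5)
df$var5   # "292992" "000000" "0"
```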

  • Even with just two conditions, you may want to consider `df$var1 %in% c("01", "02")` rather than `df$var1 == "01" | df$var1 == "02"`. It's less typing and extends more easily if there are more cases. – Gregor Thomas Mar 13 '20 at 14:56
  • I find the solution with dplyr from @camnesia a bit more straightforward. But also thanks for this suggestion. Maybe other users prefer this option. – Lutz Mar 13 '20 at 15:54
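Gregor's suggestion also changes how missing values are handled: `==` propagates NA, while `%in%` treats NA as "no match". A small illustration with a hypothetical var1 that contains an NA:

```r
var1 <- c("01", "02", NA, "04")   # hypothetical values, including a missing one

a <- var1 == "01" | var1 == "02"  # chained comparisons
b <- var1 %in% c("01", "02")      # set membership

a   # TRUE  TRUE    NA FALSE
b   # TRUE  TRUE FALSE FALSE
```

Since an NA in a logical row index selects a row of NAs when subsetting a data frame, `%in%` is usually the safer choice here.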

Here is an idea that works dynamically for any number of columns and any number of digits. The trick is to make sure you have character variables (instead of factors) and to use `sprintf()` to pad each column to its maximum `nchar()`, like so:

#Convert to character (IF they are factors)
df[] <- lapply(df, as.character)
#Convert values to 0 as per your condition
df[df$var1 %in% c('02', '03'), -1] <- 0
#Add leading 0s to bring to same format as original
df[-1] <- mapply(function(x, y){i1 <- sprintf(paste0('%0', x, 's'), y); gsub(' ', '0', i1)}, 
                               sapply(df[-1], function(i)max(nchar(i))), df[-1])
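The padding step can be isolated into a small helper (`pad_zero()` is a hypothetical name, not part of the answer). `sprintf()` with a `%Ns` format right-aligns strings with spaces, and `gsub()` then turns those spaces into zeros; R's documentation notes that the `0` flag zero-pads character strings only on some platforms, which is why the replacement step is needed.

```r
# Hypothetical helper: left-pad strings with zeros to width w
pad_zero <- function(x, w) {
  padded <- sprintf(paste0("%", w, "s"), x)  # e.g. "%3s": right-align with spaces
  gsub(" ", "0", padded)                     # turn the padding spaces into zeros
}

pad_zero(c("0", "72", "372"), 3)   # "000" "072" "372"
```

Because the width is the longest value in each column, any shorter value that was not zeroed gets padded too (a made-up "72" in a three-character column becomes "072"), so this approach assumes each column has a uniform width, as in the example data.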

which gives:

   var1 var2 var3 var4
1    01    0   07  107
2    01    4   41  241
3    01    6   60  360
4    02    0   00  000
5    02    0   00  000
6    02    0   00  000
7    03    0   00  000
8    03    0   00  000
9    03    0   00  000
10   04    7   72  372
11   04    8   84  684
12   04    9   97  197
Sotos
  • Interesting suggestion. I can't apply this to my original data (there are more columns that shouldn't be converted to 0, but of course you could not know that from my question), but I think this can be very useful in the future. Thanks! – Lutz Mar 13 '20 at 15:49
  • You can exclude unwanted columns. In this case I exclude the first column (that's why I use `-1`, i.e. `df[-1]`). If you replace the `-1` above with the negated indices of the columns you want to exclude (e.g. `-c(1, 5)`), it should work just fine. – Sotos Mar 13 '20 at 15:54
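As a variation on the index approach described in the comment, the columns to leave alone can also be listed by name and removed with `setdiff()`, which keeps working if column positions change. This is a sketch with a hypothetical extra column `id` that must not be touched; it zeroes values per element with `strrep()` rather than using the `sprintf()` padding step.

```r
df <- data.frame(var1 = c("01", "02"),
                 id   = c("a1", "b2"),      # hypothetical column to keep
                 var2 = c("4", "8"),
                 var3 = c("41", "81"),
                 stringsAsFactors = FALSE)

keep <- c("var1", "id")                     # columns that must not change
cols <- setdiff(names(df), keep)            # columns to zero out
rows <- df$var1 %in% c("02", "03")
df[cols] <- lapply(df[cols], function(x) ifelse(rows, strrep("0", nchar(x)), x))
df
```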