I'm wondering if there's a dplyr equivalent to:
df <- data.frame(A=1:5,B=2:6,C=-1:3)
df[df==2] <- 10
I'm looking for
df %>% <??>
That is, a statement that is chainable with other dplyr verbs.
1) replace Try this. It only requires magrittr, although dplyr re-exports the %>% pipe from magrittr, so it works with dplyr too:
df %>% replace(. == 2, 10)
giving:
   A  B  C
1  1 10 -1
2 10  3  0
3  3  4  1
4  4  5 10
5  5  6  3
1a) overwriting Note that the above is non-destructive, so if you want to update df then you will need to assign it back:
df <- df %>% replace(. == 2, 10)
or
df %>% replace(. == 2, 10) -> df
or use the magrittr %<>% operator, which avoids referencing df twice (this needs library(magrittr), since dplyr does not re-export %<>%):
df %<>% replace(. == 2, 10)
2) arithmetic This would also work, provided every column is numeric:
df %>% { 10 * (. == 2) + . * (. != 2) }
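As a quick sanity check, the two piped forms above can be compared against the base R assignment from the question. This is only a sketch: the names base, via_replace and via_arith are illustrative and not part of the original answer, and the comparison is done on the values alone.

```r
library(magrittr)

df <- data.frame(A = 1:5, B = 2:6, C = -1:3)

# Base R reference, applied to a copy so df itself is unchanged
base <- df
base[base == 2] <- 10

# The two piped alternatives from the answer
via_replace <- df %>% replace(. == 2, 10)
via_arith   <- df %>% { 10 * (. == 2) + . * (. != 2) }

# Compare cell by cell; each call returns a single logical
all(as.matrix(via_replace) == as.matrix(base))
all(as.matrix(via_arith) == as.matrix(base))
```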
The OP's question is about how to replace values using dplyr, and it has been resolved thanks to G. Grothendieck. But I am curious how performance differs among approaches based on dplyr, data.table and base R, so I designed and ran the following benchmark.
# Load packages
library(dplyr)
library(data.table)
library(microbenchmark)
# Create example data frame
df <- data.frame(A = 1:5, B = 2:6, C = -1:3)
# Convert to data.table
dt <- as.data.table(df)
# Method 1: Use mutate_all and ifelse
F1 = function(df){df %>% mutate_all(funs(ifelse(. == 2, 10, .)))}
# Method 2: Use mutate_all and replace
F2 = function(df){df %>% mutate_all(funs(replace(., . == 2, 10)))}
# Method 3: Use replace
F3 = function(df){df %>% replace(. == 2, 10)}
# Method 4: Base R data frame assignment
F4 = function(df){
  df[df == 2] <- 10
  return(df)
}
# Benchmarking
microbenchmark(
  M1 = F1(df),
  M2 = F2(df),
  M3 = F3(df),
  M4 = F4(df),
  # Same as M4, but use data.table object as input
  M5 = F4(dt)
)
Unit: microseconds
 expr      min         lq       mean     median         uq       max neval
   M1 8634.974 13028.7975 17224.4669 14907.3735 19496.5275 79750.182   100
   M2 8925.565 12626.2675 16698.7412 15551.7410 18658.1125 35468.760   100
   M3  282.252   391.6240   591.2534   553.5980   647.8965  3290.797   100
   M4  163.578   252.1025   423.7627   349.6080   420.8125  5415.382   100
   M5  228.367   333.2495   596.1735   440.3775   555.5230  7506.609   100
The results show that mutate_all with ifelse (M1) or replace (M2) is much slower than the other approaches (median about 15 ms versus roughly 0.35 to 0.55 ms). Using replace with the pipe (M3) is fast, but still slightly slower than base R (M4). Converting the data.frame to a data.table and then applying the assignment replacement (M5) is not faster than M4.
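A note for readers on a newer dplyr: funs() has been deprecated since dplyr 0.8.0, and mutate_all() was superseded by across() in dplyr 1.0.0. The column-wise methods M1 and M2 could be written roughly as below. The names G1 and G2 are illustrative, and these variants were not part of the benchmark above, so the timings say nothing about them.

```r
library(dplyr)

df <- data.frame(A = 1:5, B = 2:6, C = -1:3)

# Column-wise ifelse, analogous to M1 (assumes dplyr >= 1.0.0)
G1 <- function(df) {
  df %>% mutate(across(everything(), ~ ifelse(.x == 2, 10, .x)))
}

# Column-wise replace, analogous to M2 (assumes dplyr >= 1.0.0)
G2 <- function(df) {
  df %>% mutate(across(everything(), ~ replace(.x, .x == 2, 10)))
}

G1(df)
G2(df)
```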
So I think there is no particular need to use dplyr functions in this case, because they are not faster than the base R method (M4). There is also no need to convert the data.frame to a data.table. If a pipe operation is desirable, we can use the pipe with replace (M3), or we can define a function such as F4 and put it in the pipe.
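A minimal sketch of that last idea, assuming a small helper with the same logic as F4. The name replace_value and its old/new arguments are hypothetical, and the filter and mutate steps are only there to show that the helper chains with other dplyr verbs.

```r
library(dplyr)

# Hypothetical helper: base R matrix-style assignment wrapped in a
# function whose first argument is the data frame, so it can be piped
replace_value <- function(df, old, new) {
  df[df == old] <- new
  df
}

df <- data.frame(A = 1:5, B = 2:6, C = -1:3)

result <- df %>%
  replace_value(2, 10) %>%   # every 2 becomes 10
  filter(A > 1) %>%          # drop the row where A is still 1
  mutate(D = A + B)          # add a derived column

result
```

Because the data frame is the first argument, the helper needs no dot placeholder and leaves the original df untouched unless the result is assigned back.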