83

I am trying to understand how to conditionally replace values in a data frame without using a loop. My data frame is structured as follows:

df <- data.frame(
  a = c(11.77, 10.9, 10.32, 10.96, 9.906, 10.7, 11.43, 11.41, 10.48512, 11.19),
  b = c(2, 3, 2, 0, 0, 0, 1, 2, 4, 0),
  est = numeric(10)
)
df
          a b est
1  11.77000 2   0
2  10.90000 3   0
3  10.32000 2   0
4  10.96000 0   0
5   9.90600 0   0
6  10.70000 0   0
7  11.43000 1   0
8  11.41000 2   0
9  10.48512 4   0
10 11.19000 0   0

What I want to do is check the value of b: if b is 0, I want to set est to a value computed from a. I understand that df$est[df$b == 0] <- 23 sets est to 23 in every row where b == 0. What I don't understand is how to set est to a value based on a when that condition is true. For example:

df$est[df$b == 0] <- (df$a - 5)/2.533 
                                

gives the following warning:

Warning message:
In df$est[df$b == 0] <- (df$a - 5)/2.533 :
  number of items to replace is not a multiple of replacement length

Is there a way to pass only the relevant cells, rather than the whole vector?

moodymudskipper
djq

5 Answers

98

Since you are conditionally indexing df$est, you also need to conditionally index the replacement vector df$a:

index <- df$b == 0
df$est[index] <- (df$a[index] - 5)/2.533 
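To see where the warning in the question comes from, it can help to compare the lengths on both sides of the assignment. This is only a minimal sketch that rebuilds the question's df; the names n_slots, n_all and n_subset are introduced here for illustration.

```r
# Rebuild the data frame from the question.
df <- data.frame(
  a = c(11.77, 10.9, 10.32, 10.96, 9.906, 10.7, 11.43, 11.41, 10.48512, 11.19),
  b = c(2, 3, 2, 0, 0, 0, 1, 2, 4, 0),
  est = numeric(10)
)

index <- df$b == 0

n_slots  <- sum(index)                          # positions selected on the left-hand side
n_all    <- length((df$a - 5) / 2.533)          # unindexed right-hand side: one value per row
n_subset <- length((df$a[index] - 5) / 2.533)   # indexed right-hand side

# Only the indexed right-hand side has as many values as there are slots to fill.
df$est[index] <- (df$a[index] - 5) / 2.533
```

With the unindexed right-hand side, R has more values than slots, and the count of values is not a multiple of the count of slots, which is what the warning reports.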

Of course, the variable index is just temporary; I use it to make the code a bit more readable. You can also write it in one step:

df$est[df$b == 0] <- (df$a[df$b == 0] - 5)/2.533 

For even better readability, you can use within:

df <- within(df, est[b==0] <- (a[b==0]-5)/2.533)

The results, regardless of which method you choose:

df
          a b      est
1  11.77000 2 0.000000
2  10.90000 3 0.000000
3  10.32000 2 0.000000
4  10.96000 0 2.352941
5   9.90600 0 1.936834
6  10.70000 0 2.250296
7  11.43000 1 0.000000
8  11.41000 2 0.000000
9  10.48512 4 0.000000
10 11.19000 0 2.443743

As others have pointed out, an alternative solution in your example is to use ifelse.

Andrie
28

Try data.table's := operator:

library(data.table)
DT <- as.data.table(df)
DT[b == 0, est := (a - 5)/2.533]

It's fast and short. See these linked questions for more information on :=:

Why has data.table defined :=

When should I use the := operator in data.table

How do you remove columns from a data.frame

R self reference
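As a self-contained sketch of the := approach above (assuming the data.table package is installed and reusing the question's df), the update happens in place on DT, so the result does not need to be assigned back:

```r
library(data.table)  # assumes the data.table package is installed

df <- data.frame(
  a = c(11.77, 10.9, 10.32, 10.96, 9.906, 10.7, 11.43, 11.41, 10.48512, 11.19),
  b = c(2, 3, 2, 0, 0, 0, 1, 2, 4, 0),
  est = numeric(10)
)

DT <- as.data.table(df)  # a data.table copy of df

# := updates est by reference, only on the rows where b == 0.
# No reassignment such as DT <- ... is needed.
DT[b == 0, est := (a - 5) / 2.533]

# df itself is left unchanged, because as.data.table() made a copy.
```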

Matt Dowle
22

Here is one approach. ifelse is vectorized: it checks every row for b == 0 and replaces est with (a - 5)/2.533 where that holds, leaving est unchanged elsewhere.

df <- transform(df, est = ifelse(b == 0, (a - 5)/2.533, est))
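ifelse sidesteps the length mismatch from the question because the test, both value vectors, and the result all have one element per row. A rough base-R equivalent without transform, assuming the question's df and its 2.533 divisor, might look like this:

```r
df <- data.frame(
  a = c(11.77, 10.9, 10.32, 10.96, 9.906, 10.7, 11.43, 11.41, 10.48512, 11.19),
  b = c(2, 3, 2, 0, 0, 0, 1, 2, 4, 0),
  est = numeric(10)
)

# ifelse() returns a full-length vector: for each row it takes the second
# argument where the test is TRUE and the third argument where it is FALSE.
df$est <- ifelse(df$b == 0, (df$a - 5) / 2.533, df$est)
```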
Ramnath
11

Another option would be to use case_when

library(dplyr)

mutate(df, est = case_when(
    b == 0 ~ (a - 5)/2.533,
    TRUE   ~ est
))

This solution becomes even handier when more than two cases need to be distinguished, because it avoids nested if_else calls.
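A sketch of what a third case could look like, assuming dplyr is installed and reusing the question's df. The b == 1 branch and its formula are invented purely for illustration and are not part of the original problem:

```r
library(dplyr)  # assumes dplyr is installed

df <- data.frame(
  a = c(11.77, 10.9, 10.32, 10.96, 9.906, 10.7, 11.43, 11.41, 10.48512, 11.19),
  b = c(2, 3, 2, 0, 0, 0, 1, 2, 4, 0),
  est = numeric(10)
)

# Conditions are checked top to bottom; the first match wins for each row.
res <- mutate(df, est = case_when(
  b == 0 ~ (a - 5) / 2.533,
  b == 1 ~ a / 2,   # hypothetical extra case, for illustration only
  TRUE   ~ est      # every other row keeps its current value
))
```

Each additional condition is one more line, where nested if_else would need one more level of nesting.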

Holger Brandl
7

The R Inferno, or the basic R documentation, explains why using df$* is not the best approach here. From the help page for "[":

"Indexing by [ is similar to atomic vectors and selects a list of the specified element(s). Both [[ and $ select a single element of the list. The main difference is that $ does not allow computed indices, whereas [[ does. x$name is equivalent to x[["name", exact = FALSE]]. Also, the partial matching behavior of [[ can be controlled using the exact argument. "

I recommend using the [row,col] notation instead. Example:

Rgames: foo   
         x    y z  
   [1,] 1e+00 1 0  
   [2,] 2e+00 2 0  
   [3,] 3e+00 1 0  
   [4,] 4e+00 2 0  
   [5,] 5e+00 1 0  
   [6,] 6e+00 2 0  
   [7,] 7e+00 1 0  
   [8,] 8e+00 2 0  
   [9,] 9e+00 1 0  
   [10,] 1e+01 2 0  
Rgames: foo<-as.data.frame(foo)

Rgames: foo[foo$y==2,3]<-foo[foo$y==2,1]
Rgames: foo
       x y     z
1  1e+00 1 0e+00
2  2e+00 2 2e+00
3  3e+00 1 0e+00
4  4e+00 2 4e+00
5  5e+00 1 0e+00
6  6e+00 2 6e+00
7  7e+00 1 0e+00
8  8e+00 2 8e+00
9  9e+00 1 0e+00
10 1e+01 2 1e+01
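Applied to the data frame from the question, the same [row, col] pattern might look like the sketch below. The variable rows is a name introduced here for illustration.

```r
df <- data.frame(
  a = c(11.77, 10.9, 10.32, 10.96, 9.906, 10.7, 11.43, 11.41, 10.48512, 11.19),
  b = c(2, 3, 2, 0, 0, 0, 1, 2, 4, 0),
  est = numeric(10)
)

rows <- df$b == 0   # logical row selector

# The same rows are selected on both sides, so the lengths match.
df[rows, "est"] <- (df[rows, "a"] - 5) / 2.533
```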
Carl Witthoft
  • This deserves an upvote if you first add either a link to the R-Inferno page, or summarize the issues with `$` (or ideally both). – Andrie Nov 21 '11 at 16:01
  • +1 Although I think the `$` operator is perfectly fine in this case. (Also, I note that despite your warning you use `$` yourself...) – Andrie Nov 21 '11 at 16:19
  • @Andrie: yes, I used it where it works (not that that is a lot of help :-) ). The OP tried to use it to define what elements were being acted on, which is where the trouble started. I just used it to define a condition that selected dataframe elements. But you knew that :-) – Carl Witthoft Nov 21 '11 at 17:57