
I was reading this post when a question came up.

Why, using the post's data frame, does this expression not return the same rows

>df[df$X3==c(1,2),]

   X1    X2 X3
1  s1 45.11  1
4  s1 51.41  2
10 s1 43.12  2
17 s5 25.40  1

as this one?

>df[df$X3 %in% c(1,2),] 

   X1    X2 X3
1  s1 45.11  1
2  s1 45.13  1
3  s1 53.42  2
4  s1 51.41  2
9  s3 43.58  2
10 s1 43.12  2
17 s5 25.40  1
18 s5 25.50  1

I had assumed the two were equivalent. What's the difference between them?

Thanks in advance.

Cris

1 Answer


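df$X3 == c(1,2) is not doing what you think. c(1,2) is first recycled to the same length as df$X3, and then == is applied element-wise. So the value in position 1 is compared with 1, position 2 with 2, position 3 with 1, and so on: a row is kept only if X3 is 1 in an odd position or 2 in an even position. That matches the output above, where rows 2, 3, 9 and 18 are dropped even though their X3 is 1 or 2. Let's take a small example: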

1:4 == 2:3  ## which is doing `1:4 == c(2,3,2,3)`
# [1] FALSE FALSE FALSE FALSE

and we get all FALSE. On the other hand, if we do

1:4 %in% 2:3
# [1] FALSE  TRUE  TRUE FALSE

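we get two TRUE values, because %in% asks, for each element on the left, whether it appears anywhere in the vector on the right. There is no recycling and no position-by-position pairing.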
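A slightly longer sketch reproduces the pattern from the question on a hypothetical vector x3 (made-up values, not the post's data): == keeps only the positions where the value lines up with the recycled c(1, 2), while %in% keeps every 1 and 2. It also shows two equivalent ways to write the membership test, including the match() form that base R documents as the definition of %in%.

```r
# Hypothetical values for illustration only (not the post's data frame)
x3 <- c(1, 1, 2, 2, 1, 3)

# `==` recycles c(1, 2) to length(x3), so odd positions are compared
# with 1 and even positions with 2
rep_len(c(1, 2), length(x3))
# [1] 1 2 1 2 1 2
eq <- x3 == c(1, 2)
which(eq)
# [1] 1 4 5

# `%in%` tests each element for membership anywhere in c(1, 2)
inn <- x3 %in% c(1, 2)
which(inn)
# [1] 1 2 3 4 5

# Two equivalent ways to write the membership test
identical(inn, x3 == 1 | x3 == 2)
# [1] TRUE
identical(inn, match(x3, c(1, 2), nomatch = 0L) > 0L)
# [1] TRUE
```

If the length of the longer vector is not a multiple of 2, == also emits the warning "longer object length is not a multiple of shorter object length"; %in% never recycles, so it is the right choice for "is X3 one of these values?".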

Zheyuan Li