
How can I filter, in R, only those rows of a data.frame in which the value in column V6 appears exactly 2 times?

I tried:

library(dplyr)

df <- as.data.frame(date)
df1 <- subset(df,duplicated(V6))
user438383
Daisy
  • Welcome to SO! Please see [how to make a great reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example), especially using `dput` instead of a screenshot. – MonJeanJean Jun 24 '22 at 07:25
  • https://stackoverflow.com/questions/20204257/subset-data-frame-based-on-number-of-rows-per-group - this seems to be a similar question - good luck! – BenL Jun 24 '22 at 07:28

2 Answers

You can use the following code:

df[with(df, ave(V6, V6, FUN = length)) == 2,]

Output:

   V1 V6
5   4  5
7   6  9
8   7  9
12 11  5

Data used:

df <- data.frame(V1 = c(0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15),
                 V6 = c("V5", "3", "2", "3", "5", "8", "9", "9", "4", "3", "3", "5", "6", "6", "6", "7"))
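
A possible variant, offered only as a sketch: if V6 is stored as a factor rather than as character, `ave(V6, V6, FUN = length)` tries to write the counts back into a factor, which can produce `NA` values. Whether that was the cause of any `NA` output in this thread is an assumption, not something the thread confirms. Counting over a row index with `seq_along()` keeps the counts numeric whatever the type of V6. The factor conversion and the names `n_per_value` and `res` below are illustrative.

```r
# Same example data as above, but with V6 stored as a factor
# (assumption: this mimics a column that was read in as a factor).
df <- data.frame(
  V1 = 0:15,
  V6 = factor(c("V5", "3", "2", "3", "5", "8", "9", "9",
                "4", "3", "3", "5", "6", "6", "6", "7"))
)

# Count rows per V6 value on an integer index instead of on V6 itself,
# so the result is an integer vector regardless of the type of V6.
n_per_value <- ave(seq_along(df$V6), df$V6, FUN = length)

# Keep only the rows whose V6 value occurs exactly twice.
res <- df[n_per_value == 2, ]
res
```

On this data the rows kept are the same ones shown in the output above (the rows where V6 is 5 or 9).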
Quinten
  • Thank you for your answer. It's a good lead, but I only get NA values when I run it in RStudio. – Daisy Jun 24 '22 at 07:55
  • Hi @Daisy, could you please share your data using `dput` in your question above, so we can reproduce your problem? – Quinten Jun 24 '22 at 08:29
  • Thanks once more for your answer. Sorry, `dput` doesn't work for me in this case; I am a new member on Stack Overflow. If you would be so kind, and if you know of one, could you give another answer to my question, one that will work for me? – Daisy Jun 24 '22 at 09:03
  • @Daisy, that is no problem. What you should try is use `dput(df)` in your console, copy and paste the output of that in your question above. If you do that, we can reproduce your exact problem and are able to help you more easily. – Quinten Jun 24 '22 at 09:06
  • Thank you. After I reset RStudio, your solution worked. – Daisy Jun 24 '22 at 09:38

Something like this? In V6, the values 3 and 4 each appear only once and the value 5 appears three times, so only the rows where V6 is 1 (which appears exactly twice) are kept:

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

data <- tibble(
  V1 = c(1,2,3,4,5,6,7),
  V6 = c(1,1,3,4,5,5,5)
)
data
#> # A tibble: 7 × 2
#>      V1    V6
#>   <dbl> <dbl>
#> 1     1     1
#> 2     2     1
#> 3     3     3
#> 4     4     4
#> 5     5     5
#> 6     6     5
#> 7     7     5

data %>%
  group_by(V6) %>%
  filter(n() == 2)
#> # A tibble: 2 × 2
#> # Groups:   V6 [1]
#>      V1    V6
#>   <dbl> <dbl>
#> 1     1     1
#> 2     2     1

Created on 2022-06-24 by the reprex package (v2.0.0)
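
The result above is still a grouped tibble (note the `# Groups:` line). If an ungrouped result is preferred, one possible alternative is `add_count()`, which appends a count column without grouping the data. This is only a sketch on the same example data; the name `res` is illustrative, and `n` is the default name `add_count()` gives the count column.

```r
library(dplyr)

data <- tibble(
  V1 = c(1, 2, 3, 4, 5, 6, 7),
  V6 = c(1, 1, 3, 4, 5, 5, 5)
)

# add_count() appends a column `n` holding the number of rows that share
# each V6 value; the result is not grouped, so no ungroup() is needed.
res <- data %>%
  add_count(V6) %>%
  filter(n == 2) %>%   # keep values that occur exactly twice
  select(-n)           # drop the helper column again
res
```

Calling `ungroup()` at the end of the `group_by()` pipeline would be another way to get an ungrouped result.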

danlooo