
I have a data frame as follows:

         [,1] [,2] [,3] [,4] [,5] [,6] [,7]
    [1,]    A    4   NA   NA 1.55    4   NA
    [2,]    B   NA   NA    4 0.56   NA   NA
    [3,]    C    4    4   NA 0.62    4    4
    [4,]    D   NA   NA   NA 1.61    4   NA
    [5,]    E    4   NA   NA  0.5    4   NA

What I would like to get as the output after filtering is:

         [,1] [,2] [,3] [,4] [,5] [,6] [,7]
    [3,]    C    4    4   NA 0.62    4    4
    [5,]    E    4   NA   NA  0.5    4   NA

I would like to keep the rows that have at least one value equal to 4 in columns 2 to 4 and at least one value equal to 4 in columns 6 to 7.

I was thinking of using the following command, but I am not sure how to use it properly so that it gives the correct output:

    new.df <- df %>%
      dplyr::filter_at((vars(c(2:4)), any_vars(. == 4) & vars(c(6:7)), any_vars(. == 4))

Do you have any idea how I can get the desired `new.df`? Thanks!

say.ff
    Because of the way you formatted your example data frame, it's impossible to copy it into R to work with. This makes it really hard for us to try to solve your problem. Please use `dput` to make a version of it we can put directly into R. Take a look at [How to make a great R reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example?rq=1) to see more – divibisan Aug 08 '18 at 19:22
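As a sketch of what the comment suggests, assuming the data are rebuilt by hand from the printed values (using the same `col1` to `col7` column names as the answers below, which are not part of the original question), `dput()` prints R code that recreates the object, and that text can be pasted straight into a question:

```r
# Rebuild the example data by hand (column names col1..col7 are assumed)
df <- data.frame(col1 = c("A", "B", "C", "D", "E"),
                 col2 = c(4, NA, 4, NA, 4),
                 col3 = c(NA, NA, 4, NA, NA),
                 col4 = c(NA, 4, NA, NA, NA),
                 col5 = c(1.55, 0.56, 0.62, 1.61, 0.5),
                 col6 = c(4, NA, 4, 4, 4),
                 col7 = c(NA, NA, 4, NA, NA))

# dput() writes a structure(...) call; capture it so it can be shown and reused
dput_text <- capture.output(dput(df))
cat(dput_text, sep = "\n")  # this is the text to paste into the question

# Anyone can evaluate that text to get the same data frame back
recreated <- eval(parse(text = dput_text))
```

Pasting the printed `structure(...)` call lets readers work with exactly the same values, including the `NA`s.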

2 Answers


In base R you could do something like:

    df[rowSums(df[2:4] == 4, na.rm = TRUE) > 0 & rowSums(df[6:7] == 4, na.rm = TRUE) > 0, ]
    #>   col1 col2 col3 col4 col5 col6 col7
    #> 1    A    4   NA   NA 1.55    4   NA
    #> 3    C    4    4   NA 0.62    4    4
    #> 5    E    4   NA   NA 0.50    4   NA
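To show why the one-liner works, here is a sketch that splits it into named steps (the intermediate variable names are hypothetical), assuming `df` is the `col1` to `col7` data frame built in the other answer. `df[2:4] == 4` gives a logical matrix in which comparisons against `NA` are themselves `NA`; `rowSums(..., na.rm = TRUE)` counts the `TRUE` cells per row while skipping those `NA`s, so `> 0` means "at least one 4 in this group of columns":

```r
# Example data (same col1..col7 layout as in the other answer)
df <- data.frame(col1 = c("A", "B", "C", "D", "E"),
                 col2 = c(4, NA, 4, NA, 4),
                 col3 = c(NA, NA, 4, NA, NA),
                 col4 = c(NA, 4, NA, NA, NA),
                 col5 = c(1.55, 0.56, 0.62, 1.61, 0.5),
                 col6 = c(4, NA, 4, 4, 4),
                 col7 = c(NA, NA, 4, NA, NA))

# Logical matrix: TRUE where a cell in columns 2-4 equals 4, NA where the cell is NA
is_four_2_4 <- df[2:4] == 4

# Number of 4s per row in each column group; na.rm = TRUE ignores the NA comparisons
hits_2_4 <- rowSums(is_four_2_4, na.rm = TRUE)
hits_6_7 <- rowSums(df[6:7] == 4, na.rm = TRUE)

# A row qualifies only when both groups contain at least one 4
keep <- hits_2_4 > 0 & hits_6_7 > 0
new.df <- df[keep, ]
print(new.df)
```

Using `na.rm = TRUE` by name rather than a positional `T` also avoids surprises if `T` has been reassigned in the session.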
Onyambu

I am not certain what is wrong with a plain `filter()` call like the one below, unless it is too verbose for you and you want a way to avoid naming the columns.

library(dplyr)

df = data.frame(col1 = c("A", "B", "C", "D", "E"), 
                col2 = c(4, NA, 4, NA, 4), 
                col3 = c(NA, NA, 4, NA, NA), 
                col4 = c(NA, 4, NA, NA, NA), 
                col5 = c(1.55, 0.56, 0.62, 1.61, 0.5), 
                col6 = c(4, NA, 4, 4, 4), 
                col7 = c(NA, NA, 4, NA, NA))

df %>% filter((col2 == 4 | col3 == 4 | col4 == 4) & (col6 == 4 | col7 == 4))

Which produces:

  col1 col2 col3 col4 col5 col6 col7
1    A    4   NA   NA 1.55    4   NA
2    C    4    4   NA 0.62    4    4
3    E    4   NA   NA 0.50    4   NA
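If the goal is to avoid typing every column name, a sketch along these lines selects the columns by position instead. It assumes dplyr 1.0.4 or later for `if_any()`. The chained `filter_at()` version is closer to the command in the question: `filter_at()` accepts only one `vars()` selection per call, so the two column groups need two calls, although `filter_at()` is superseded in current dplyr. Separate conditions passed to `filter()` are combined with AND, and rows where a condition is `NA` are dropped:

```r
library(dplyr)

df <- data.frame(col1 = c("A", "B", "C", "D", "E"),
                 col2 = c(4, NA, 4, NA, 4),
                 col3 = c(NA, NA, 4, NA, NA),
                 col4 = c(NA, 4, NA, NA, NA),
                 col5 = c(1.55, 0.56, 0.62, 1.61, 0.5),
                 col6 = c(4, NA, 4, 4, 4),
                 col7 = c(NA, NA, 4, NA, NA))

# dplyr >= 1.0.4: each if_any() is TRUE when at least one listed column equals 4
new.df <- df %>%
  filter(if_any(2:4, ~ .x == 4),
         if_any(6:7, ~ .x == 4))

# Superseded scoped-verb equivalent: one filter_at() call per column group
new.df_at <- df %>%
  filter_at(vars(2:4), any_vars(. == 4)) %>%
  filter_at(vars(6:7), any_vars(. == 4))

print(new.df)
```

Both forms keep rows A, C and E for this data, the same rows as the explicit `filter()` call.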
Adam Warner