Update a variable if dplyr filter conditions are met

Question

The command df %>% filter(is.na(df)[,2:4]) returns a new df containing only the rows with NAs in columns 2, 3 and 4. What I want is not a new subsetted df but rather to assign a value, for example 1, to a new variable called "Exclude" in the actual df.

This example with mutate was close, but not exactly what I was looking for: Use dplyr's filter and mutate to generate a new variable

I would also need the same to work with other filter conditions.

Example: I have the following:

df <- data.frame(A = 1:6, B = 11:16, C = 21:26, D = 31:36)
df[3,2:4] <- NA
df[5,2:4] <- NA


> df
  A  B  C  D
1 1 11 21 31
2 2 12 22 32
3 3 NA NA NA
4 4 14 24 34
5 5 NA NA NA
6 6 16 26 36

and would like

> df
  A  B  C  D Exclude
1 1 11 21 31      NA
2 2 12 22 32      NA
3 3 NA NA NA       1
4 4 14 24 34      NA
5 5 NA NA NA       1
6 6 16 26 36      NA

Any good ideas on how the filter condition could be used to update the df easily? The workaround would be to generate this subset, create the new variable for it, and then join it back, but that is not tidy code.

score 0 · Answer 1 · answered Dec 10 '20 at 15:43

Does this work:

library(dplyr)
df %>% rowwise() %>% 
     mutate(Exclude = +any(is.na(c_across(everything()))), Exclude = na_if(Exclude, 0))

# A tibble: 6 x 5
# Rowwise: 
      A     B     C     D Exclude
  <int> <int> <int> <int>   <int>
1     1    11    21    31      NA
2     2    12    22    32      NA
3     3    NA    NA    NA       1
4     4    14    24    34      NA
5     5    NA    NA    NA       1
6     6    16    26    36      NA

answered Dec 10 '20 at 15:43

Karthik S


Thanks, but as I replied to the first answer, it works on the sample but not on the real data, where I get the following error: Error: Problem with `mutate()` input `Exclude`. x Can't combine `Startdato` and `3: Medinsight` . i Input `Exclude` is `+any(is.na(c_across(everything())))`. i The error occurred in row 1. – EirikS Dec 10 '20 at 16:13
whereas this runs fine: df %>% filter(is.na(df)[,11:30]) – EirikS Dec 10 '20 at 16:13
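
The "Can't combine" error comes from c_across() trying to put columns of different types into one vector. One possible way around it, as a sketch only: if_any() tests each selected column separately, so nothing has to be combined. This assumes dplyr 1.0.4 or later, and B:D stands in for whatever columns you need to check. The second call shows the same pattern with a different, made-up condition.

```r
library(dplyr)

df <- data.frame(A = 1:6, B = 11:16, C = 21:26, D = 31:36)
df[c(3, 5), 2:4] <- NA

# if_any() applies is.na() to each selected column on its own and returns
# TRUE for rows where at least one of them is NA (assumes dplyr >= 1.0.4)
flagged <- df %>%
  mutate(Exclude = ifelse(if_any(B:D, is.na), 1, NA))

# Same pattern with another (illustrative) condition: flag rows where B > 14.
# The !is.na(B) guard keeps rows with a missing B from being flagged.
flagged_b <- df %>%
  mutate(Exclude = ifelse(!is.na(B) & B > 14, 1, NA))
```

Because nothing is subset away, the result keeps all rows of the original df, and the condition inside ifelse() can be anything you would otherwise pass to filter().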

jay.sf · Answer 2 · 2020-12-10T15:52:35.353

Using anyNA.

df %>% mutate(Exclude=ifelse(apply(df[2:4], 1, anyNA), 1, NA))  
#   A  B  C  D Exclude
# 1 1 11 21 31      NA
# 2 2 12 22 32      NA
# 3 3 NA NA NA       1
# 4 4 14 24 34      NA
# 5 5 NA NA NA       1
# 6 6 16 26 36      NA

Or just

df$Exclude <- ifelse(apply(df[2:4], 1, anyNA), 1, NA)

edited Dec 10 '20 at 15:52

answered Dec 10 '20 at 15:46

jay.sf


With the sample data this works fine for me too. But when I apply it to the working dataset with 350 records and 63 variables, it does not work: Error: Problem with `mutate()` input `Exclude`. x Input `Exclude` can't be recycled to size 350. i Input `Exclude` is `ifelse(apply(df[11:30], 1, anyNA), 1, NA)`. i Input `Exclude` must be size 350 or 1, not 20. – EirikS Dec 10 '20 at 16:10
Or, with the other option, I get this: Error in set(x, j = name, value = value) : Supplied 20 items to be assigned to 350 items of column 'Exclude'. If you wish to 'recycle' the RHS please use rep() to make this intent clear to readers of your code. – EirikS Dec 10 '20 at 16:11
@EirikS read https://stackoverflow.com/questions/57290514/error-when-trying-to-store-list-in-data-table-of-length-1 – jay.sf Dec 10 '20 at 16:57

score 0 · Accepted Answer · answered Dec 10 '20 at 15:57

We can do this in base R using the vectorized rowSums:

df$Exclude <- NA^!rowSums(is.na(df[-1]))

Output:

df
#  A  B  C  D Exclude
#1 1 11 21 31      NA
#2 2 12 22 32      NA
#3 3 NA NA NA       1
#4 4 14 24 34      NA
#5 5 NA NA NA       1
#6 6 16 26 36      NA
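
For readers unfamiliar with the NA^ idiom: in R, NA^0 is 1 and NA^1 is NA, and !rowSums(...) is TRUE (treated as 1) only for rows without any NA. A more explicit sketch of the same idea, where the names n_missing, terse and explicit are just illustrative:

```r
df <- data.frame(A = 1:6, B = 11:16, C = 21:26, D = 31:36)
df[c(3, 5), 2:4] <- NA

# Number of NAs per row, ignoring the first column
n_missing <- rowSums(is.na(df[-1]))

# Terse form from the answer: the exponent is 0 for rows that contain NAs
# (NA^0 is 1) and 1 for complete rows (NA^1 is NA)
terse <- NA^!n_missing

# Spelled-out form of the same rule
explicit <- ifelse(n_missing > 0, 1, NA)

# Compare the two results, ignoring the row names carried over by rowSums()
all.equal(unname(terse), unname(explicit))
```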

answered Dec 10 '20 at 15:57

akrun


Thanks. As I wrote in reply to the other answers, this works fine on the sample data but not on the actual data, where this command gives the error: Error in set(x, j = name, value = value) : Supplied 349 items to be assigned to 350 items of column 'Exclude'. If you wish to 'recycle' the RHS please use rep() to make this intent clear to readers of your code. – EirikS Dec 10 '20 at 16:14
@EirikS Do you have a `data.table` instead of `data.frame` ? – akrun Dec 10 '20 at 16:17
Thanks, yes, that solved the problem. The object was both a data.table and a data.frame, and using as.data.frame corrected it. – EirikS Dec 10 '20 at 18:59
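
Some background on the size errors reported in these comments: a data.table also has class data.frame, but a single index such as df[-1] or df[11:30] selects rows in a data.table and columns in a plain data.frame. That would explain getting 349 or 20 values for 350 rows. A minimal sketch of the check and the fix on the sample data; the data.table part is only there for illustration and runs only if that package is installed:

```r
df <- data.frame(A = 1:6, B = 11:16, C = 21:26, D = 31:36)
df[c(3, 5), 2:4] <- NA

# See what kind of object you actually have; a data.table reports
# both "data.table" and "data.frame"
class(df)

if (requireNamespace("data.table", quietly = TRUE)) {
  dt <- data.table::as.data.table(df)
  nrow(dt[-1])  # one row fewer: a single index picks rows in a data.table
  ncol(df[-1])  # one column fewer: a single index picks columns in a data.frame
}

# Coerce to a plain data.frame so that df[-1] drops the first column
df <- as.data.frame(df)
df$Exclude <- NA^!rowSums(is.na(df[-1]))
```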

score 0 · Answer 4 · answered Dec 10 '20 at 16:01

Another one-line solution (note that it yields 0 rather than NA for rows without missing values):

df$Exclude <- as.numeric(apply(df[2:4], 1, function(x) any(is.na(x))))

answered Dec 10 '20 at 16:01

SteveM


score 0 · Answer 5 · answered Dec 10 '20 at 16:01

Use rowwise, sum over all numeric columns, assign 1 or NA in ifelse.

df <- data.frame(A = 1:6, B = 11:16, C = 21:26, D = 31:36)
df[3, 2:4] <- NA
df[5, 2:4] <- NA

library(tidyverse)

df %>%
  rowwise() %>% 
  mutate(Exclude = ifelse(
    is.na(sum(c_across(where(is.numeric)))), 1, NA
  ))
#> # A tibble: 6 x 5
#> # Rowwise: 
#>       A     B     C     D Exclude
#>   <int> <int> <int> <int>   <dbl>
#> 1     1    11    21    31      NA
#> 2     2    12    22    32      NA
#> 3     3    NA    NA    NA       1
#> 4     4    14    24    34      NA
#> 5     5    NA    NA    NA       1
#> 6     6    16    26    36      NA
