Multiple conditions while grouping data with NAs and empty cells

Question

I have a data frame as follows:

library(dplyr)

ID=c(1,1,2,3,3,1,2,4,2,1,2,1,4,3,1,2,3)
text=c("a","","R","NA","","iy","","NA","ot","ir","","NA","","","","","NA")

df <- data.frame(ID,text)

df %>% arrange(ID)

   ID text
1   1    a
2   1     
3   1   iy
4   1   ir
5   1   NA
6   1     
7   2    R
8   2     
9   2   ot
10  2     
11  2     
12  3   NA
13  3     
14  3     
15  3   NA
16  4   NA
17  4

For each ID I have text collected over several rows. Some rows of an ID can hold NA values and/or empty values. I would like to create a binary column indicating whether any of the rows collected for an ID contains text. I am running this code:

df %>% group_by(ID) %>% 
    summarise(text_availabe=if(any(!is.na(text))) 1 else 0)

which produces the following. For ID 3 and ID 4, it treats the empty cells as if they contained text:

     ID text_availabe
  <dbl>         <dbl>
1     1             1
2     2             1
3     3             1
4     4             1

My ideal output in this case would be:

     ID text_availabe
  <dbl>         <dbl>
1     1             1
2     2             1
3     3             0
4     4             0

Thank you very much for your help in advance!

@Ronak Shah, wondering if you can help. Thanks! – Alex May 20 '21 at 17:44

LMc · Accepted Answer · 2021-05-20T19:36:03.523

1

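I think the problem you are having is that the string "NA" is not the same as the missing value NA, and the empty string "" is not NA either. Treating both strings as "no text" gives the result you want: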

library(dplyr)
df %>% 
  group_by(ID) %>% 
  summarize(text_available = any(!text %in% c("", "NA")), .groups = "drop")

!is.na("NA")
[1] TRUE

"NA" is a character string with the letters "N" and "A", not a missing value, so !is.na("NA") returns TRUE.
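As a minimal base R sketch of the point above, the vector `x` below (a made-up example, not your data) holds real text, an empty string, the literal string "NA", and a true missing value. One caveat, stated as an assumption about your real data: if it also contains true NA values, `!text %in% c("", "NA")` would count them as text, because `NA %in% c("", "NA")` is FALSE. Combining the check with `is.na()` covers that case.

```r
# Hypothetical example vector: text, empty string, the string "NA", a true NA
x <- c("a", "", "NA", NA)

# Only the true missing value is detected by is.na()
is.na(x)
#> [1] FALSE FALSE FALSE  TRUE

# Assumption: "", "NA" and true NA should all count as "no text"
no_text <- is.na(x) | x %in% c("", "NA")
no_text
#> [1] FALSE  TRUE  TRUE  TRUE

# Optionally convert them to real NA so that is.na() works afterwards
x_clean <- x
x_clean[no_text] <- NA
is.na(x_clean)
#> [1] FALSE  TRUE  TRUE  TRUE
```

If the placeholder strings come from reading a file, it may be simpler to convert them once up front and keep using `is.na()` in the rest of the code.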

Output

     ID text_available
  <dbl> <lgl>         
1     1 TRUE          
2     2 TRUE          
3     3 FALSE         
4     4 FALSE

Logical columns are representations of 1 and 0:

TRUE == 1
[1] TRUE

But if you need it to be in 1/0 form, then just wrap it with as.integer: as.integer(any(...)).
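Putting the pieces of this answer together, here is a self-contained sketch that rebuilds the data frame from the question and returns the 1/0 column directly. The extra `!is.na(text)` guard is an addition of mine, on the assumption that real data might also contain true NA values; it does not change the result for the sample data.

```r
library(dplyr)

# Sample data copied from the question
ID   <- c(1,1,2,3,3,1,2,4,2,1,2,1,4,3,1,2,3)
text <- c("a","","R","NA","","iy","","NA","ot","ir","","NA","","","","","NA")
df   <- data.frame(ID, text)

result <- df %>%
  group_by(ID) %>%
  summarise(
    # TRUE if at least one row has real text; as.integer() turns it into 1/0
    text_available = as.integer(any(!is.na(text) & !text %in% c("", "NA"))),
    .groups = "drop"
  )

result$text_available
#> [1] 1 1 0 0
```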

edited May 20 '21 at 19:36

answered May 20 '21 at 19:30


Would it be OK if I use `if` instead of `as.integer`, like `df %>% group_by(ID) %>% summarise(text_availabe=if(any(!text %in% c("", "NA"))) 1 else 0)`? – Alex May 20 '21 at 22:58
Use the vectorized form of `if`, which is `ifelse`. You can do: `ifelse(any(!text %in% c("", "NA")), 1, 0)`. `if` and `else` are reserved for conditions of length 1. – LMc May 20 '21 at 23:00
Thanks! Can you elaborate on the vectorized form of `if`? The code you are proposing gives me the same output as using `if` and `else`. – Alex May 20 '21 at 23:07
Actually, in this context `if` and `else` work. Because `any` produces a single logical value, both forms return the same thing. Generally, in pipe chains and in `mutate`, you are evaluating a condition across many rows, which results in a logical *vector*, and that is when you would use `ifelse`. `if` and `else` are for a single logical value. – LMc May 20 '21 at 23:09
Consider reading this [SO question](https://stackoverflow.com/questions/17252905/else-if-vs-ifelse) for a more in-depth answer. – LMc May 20 '21 at 23:10
thanks! I read through it and I think I noticed what you meant. Would you mind rating the question if you think it has value. Appreciated! – Alex May 21 '21 at 03:37
thanks! the piece of code worked well on the real data which had ~ 780 rows – Alex May 21 '21 at 16:29
my apologies, I thought in my mind that I did so yesterday. – Alex May 21 '21 at 17:21
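
The `if`/`else` versus `ifelse` distinction discussed in the comments above can be sketched in base R as follows. The vector `flags` is a made-up example standing in for a per-row condition; none of the names come from the original posts.

```r
# Hypothetical per-row condition, e.g. "does this row contain text?"
flags <- c(TRUE, FALSE, TRUE)

# ifelse() is vectorised: it returns one result per element
per_row <- ifelse(flags, 1, 0)
per_row
#> [1] 1 0 1

# any() collapses the vector to a single TRUE/FALSE,
# so a plain if/else is fine for a per-group summary
per_group <- if (any(flags)) 1 else 0
per_group
#> [1] 1

# as.integer() on the single logical gives the same 1/0 value as an integer
as.integer(any(flags))
#> [1] 1

# Note: `if (flags) 1 else 0` with a length-3 condition is an error in
# R >= 4.2.0 (older versions warn and use only the first element)
```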
