I have a data frame as follows:

library(dplyr)

ID=c(1,1,2,3,3,1,2,4,2,1,2,1,4,3,1,2,3)
text=c("a","","R","NA","","iy","","NA","ot","ir","","NA","","","","","NA")

df <- data.frame(ID,text)
df %>% arrange(ID)

   ID text
1   1    a
2   1     
3   1   iy
4   1   ir
5   1   NA
6   1     
7   2    R
8   2     
9   2   ot
10  2     
11  2     
12  3   NA
13  3     
14  3     
15  3   NA
16  4   NA
17  4     

For each ID I have collected character text. The text can be NA and/or empty for a given ID. I would like to create a binary column that indicates whether any of the rows collected for an ID contain text. I am running this code:

df %>% group_by(ID) %>% 
    summarise(text_availabe=if(any(!is.na(text))) 1 else 0)

which produces the following. For ID 3 and ID 4, it treats the empty cells as if they contained text.

     ID text_availabe
  <dbl>         <dbl>
1     1             1
2     2             1
3     3             1
4     4             1

My ideal output in this case would be:

     ID text_availabe
  <dbl>         <dbl>
1     1             1
2     2             1
3     3             0
4     4             0

Thank you very much for your help in advance!

Phil
Alex

1 Answer

I think the problem you are having is that "NA" is not the same as NA:

!is.na("NA")
[1] TRUE

This is a character string with the letters "N" and "A", so it returns TRUE.

library(dplyr)
df %>% 
  group_by(ID) %>% 
  summarize(text_available = any(!text %in% c("", "NA")), .groups = "drop")

Output

     ID text_available
  <dbl> <lgl>         
1     1 TRUE          
2     2 TRUE          
3     3 FALSE         
4     4 FALSE         

Logical columns are representations of 1 and 0:

TRUE == 1
[1] TRUE

But if you need it to be in 1/0 form, then just wrap it with as.integer: as.integer(any(...)).
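
As an alternative sketch, assuming you would rather clean the column once than list the placeholder values in every summary: convert the empty strings and the literal "NA" strings into real missing values with dplyr's `na_if`, after which the original `!is.na(text)` test behaves as intended. The names `df_clean` and `result` are introduced here for illustration only.

```r
library(dplyr)

# Same example data as in the question
ID <- c(1,1,2,3,3,1,2,4,2,1,2,1,4,3,1,2,3)
text <- c("a","","R","NA","","iy","","NA","ot","ir","","NA","","","","","NA")
df <- data.frame(ID, text, stringsAsFactors = FALSE)

# Turn "" and the literal string "NA" into real missing values
df_clean <- df %>%
  mutate(text = na_if(na_if(text, ""), "NA"))

# With real NAs in place, is.na() detects the missing text correctly
result <- df_clean %>%
  group_by(ID) %>%
  summarise(text_available = as.integer(any(!is.na(text))), .groups = "drop")

result
```

Whether this is preferable depends on the data: it changes the column itself, so it only makes sense if "" and "NA" should count as missing everywhere downstream.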
LMc
  • would it be OK if I use `if` instead of `as.integer` like `df %>% group_by(ID) %>% summarise(text_availabe=if(any(!text %in% c("", "NA"))) 1 else 0)` – Alex May 20 '21 at 22:58
  • Use the vectorized form of `if`, which is `ifelse`. You can do: `ifelse(any(!text %in% c("", "NA")), 1, 0)`. `If` and `else` is reserved for conditions of length 1. – LMc May 20 '21 at 23:00
  • Thanks! well, can you elaborate on vectorized form of `if`. The above code gives me the same output as using `if` and `else` that you are proposing. – Alex May 20 '21 at 23:07
  • Actually, in this context `if` and `else` works. Because you are creating one logical value using `any`, both forms return the same thing. Generally, in pipe chains and in `mutate` you are evaluating a condition across many rows, resulting in a logical *vector* rather than a single logical value, which is when you would use `ifelse`. – LMc May 20 '21 at 23:09
  • Consider reading this [SO question](https://stackoverflow.com/questions/17252905/else-if-vs-ifelse) for a more in-depth answer. – LMc May 20 '21 at 23:10
  • thanks! I read through it and I think I noticed what you meant. Would you mind rating the question if you think it has value. Appreciated! – Alex May 21 '21 at 03:37
  • thanks! the piece of code worked well on the real data which had ~ 780 rows – Alex May 21 '21 at 16:29
  • my apologies, I thought in my mind that I did so yesterday. – Alex May 21 '21 at 17:21
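
The difference between `if`/`else` and `ifelse` discussed in the comments above can be sketched as follows. This is a minimal illustration on a made-up vector `x`, not on the question's data; `per_element` and `single_value` are hypothetical names.

```r
x <- c("a", "", "NA")

# ifelse() is vectorized: it returns one result per element of the condition
per_element <- ifelse(!x %in% c("", "NA"), 1, 0)
per_element

# if/else expects a single TRUE or FALSE, so collapse the vector with any() first
single_value <- if (any(!x %in% c("", "NA"))) 1 else 0
single_value
```

Inside `summarise`, `any()` already reduces each group to one logical value, which is why both forms give the same result there.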