How to describe unique values of grouped observations for several vars?

Question

I have a tibble in which each patient can be observed several times. The columns are: id_patient (num), id_eval (num), treat_1 (logical), treat_2 (logical), treat_1_type (char), treat_2_type (char).

What I want: a summary table (with tbl_summary) describing unique values, so I can see how many patients were concerned by each possibility at least once. Something like this:

var	All patients (n=N)
treat_1	AA (aa %)
treat_2	BB (bb %)
treat_1_type
- Type_1	CC (cc %)
- Type_2	DD (dd %)
treat_2_type
- Type_1	EE (ee %)
- Type_2	FF (ff %)
- Type_3	GG (gg %)

What I have for now is:

evals %>%
    group_by(id_patient) %>%
    select(id_patient, treat_1, treat_2) %>%
    summarise(across(everything(), .fns = unique)) %>%
    summary()

But that returns every TRUE/FALSE combination that exists for a patient, so it does not really give one value per patient. And this is only the logical part, the easy one; it will not work with factors.

How can I achieve that?

It would be easier to help if you create a small reproducible example along with expected output. Read about [how to give a reproducible example](http://stackoverflow.com/questions/5963269). — Ronak Shah, Sep 05 '21 at 13:35

Marek Fiołka · Accepted Answer · 2021-09-05T12:47:34.383

I wish you had given us a bit of data. But let's produce some ourselves.

library(tidyverse)

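n = 10  # number of evaluations to simulate
evals = tibble(
  id_patient = sample(1:50, n, replace = TRUE),  # a patient may appear more than once
  id_eval = sample(120:277, n),                  # evaluation ids are unique
  treat_1 = sample(c(TRUE, FALSE), n, replace = TRUE),
  treat_2 = sample(c(TRUE, FALSE), n, replace = TRUE),
  # character columns here; wrap in factor() to get <fct> columns as in the printed output
  treat_1_type = sample(c("Type_1", "Type_2"), n, replace = TRUE),
  treat_2_type = sample(c("Type_1", "Type_2", "Type_3"), n, replace = TRUE)
)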

evals

output

# A tibble: 10 x 6
   id_patient id_eval treat_1 treat_2 treat_1_type treat_2_type
        <int>   <int> <lgl>   <lgl>   <fct>        <fct>       
 1         42     237 TRUE    FALSE   Type_2       Type_3      
 2         24     240 FALSE   FALSE   Type_1       Type_1      
 3         10     236 TRUE    FALSE   Type_1       Type_3      
 4         27     153 TRUE    FALSE   Type_1       Type_2      
 5         29     126 TRUE    FALSE   Type_2       Type_1      
 6         18     194 FALSE   TRUE    Type_1       Type_2      
 7         18     215 TRUE    FALSE   Type_2       Type_2      
 8         48     205 TRUE    FALSE   Type_1       Type_3      
 9         12     131 FALSE   FALSE   Type_1       Type_2      
10         13     225 FALSE   FALSE   Type_2       Type_3

Is it okay? I hope so. Now let's build the summary you want.

seval = evals %>%
  group_by(id_patient) %>%
  # Step 1: one row per patient, TRUE if the condition held in at least one evaluation
  summarise(
    treat_1 = any(treat_1),
    treat_2 = any(treat_2),
    treat_1_Type_1 = any(treat_1_type=="Type_1"),
    treat_1_Type_2 = any(treat_1_type=="Type_2"),
    treat_2_Type_1 = any(treat_2_type=="Type_1"),
    treat_2_Type_2 = any(treat_2_type=="Type_2"),
    treat_2_Type_3 = any(treat_2_type=="Type_3")
  ) %>%
  # Step 2: count the patients for whom each flag is TRUE
  summarise(
    treat_1 = sum(treat_1),
    treat_2 = sum(treat_2),
    treat_1_Type_1 = sum(treat_1_Type_1),
    treat_1_Type_2 = sum(treat_1_Type_2),
    treat_2_Type_1 = sum(treat_2_Type_1),
    treat_2_Type_2 = sum(treat_2_Type_2),
    treat_2_Type_3 = sum(treat_2_Type_3)
  )

output

# A tibble: 1 x 7
  treat_1 treat_2 treat_1_Type_1 treat_1_Type_2 treat_2_Type_1 treat_2_Type_2 treat_2_Type_3
    <int>   <int>          <int>          <int>          <int>          <int>          <int>
1       6       1              6              4              2              4              4
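
The two-step logic used here (flag each patient, then count the flags) can also be written without typing every level by hand. The following is only a sketch and not part of the tested answer: the small `evals` tibble is hypothetical hand-made data, the names `lgl_counts`, `type_counts` and `result` are made up for illustration, and the type columns are assumed to be character.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Hypothetical data: 4 patients, 6 evaluations (patients 1 and 3 are seen twice)
evals <- tibble(
  id_patient   = c(1, 1, 2, 3, 3, 4),
  id_eval      = 101:106,
  treat_1      = c(TRUE, FALSE, FALSE, TRUE, TRUE, FALSE),
  treat_2      = c(FALSE, FALSE, TRUE, FALSE, FALSE, FALSE),
  treat_1_type = c("Type_1", "Type_2", "Type_1", "Type_2", "Type_2", "Type_1"),
  treat_2_type = c("Type_3", "Type_1", "Type_2", "Type_2", "Type_2", "Type_3")
)

n_patients <- n_distinct(evals$id_patient)

# Logical columns: a patient counts once if the flag was TRUE in any evaluation
lgl_counts <- evals %>%
  group_by(id_patient) %>%
  summarise(across(c(treat_1, treat_2), any)) %>%
  summarise(across(c(treat_1, treat_2), sum)) %>%
  pivot_longer(everything(), names_to = "var", values_to = "n")

# Type columns: keep one row per patient/variable/level, then count patients per level
type_counts <- evals %>%
  pivot_longer(c(treat_1_type, treat_2_type), names_to = "var", values_to = "level") %>%
  distinct(id_patient, var, level) %>%
  count(var, level, name = "n") %>%
  mutate(var = paste(var, level, sep = ": ")) %>%
  select(var, n)

# Stack both parts and express the counts as a percentage of all patients
result <- bind_rows(lgl_counts, type_counts) %>%
  mutate(pct = 100 * n / n_patients)

result
```

Because `distinct()` removes repeated patient/level pairs before counting, a patient who has the same type in several evaluations is still counted once for that level.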

Now you can easily calculate the proportions by dividing each count by the number of distinct patients:

seval %>% 
  pivot_longer(everything(), names_to = "var", values_to = "val") %>% 
  group_by(var) %>% 
  mutate(prop = val/length(unique(evals$id_patient)))

output

# A tibble: 7 x 3
# Groups:   var [7]
  var              val  prop
  <chr>          <int> <dbl>
1 treat_1            6 0.667
2 treat_2            1 0.111
3 treat_1_Type_1     6 0.667
4 treat_1_Type_2     4 0.444
5 treat_2_Type_1     2 0.222
6 treat_2_Type_2     4 0.444
7 treat_2_Type_3     4 0.444
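
Since the question asks for tbl_summary, one possible route is to stop after the per-patient step and hand that one-row-per-patient table of logical flags to gtsummary, which summarises logical columns as the count and percentage of TRUE. This is a sketch under assumptions, not something tested in the answer: the `evals` data is hypothetical, the names `treat_flags`, `type_flags` and `patients` are made up for illustration, and the gtsummary call is only run if the package is installed.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Hypothetical data: 4 patients, 6 evaluations
evals <- tibble(
  id_patient   = c(1, 1, 2, 3, 3, 4),
  id_eval      = 101:106,
  treat_1      = c(TRUE, FALSE, FALSE, TRUE, TRUE, FALSE),
  treat_2      = c(FALSE, FALSE, TRUE, FALSE, FALSE, FALSE),
  treat_1_type = c("Type_1", "Type_2", "Type_1", "Type_2", "Type_2", "Type_1"),
  treat_2_type = c("Type_3", "Type_1", "Type_2", "Type_2", "Type_2", "Type_3")
)

# One row per patient: was each treatment ever given?
treat_flags <- evals %>%
  group_by(id_patient) %>%
  summarise(across(c(treat_1, treat_2), any))

# One row per patient: one TRUE/FALSE column per variable/level combination
type_flags <- evals %>%
  pivot_longer(ends_with("_type"), names_to = "var", values_to = "level") %>%
  distinct(id_patient, var, level) %>%
  mutate(ever = TRUE) %>%
  pivot_wider(names_from = c(var, level), values_from = ever, values_fill = FALSE)

patients <- left_join(treat_flags, type_flags, by = "id_patient")

# Optional: build the summary table if gtsummary is available
if (requireNamespace("gtsummary", quietly = TRUE)) {
  tbl <- patients %>%
    select(-id_patient) %>%
    gtsummary::tbl_summary()
}
```

The percentages within one type variable can add up to more than 100 %, because a patient who had several types over time is counted under each of them.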

I tested everything for both chr and factor variables and everything works fine.
