
I have a data frame that contains a number of columns, all coded as factor variables. Each column corresponds to a question with only two choices, 1 = yes and 2 = no, or a missing value. Each row is a participant.

Here is a simplified version:

Q_1  Q_2  Q_3
  1    1    1
  2    1    1
  1    2    NA
  2    1    2
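
For a reproducible starting point, the table above could be rebuilt roughly as follows. This is only a sketch: the name `df` is chosen to match the answer's code, and the factor levels 1 and 2 are assumed from the description.

```r
# Rebuild the sample data; every column is a factor with levels 1 (yes) and 2 (no).
# The name `df` and the explicit `levels` are assumptions for illustration.
df <- data.frame(
  Q_1 = factor(c(1, 2, 1, 2), levels = c(1, 2)),
  Q_2 = factor(c(1, 1, 2, 1), levels = c(1, 2)),
  Q_3 = factor(c(1, 1, NA, 2), levels = c(1, 2))
)

# Inspect the structure: 4 participants, 3 factor columns
str(df)
```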

Ideally, I would like to create an overview data frame with one row per question and the counts of how often each factor level occurred. That would also allow me to use dplyr's `mutate` function to calculate percentages etc.

I would like a frame with the count data:

      Yes  No  NA
  Q_1   2   2   0
  Q_2   3   1   0
  Q_3   2   1   1

I initially thought of simply using `group_by` and the `count` function. However, there is no real grouping variable, because the factor levels (which happen to be the same for all columns) would have to serve as my grouping variable.

Rasul89
  • Please provide a [reproducible minimal example](https://stackoverflow.com/q/5963269/8107362). In particular, provide some sample data, e.g. with `dput()`, and use the [reprex package](https://reprex.tidyverse.org/). – mnist Sep 15 '21 at 15:28

1 Answer

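library(tidyverse)

df %>%
  # Stack all question columns into name/value pairs
  pivot_longer(cols = everything()) %>%
  # Recode 1 as "Yes" and 2 as "No"; missing values stay NA
  mutate(value = if_else(value == 1, "Yes", "No")) %>%
  # Count each answer per question
  count(name, value) %>%
  # Spread the answers into one column each
  pivot_wider(names_from = value, values_from = n) %>%
  # Combinations that never occur become 0 instead of NA
  replace(is.na(.), 0)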

# A tibble: 3 x 4
  name     No   Yes  `NA`
  <chr> <int> <int> <int>
1 Q_1       2     2     0
2 Q_2       1     3     0
3 Q_3       1     2     1
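
The question also mentions calculating percentages with `mutate`. A possible extension is sketched below. It is not part of the original answer: the sample `df` is rebuilt from the question's table, missing values are labelled `"Missing"` rather than `NA` so the column is easy to reference, and `pct_yes` is a made-up column name that uses only non-missing answers as its denominator.

```r
library(dplyr)
library(tidyr)

# Sample data assumed from the question's table
df <- data.frame(
  Q_1 = factor(c(1, 2, 1, 2), levels = c(1, 2)),
  Q_2 = factor(c(1, 1, 2, 1), levels = c(1, 2)),
  Q_3 = factor(c(1, 1, NA, 2), levels = c(1, 2))
)

counts <- df %>%
  pivot_longer(cols = everything()) %>%
  # Label all three outcomes explicitly; NA falls through to "Missing"
  mutate(value = case_when(
    value == 1 ~ "Yes",
    value == 2 ~ "No",
    TRUE ~ "Missing"
  )) %>%
  count(name, value) %>%
  # values_fill puts 0 where a question/answer combination never occurs
  pivot_wider(names_from = value, values_from = n, values_fill = 0L) %>%
  # Put the columns in the order requested in the question
  select(name, Yes, No, Missing)

# Share of "Yes" among the non-missing answers (hypothetical column name)
result <- counts %>%
  mutate(pct_yes = 100 * Yes / (Yes + No))

print(result)
```

Whether missing answers belong in the denominator depends on the analysis; replacing `Yes + No` with `Yes + No + Missing` would give the share among all participants.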
Vinícius Félix