This is certainly a basic question, but I cannot figure it out by myself. Please consider the following:
I have a large dataset of patient characteristics in long format (several rows per patient) and want to summarise some variables. I would prefer to use dplyr for that.
For the example data set:
db <- data.frame(ID = c(rep(1, 3), rep(2, 4), rep(3, 2)),
                 Gender = factor(c(rep("woman", 7), rep("man", 2))),
                 Grade = c(rep(3, 3), rep(1, 4), rep(2, 2)))
db
# ID Gender Grade
# 1 1 woman 3
# 2 1 woman 3
# 3 1 woman 3
# 4 2 woman 1
# 5 2 woman 1
# 6 2 woman 1
# 7 2 woman 1
# 8 3 man 2
# 9 3 man 2
I would like to make a frequency table for Gender and Grade. Obviously, there are 2 female patients and 1 male patient, and each grade (1, 2 and 3) occurs in exactly one patient.
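As a quick check of those counts, here is a minimal base R sketch that reduces the data to one row per patient before tabulating. It assumes that Gender and Grade are constant within each ID (true for the example data); `patients` is just an illustrative name.

```r
# Rebuild the example data from the question
db <- data.frame(ID = c(rep(1, 3), rep(2, 4), rep(3, 2)),
                 Gender = factor(c(rep("woman", 7), rep("man", 2))),
                 Grade = c(rep(3, 3), rep(1, 4), rep(2, 2)))

# Keep one row per unique (ID, Gender, Grade) combination, i.e. one row per
# patient, assuming Gender and Grade never change within an ID
patients <- unique(db[, c("ID", "Gender", "Grade")])

table(patients$Gender)  # man: 1, woman: 2
table(patients$Grade)   # grades 1, 2, 3: one patient each
```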
I tried:
library(dplyr)
x <- db %>% group_by(ID, Gender, Grade)
x
# A tibble: 9 x 3
# Groups: ID, Gender, Grade [3]
# ID Gender Grade
# <dbl> <fct> <dbl>
# 1 1. woman 3.
# 2 1. woman 3.
# 3 1. woman 3.
# 4 2. woman 1.
# 5 2. woman 1.
# 6 2. woman 1.
# 7 2. woman 1.
# 8 3. man 2.
# 9 3. man 2.
But when I call, for instance, table(x$Gender), the outcome is:
table(x$Gender)
# man woman
# 2 7
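The counts above come from the fact that group_by() only attaches grouping information to the data; it does not merge or drop any rows, so table() still sees all 9 rows. A minimal sketch illustrating this with dplyr's nrow() and n_groups() on the same grouped tibble:

```r
library(dplyr)

db <- data.frame(ID = c(rep(1, 3), rep(2, 4), rep(3, 2)),
                 Gender = factor(c(rep("woman", 7), rep("man", 2))),
                 Grade = c(rep(3, 3), rep(1, 4), rep(2, 2)))

x <- db %>% group_by(ID, Gender, Grade)

# group_by() keeps every row: x still has 9 rows
nrow(x)      # 9
# It only records which rows belong together: 3 groups here
n_groups(x)  # 3

# table() ignores the grouping and counts rows, hence man = 2, woman = 7
table(x$Gender)
```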
What am I doing wrong?
Thanks a lot in advance!
Edit: The desired output is a frequency table of how many male and female patients there are in the dataset, as well as how many patients have grade 1, 2, 3, etc. Please see below.
With the following I can compute the proportion of females in db:
db %>%
  summarise(pct.female = mean(Gender == "woman", na.rm = TRUE))
# pct.female
# 1 0.7777778
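Note that 0.7777778 is 7/9, the share of rows rather than of patients. A minimal sketch that first keeps one row per patient with distinct() and then takes the same mean, assuming Gender does not change within an ID; `pct` is an illustrative name:

```r
library(dplyr)

db <- data.frame(ID = c(rep(1, 3), rep(2, 4), rep(3, 2)),
                 Gender = factor(c(rep("woman", 7), rep("man", 2))),
                 Grade = c(rep(3, 3), rep(1, 4), rep(2, 2)))

# One row per (ID, Gender) pair, then the proportion of women among patients
pct <- db %>%
  distinct(ID, Gender) %>%
  summarise(pct.female = mean(Gender == "woman", na.rm = TRUE))

pct  # 2 of 3 patients are women
```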
What I actually need is the number of males and females (n), something like this:
# man woman
# 1 2
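One way to get such per-patient counts in dplyr is to count distinct patient IDs within each level with n_distinct(). This is a sketch rather than the only approach; unlike collapsing to one row per ID, it would count a patient under every grade they appear with if Grade ever changed within an ID. `gender_n` and `grade_n` are illustrative names.

```r
library(dplyr)

db <- data.frame(ID = c(rep(1, 3), rep(2, 4), rep(3, 2)),
                 Gender = factor(c(rep("woman", 7), rep("man", 2))),
                 Grade = c(rep(3, 3), rep(1, 4), rep(2, 2)))

# Number of distinct patients per gender
gender_n <- db %>%
  group_by(Gender) %>%
  summarise(n = n_distinct(ID))

# Number of distinct patients per grade
grade_n <- db %>%
  group_by(Grade) %>%
  summarise(n = n_distinct(ID))

gender_n  # man: 1, woman: 2
grade_n   # grades 1, 2, 3: 1 each
```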