0

This certainly is a basic question but I cannot figure it out by myself. Please consider the following:

In a large dataset with patient characteristics in long format I want to summarise some variables. I would prefer to use dplyr for that.

For the example data set:

db <- data.frame(ID = c(rep(1, 3), rep(2,4), rep(3, 2)),
                  Gender = factor(c(rep("woman", 7), rep("man", 2))),
                  Grade = c(rep(3, 3), rep(1, 4), rep(2, 2)))
db
#    ID Gender Grade
#  1  1 woman     3
#  2  1 woman     3
#  3  1 woman     3
#  4  2 woman     1
#  5  2 woman     1
#  6  2 woman     1
#  7  2 woman     1
#  8  3   man     2
#  9  3   man     2

I would like to make a frequency table for Gender and Grade. Obviously, there are 2 female patients and 1 male. Each grade (1:3) occurs once.

I tried:

x <- db %>% group_by(ID, Gender, Grade)
  table(y$Gender)
x
# A tibble: 9 x 3
# Groups:   ID, Gender, Grade [3]
#     ID Gender Grade
#  <dbl> <fct>  <dbl>
# 1    1. woman     3.
# 2    1. woman     3.
# 3    1. woman     3.
# 4    2. woman     1.
# 5    2. woman     1.
# 6    2. woman     1.
# 7    2. woman     1.
# 8    3. man       2.
# 9    3. man       2.

but when I call for instance table(x$Gender), the outcome is:

table(y$Gender)

#    man woman 
#      2     7 

What am I doing wrong?

Thanks a lot in advance!

Edit: The desired output is to have a frequency table of how many male/female participants there are in the dataset, as well as how many patients have grade 1, 2, 3 etc. Please see below.

With the following I can call the percentage of females in db:

db %>%
summarise(pct.female = mean(Gender == "woman", na.rm = T))
#    pct.female
# 1  0.7777778

What I would rather need is the amount of males/females (n). Something like this:

# man    woman
#   1        2
Frederick
  • 810
  • 8
  • 28

2 Answers2

2
require(dplyr)
require(magrittr)
db %>% count(ID, Gender) %$% table(Gender)

Or, without dplyr

require(magrittr)
db %$% split(Gender, ID) %>% sapply(unique) %>% table
IceCreamToucan
  • 28,083
  • 2
  • 22
  • 38
2
require(dplyr)
db %>% group_by(Gender, Grade) %>% tally()

# A tibble: 3 x 3
# Groups:   Gender [?]
  Gender Grade     n
  <fct>  <dbl> <int>
1 man     2.00     2
2 woman   1.00     4
3 woman   3.00     3

# Was also suggested by @konvas in their comment.

will tell you all the unique combinations of Gender and Grade. And how many each of those exist. This what you want? Difficult to say from your question. Desired output would be good.


edit Alternatively, as per requested output:

db %>% distinct(ID, Gender) %>% count(Gender) 

# A tibble: 2 x 2
  Gender `n()`
  <fct>  <int>
1 man        1
2 woman      2
tjebo
  • 21,977
  • 7
  • 58
  • 94
  • I don't understand n in this case entirely. Due to the same ID, the unique combination of man and grade 2 should be 1 instead of 2. Same goes for woman and grade 1 and 3 (only 1 combination each). – Frederick Mar 15 '18 at 09:17
  • please give us your desired output when this answer doesn’t give what you want. There is only one row (!) per unique gender/grade combination, here only one row with “man/ grade2”. the last row shows how often this combination occurs in your data. – tjebo Mar 15 '18 at 09:21
  • Modifying Tjebo's suggestions gives me desired output: `x <- db %>% group_by(Gender, Grade) %>% tally()` `table(x$Gender)` `# man woman # 1 2` My confusion was that according to the ID there are only 2 female and 1 male. Although the n in the tibble gives the amount of combination in my dataset, this combination should be 1 for each row because there are only three observations in total (according to the IDs). – Frederick Mar 15 '18 at 09:32
  • added another option. – tjebo Mar 15 '18 at 11:59