
I have a somewhat simple question (I think), but I can't seem to find a solution. I would like to create a new data frame that groups by Test and sums multiple variables.

My data:

ID  Test  result  ped  adult
AB  a     0       0    1
AB  b     1       0    1
FM  a     1       1    0
FM  c     0       1    0
WD  a     0       0    1
WD  b     1       0    1
WD  c     0       0    1
WD  d     1       0    1
WD  a     0       0    1
WD  a     1       0    1

The output I would like:

Test  No. of IDs with test performed  No. of IDs positive  ped  adult
a     3                               2                    1    1
b     2                               2                    0    2
c     2                               0                    0    0
d     1                               1                    0    1

I have tried using `aggregate`, and dplyr's `group_by` with `sum`, but have not had success.

NB: edited to add the ped and adult columns. I would like to count the positive tests and then, separately, the positive tests among ped and among adult.

sar
  • `df %>% group_by(Test) %>% summarise(nos = n_distinct(ID), pos = sum(result))` – Ronak Shah Sep 24 '18 at 14:49
  • Thank you @Ronak Shah. That looks good. – sar Sep 24 '18 at 14:54
  • In the original dataset, I also have columns labeled "ped" (0/1) and "adult" (0/1). If I wanted to add two further columns, the number of ped positives and the number of adult positives, how could I do this? I.e., pos column = 5, ped = 1, adult = 4. I will edit the above post. – sar Sep 24 '18 at 15:01
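One way to extend Ronak Shah's `summarise` call to the follow-up: because `result`, `ped` and `adult` are coded 0/1, `result * ped` is 1 exactly on positive ped rows, so summing it counts them (likewise for adult). This is a sketch, assuming dplyr is installed; the column names `n_ids`, `pos`, `ped_pos` and `adult_pos` are illustrative, not from the question.

```r
library(dplyr)

# Sample data from the question (0/1 coding assumed for result, ped, adult).
df <- read.table(header = TRUE, text = "
ID  Test  result  ped  adult
AB  a     0       0    1
AB  b     1       0    1
FM  a     1       1    0
FM  c     0       1    0
WD  a     0       0    1
WD  b     1       0    1
WD  c     0       0    1
WD  d     1       0    1
WD  a     0       0    1
WD  a     1       0    1
")

out <- df %>%
  group_by(Test) %>%
  summarise(
    n_ids     = n_distinct(ID),        # IDs that had this test at least once
    pos       = sum(result),           # positive results for this test
    ped_pos   = sum(result * ped),     # positive results on ped rows
    adult_pos = sum(result * adult)    # positive results on adult rows
  )

out
```

For the sample data this gives the same numbers as the desired output table. Note that `sum(result)` counts positive rows; if one ID could test positive more than once for the same test, `n_distinct(ID[result == 1])` would count positive IDs instead.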

1 Answer


We can use `aggregate` from base R:

> aggregate(result~Test, data=df1, function(x) c(N = length(x), Sum=sum(x)))
  Test result.N result.Sum
1    a        5          2
2    b        2          2
3    c        2          0
4    d        1          1
Jilber Urbina
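The `aggregate` call above counts rows per Test, so test a shows 5 rather than the 3 distinct IDs the question asks for. A minimal base-R sketch that produces all four requested columns, assuming 0/1 coding in `result`, `ped` and `adult`, splits the data by Test and summarises each piece; the column names `n_ids`, `pos`, `ped_pos` and `adult_pos` are illustrative choices, not from the question.

```r
# Sample data from the question.
df1 <- read.table(header = TRUE, text = "
ID  Test  result  ped  adult
AB  a     0       0    1
AB  b     1       0    1
FM  a     1       1    0
FM  c     0       1    0
WD  a     0       0    1
WD  b     1       0    1
WD  c     0       0    1
WD  d     1       0    1
WD  a     0       0    1
WD  a     1       0    1
")

# Summarise each Test separately, then stack the one-row results.
per_test <- lapply(split(df1, df1$Test), function(d) {
  data.frame(
    Test      = d$Test[1],
    n_ids     = length(unique(d$ID)),    # distinct IDs, not rows
    pos       = sum(d$result),           # positive results
    ped_pos   = sum(d$result * d$ped),   # positive results on ped rows
    adult_pos = sum(d$result * d$adult)  # positive results on adult rows
  )
})
out <- do.call(rbind, per_test)
rownames(out) <- NULL
out
```

`split` orders the groups by Test (a, b, c, d), so the rows come out in the same order as the desired output.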