
I have a dataset in which each group of patients shares a unique ID, match_no. I want to count how many patients in each group got sick in two different years, using a loop to count the occurrences across a large dataset.

for (i in db$match_no){(with(db, sum(db$TBHist16 == 1 & db$match_no == i))}

This is my attempt. I need i to cycle through each of the match numbers and count how many TB occurrences there were.

Can anyone correct my code, please?

Example data:

df1 <- data.frame(Match_no = c(1, 1,1,1,1,2,2,2,2,2, 3,3,3,3,3, 4,4,4,4,4, 5,5,5,5,5),
                  var1 = c(1,1,1,0,0,1,1,1,0,0,0,1,1,1,1,1,0,0,0,1,1,1,1,0,1))

I want to count how many 1 values there are in each match number.

Thank you

Kilian Murphy

2 Answers


Some ideas:

  1. Simple summary of all Match_no values:

    xtabs(~var1 + Match_no, data = df1)
    #     Match_no
    # var1 1 2 3 4 5
    #    0 2 2 1 3 1
    #    1 3 3 4 2 4
    
  2. Same as 1, but with a subset:

    xtabs(~ Match_no, data = subset(df1, var1 == 1))
    # Match_no
    # 1 2 3 4 5 
    # 3 3 4 2 4 
    
  3. Results in a frame:

    aggregate(var1 ~ Match_no, data = subset(df1, var1 == 1), FUN = length)
    #   Match_no var1
    # 1        1    3
    # 2        2    3
    # 3        3    4
    # 4        4    2
    # 5        5    4
    
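If a loop is still preferred, the attempt in the question has three problems: the parentheses are unbalanced, `for (i in db$match_no)` visits every row rather than every distinct ID, and the result of `sum()` is never stored or printed. A minimal sketch of a corrected loop, assuming the example `df1` from the question (`ids` and `counts` are names introduced here for illustration only):

```r
# Example data from the question
df1 <- data.frame(Match_no = c(1, 1,1,1,1,2,2,2,2,2, 3,3,3,3,3, 4,4,4,4,4, 5,5,5,5,5),
                  var1 = c(1,1,1,0,0,1,1,1,0,0,0,1,1,1,1,1,0,0,0,1,1,1,1,0,1))

# Loop over each distinct match number once, not over every row
ids <- unique(df1$Match_no)

# Pre-allocate the result and label it by match number
counts <- integer(length(ids))
names(counts) <- ids

for (k in seq_along(ids)) {
  # Count rows in this group where var1 equals 1
  counts[k] <- sum(df1$var1 == 1 & df1$Match_no == ids[k])
}

counts
# 1 2 3 4 5 
# 3 3 4 2 4 
```

On a large dataset the vectorised options above will likely be faster, since the loop rescans the whole column once per group.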
r2evans

In base R you can use aggregate and sum:

aggregate(var1 ~ Match_no, data = df1, FUN = sum)
#   Match_no var1
# 1        1    3
# 2        2    3
# 3        3    4
# 4        4    2
# 5        5    4

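
Note that summing only counts the 1s because var1 is coded 0/1. If the column could hold other values, one option is to sum a logical test instead. A small sketch with `tapply`, again assuming the example `df1` from the question (`counts` is just an illustrative name):

```r
# Example data from the question
df1 <- data.frame(Match_no = c(1, 1,1,1,1,2,2,2,2,2, 3,3,3,3,3, 4,4,4,4,4, 5,5,5,5,5),
                  var1 = c(1,1,1,0,0,1,1,1,0,0,0,1,1,1,1,1,0,0,0,1,1,1,1,0,1))

# df1$var1 == 1 is TRUE/FALSE; summing it within each Match_no counts the TRUEs
counts <- tapply(df1$var1 == 1, df1$Match_no, sum)

counts
# 1 2 3 4 5 
# 3 3 4 2 4 
```
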
Clemsang