0

I have a simple data frame as follows.

table1 <- setNames(data.frame(matrix(ncol = 3, nrow = 0)), c("ID","Gender","Smoke"))

table1[1,]=c("1","M","N")
table1[2,]=c("1","F","N")
table1[3,]=c("2","M","N")
table1[4,]=c("2","F","S")
table1[5,]=c("1","M","N")
table1[6,]=c("1","F","N")

  ID Gender Smoke
1  1      M     N
2  1      F     N
3  2      M     N
4  2      F     S
5  1      M     N
6  1      F     N

I would like to add two columns. The first column would be a count of how many instances the ID in column 1 has appeared. In this case, ID "1" appears 4 times, and ID "2" appears 2 times, so I would want it to show 4 for rows 1:2,5:6 and 2 for rows 3:4.

I would like the second column to represent exactly which instance of the particular ID that has occurred in the row. For this, row 1 is the first instance of "1" so it is assigned 1, while row 2 is the second instance of "1" so it would be assigned 2. Row 3 is the first instance of "2" so it is assigned 1, and likewise we get 2 for row 4, 3 for row 5, and 4 for row 6.

In the end the table would look something like this:

  ID Gender Smoke IDTotal IDCount
1  1      M     N       4       1
2  1      F     N       4       2
3  2      M     N       2       1
4  2      F     S       2       2
5  1      M     N       4       3
6  1      F     N       4       4
pfoe
  • 3
  • 1

1 Answers1

0

Using data.table:

df <- read.table(text = "ID Gender Smoke
1  1      M     N
                 2  1      F     N
                 3  2      M     N
                 4  2      F     S
                 5  1      M     N
                 6  1      F     N", header = T)

library(data.table)
setDT(df)
df[, IDTotal := .N, by = ID]
df[, IDCount := 1:.N, by = ID]

    ID Gender Smoke IDTotal IDCount
1:  1      M     N       4       1
2:  1      F     N       4       2
3:  2      M     N       2       1
4:  2      F     S       2       2
5:  1      M     N       4       3
6:  1      F     N       4       4
sm925
  • 2,648
  • 1
  • 16
  • 28