I have the following dataframe dat:

> dat
  subjectid variable
1      1234       12
2      1234       14
3      2143       19
4      3456       12
5      3456       14
6      3456       13

How do I add another column which shows the count of each unique subjectid?

ddply(dat,.(subjectid),summarize,quan_95=quantile(variable,0.95),uniq=count(unique(subjectid)))
zx8754
user3006691

3 Answers

1

Similar to ave(), you may also use split/lapply/unsplit:

i = split(dat$variable, dat$subjectid)
count = unsplit(lapply(i, length), dat$subjectid)

Then attach the count variable to dat using data.frame() or whichever method you prefer.

split() creates a list holding the dat$variable values for each value of dat$subjectid. lapply() then applies length() to each element of that list (i) to get the group sizes, and unsplit() puts the results back in the original row order.

unsplit() is pure magic and fairy dust. I didn't believe it the first 100 times.
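
For comparison, here is a minimal sketch that puts the ave() call mentioned above next to the split/lapply/unsplit version and attaches the result to the data frame. It rebuilds dat from the question; the names count_ave and count_split are chosen here for illustration only.

```r
# Rebuild the example data from the question
dat <- data.frame(
  subjectid = c(1234, 1234, 2143, 3456, 3456, 3456),
  variable  = c(12, 14, 19, 12, 14, 13)
)

# ave() applies FUN within each subjectid group and returns a vector
# aligned with the original rows
count_ave <- ave(dat$variable, dat$subjectid, FUN = length)

# split/lapply/unsplit produces the same per-row group sizes
i <- split(dat$variable, dat$subjectid)
count_split <- unsplit(lapply(i, length), dat$subjectid)

# Attach the counts to the data frame
dat$count <- count_split
dat
```

Both vectors hold one group size per row, so either can be assigned straight into dat.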

1

Here is an approach via dplyr. First we group by subjectid, then use the function n() to count number of rows in each group:

dat <- read.table(text="
subjectid variable
1      1234 12
2      1234 14
3      2143 19
4      3456 12
5      3456 14
6      3456 13")

library(dplyr)

dat %>%
  group_by(subjectid) %>%
  mutate(count = n())

  subjectid variable count
1      1234       12     2
2      1234       14     2
3      2143       19     1
4      3456       12     3
5      3456       14     3
6      3456       13     3
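
As a possible shorthand, dplyr also provides add_count(), which adds the per-group row count in one step. The sketch below assumes a dplyr version recent enough to support the name argument; unlike the group_by()/mutate() pipeline above, the result is not left grouped.

```r
library(dplyr)

# Rebuild the example data from the question
dat <- data.frame(
  subjectid = c(1234, 1234, 2143, 3456, 3456, 3456),
  variable  = c(12, 14, 19, 12, 14, 13)
)

# add_count() counts rows per subjectid and appends the result as a column;
# name = "count" sets the new column's name (the default would be "n")
res <- dat %>%
  add_count(subjectid, name = "count")

res
```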
AndrewMacDonald
0

If dat is sorted by subjectid (so its rows follow the same order that table() returns), you can repeat each count by itself:

    tbl <- table(dat[,1])
    transform(dat, count=rep(tbl, tbl))
    #  subjectid variable count
    #1      1234       12     2
    #2      1234       14     2
    #3      2143       19     1
    #4      3456       12     3
    #5      3456       14     3
    #6      3456       13     3
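
If the rows are not sorted, one option is to look up each row's count in the table by name instead of repeating the counts in order. This is a sketch on a shuffled copy of the question's data, not part of the approach above; the shuffled row order is made up for illustration.

```r
# Same data as the question, but with the rows deliberately out of order
dat <- data.frame(
  subjectid = c(3456, 1234, 2143, 3456, 1234, 3456),
  variable  = c(12, 12, 19, 14, 14, 13)
)

# table() names its entries by the subjectid values (as character strings)
tbl <- table(dat$subjectid)

# Index the table by each row's subjectid, so row order does not matter;
# as.vector() drops the table attributes and names
dat$count <- as.vector(tbl[as.character(dat$subjectid)])
dat
```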
akrun