How to count the different number of variables in a column, then list that count by numbers in another column

Question

Please see attached image for the best way I can describe my question.

I promise I did attempt to research this first, and I saw a few answers that came close, but many of them required listing each value individually (in this image, that would be each encounter #), and my data has approximately 15 million rows, with about 10,000 different encounter #'s.

I would appreciate any assistance!

Please add a [reproducible](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) example so that it is easy for people to help you. — Ronak Shah, Sep 12 '16 at 04:39
In the future, @BJack, please post data and code, not pictures. I recognize this does a decent job of explaining what you want (*input* and *desired output* are good), but it could easily have been done using `dput` (perhaps using something like [`clipr`](https://cran.r-project.org/web/packages/clipr/index.html) to retrieve from your Excel page). That would have allowed us to use *your* example data instead of making us come up with something representative on our own. — r2evans, Sep 12 '16 at 15:02
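
For readers unfamiliar with `dput`, the suggestion in the comment above can be sketched as follows. The data frame here is a hypothetical stand-in, and its column names are assumptions rather than the asker's real data.

```r
# Hypothetical stand-in for the asker's data; the column names are assumptions.
df <- data.frame(patient = c(123, 123, 456), encounter = c(12, 34, 12))

# dput() prints an R expression that rebuilds the object,
# so the printed text can be pasted straight into a question.
dput(df)

# For a large data set, share only a small slice.
dput(head(df, 2))
```

Anyone answering can then assign the pasted expression to a variable and work with the same data.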

score 2 · Answer 1 · answered Sep 12 '16 at 06:23

As an alternative, you can also use the data.table package. Especially on large datasets, data.table will give you an enormous performance boost. Applied to the data as used by @r2evans:

library(data.table)
setDT(df)[, .(n_uniq_enc = uniqueN(encounter)), by = patient]

This gives the following result:

   patient n_uniq_enc
1:     123          5
2:     456          5
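
The snippet above relies on `df` from the other answer. A minimal self-contained sketch of the same idea, using a tiny hand-made table (hypothetical values, not the asker's data), might look like this:

```r
library(data.table)

# Hypothetical toy data: patient 1 has encounters 10, 10, 20;
# patient 2 has only encounter 30.
dt <- data.table(patient = c(1, 1, 1, 2), encounter = c(10, 10, 20, 30))

# uniqueN() counts the distinct encounter values within each patient group.
res <- dt[, .(n_uniq_enc = uniqueN(encounter)), by = patient]
print(res)
```

With this toy input, patient 1 should get a count of 2 and patient 2 a count of 1, since the repeated encounter 10 is counted only once.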

r2evans · Answer 2 · 2016-09-12T14:56:58.387

Lacking a reproducible example, here's some sample data:

set.seed(42)
df <- data.frame(patient = sample(c(123,456), size=30, replace=TRUE), encounter=sample(c(12,34,56,78,90), size=30, replace=TRUE))
head(df)
#   patient encounter
# 1     456        78
# 2     456        90
# 3     123        34
# 4     456        78
# 5     456        12
# 6     456        90

Base R:

aggregate(x = df$encounter, by = list(patient = df$patient),
          FUN = function(a) length(unique(a)))
#   patient x
# 1     123 5
# 2     456 5

or (by @20100721's suggestion):

aggregate(encounter ~ ., data = df, FUN = function(x) length(unique(x)))

Using dplyr:

library(dplyr)
group_by(df, patient) %>%
  summarize(numencounters = length(unique(encounter)))
# # A tibble: 2 x 2
#   patient numencounters
#     <dbl>         <int>
# 1     123             5
# 2     456             5

Update: @2100721 informed me of `n_distinct`, which is effectively the same as `length(unique(...))`:

group_by(df, patient) %>%
  summarize(numencounters = n_distinct(encounter))

Thanks, @2100721 ... `dplyr` certainly has a lot of little swiss-army-knife nail files sitting around, doesn't it? I wonder how many are orthogonally useful versus convenience with or without strong "requirements". — r2evans, Sep 12 '16 at 14:58
Another dplyr: df %>% distinct(patient, encounter) %>% count(patient) — TJ Mahr, Sep 12 '16 at 15:28
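
The `distinct()` plus `count()` variant from the last comment can be sketched on a small hand-made data frame (hypothetical values, chosen so the result is easy to check by eye):

```r
library(dplyr)

# Hypothetical toy data: patient 1 has encounters 10, 10, 20;
# patient 2 has only encounter 30.
toy <- data.frame(patient = c(1, 1, 1, 2), encounter = c(10, 10, 20, 30))

# distinct() keeps one row per (patient, encounter) pair;
# count() then tallies the remaining rows per patient into a column named n.
res <- toy %>%
  distinct(patient, encounter) %>%
  count(patient)
print(res)
```

The result has one row per patient, with `n` holding the number of distinct encounters, so it should agree with the `n_distinct` approach in the answer.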
