0

I am trying to create a frequency table returning the number of unique diagnoses per ID but some IDs were assigned the same diagnosis code multiple times. Here is how my data looks like:

EncounterID  Diagnosis
    1            a
    1            a
    1            c
    2            a
    2            b
    3            e
    3            e 
    3            e

Here is what I would like my frequency table to look like:

EncounterID  Diagnosis
    1            2
    2            2
    3            1

I am relatively new to R, I figured the first part of the code. Here is the code I have tried, but it is not returning the number of unique diagnoses.

Num_Diag_Per_Enct <- data.table(table(diagn$ENCOUNTER_ID))

Ems
  • 7
  • 3

1 Answers1

0

Try this. You can use group_by() and summarise() from dplyr or aggregate() from base R. Here the code:

library(dplyr)
#Code
newdf <- df %>% group_by(EncounterID) %>% summarise(Diagnosis=n_distinct(Diagnosis))

Output:

# A tibble: 3 x 2
  EncounterID Diagnosis
        <int>     <int>
1           1         2
2           2         2
3           3         1

Or the base R option:

#Code 2
newdf <- aggregate(Diagnosis~EncounterID,data=df,function(x) length(unique(x)))

Output:

  EncounterID Diagnosis
1           1         2
2           2         2
3           3         1

Some data used:

#Data
df <- structure(list(EncounterID = c(1L, 1L, 1L, 2L, 2L, 3L, 3L, 3L
), Diagnosis = c("a", "a", "c", "a", "b", "e", "e", "e")), class = "data.frame", row.names = c(NA, 
-8L))
Duck
  • 39,058
  • 13
  • 42
  • 84
  • 1
    I decided to go with code 2 and got the result I was expecting. Thanks for the reply! – Ems Nov 17 '20 at 00:55