I have a large dataset with 89 variables, several of which are unique identifiers linking records in a relational DB. I want to count the number of unique identifiers within each level of a second variable, which is a factor.
For example, this does not work, but it shows how I imagine it would look:
length(unique(data$PID ~ data$ICD_Grouping))
The result I am after is a table like this (counts are for the sample data shown in this post):
ICD_Grouping  unique.PID
C43           6
C47/49        1
C50           2
C56           1
C57-58        1
C80           1
Sample data (rows 5 and 6 have no ICD_Grouping value):
     PID  ICD_Grouping
1      1  C80
2    918  C43
3    919  C43
4    919  C43
5   1284
6   1285
7    550  C43
8    550  C43
9    550  C43
10   550  C50
11   920  C43
12   920  C43
13   921  C50
14   921  C56
15   921  C57-58
16   921  C57-58
17   549  C43
18   549  C43
19   922  C47/49
20   551  C43
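
One possible way to get this kind of table in base R is aggregate() with length(unique(x)) as the summary function. The sketch below rebuilds the sample rows by hand and makes two assumptions that are not stated in the question: the column is spelled ICD_Grouping as in the sample data, and rows with a blank or missing grouping should be left out of the counts. The names has_group, grouped and result are just illustrative.

```r
# Sample data typed in from the question; blank groupings are stored as "".
data <- data.frame(
  PID = c(1, 918, 919, 919, 1284, 1285, 550, 550, 550, 550,
          920, 920, 921, 921, 921, 921, 549, 549, 922, 551),
  ICD_Grouping = c("C80", "C43", "C43", "C43", "", "", "C43", "C43", "C43",
                   "C50", "C43", "C43", "C50", "C56", "C57-58", "C57-58",
                   "C43", "C43", "C47/49", "C43"),
  stringsAsFactors = TRUE
)

# Assumption: rows with no grouping (blank or NA) should not be counted.
has_group <- !is.na(data$ICD_Grouping) & data$ICD_Grouping != ""
grouped <- droplevels(data[has_group, ])

# For each level of ICD_Grouping, count the distinct PID values.
result <- aggregate(PID ~ ICD_Grouping, data = grouped,
                    FUN = function(x) length(unique(x)))
names(result)[2] <- "unique.PID"
print(result)
```

If a named vector is enough instead of a data frame, tapply(grouped$PID, grouped$ICD_Grouping, function(x) length(unique(x))) should give the same counts.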