I'm relatively new to R, so forgive me if this seems like a dumb question. I've run out of ideas from other examples on how to make this work, and I was hoping someone could point me in the right direction.
I'm attempting to count the distinct SITE_ID values for each CLNCL_TRIAL_ID.
My data is actually in a dataframe (data2), but this is kind of what it looks like:
CLNCL_TRIAL_ID    SITE_ID
89794             12456
89794             12456
8613              100341
8613              30807
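To make this reproducible, the sample rows can be rebuilt as a small data frame. This is only a stand-in for the real data2, which is larger; the numeric column types are an assumption.

```r
# Stand-in for the real data2, built from the four sample rows only.
# Column types are assumed to be numeric.
data2 <- data.frame(
  CLNCL_TRIAL_ID = c(89794, 89794, 8613, 8613),
  SITE_ID        = c(12456, 12456, 100341, 30807)
)
print(data2)
```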
The idea is that my end result would be a count per trial: 89794 = 1 and 8613 = 2.
Here's what I have so far:
z <- aggregate(data2$SITE_ID ~ data2$CLNCL_TRIAL_ID, data2, function(SITE_ID) length(unique(data2$SITE_ID)))
I've also attempted some alternate forms:
aggregate(SITE_ID ~ CLNCL_TRIAL_ID, data2, sum(!duplicated(data$SITE_ID)))
aggregate(SITE_ID ~ CLNCL_TRIAL_ID, data2, nlevels(factor(data2$SITE_ID)))
aggregate(SITE_ID ~ CLNCL_TRIAL_ID, data2, function(SITE_ID) length(unique(data2$SITE_ID)))
I keep running into the same problem: instead of grouping by CLNCL_TRIAL_ID, it counts over the whole table, so I get 89794 = 3 and 8613 = 3.
Does anyone have an idea how to correct this? I feel like I'm overlooking something silly. As a side note, I'm trying to keep this limited to base R if at all possible. If it isn't possible, no biggie.
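One thing that may be worth checking: every attempt above refers to data2$SITE_ID inside the function, which is the full column rather than the values for one trial. A minimal base-R sketch that uses the function's own argument instead is below. It assumes the four sample rows stand in for data2, and the argument name x is arbitrary.

```r
# Stand-in for data2, rebuilt from the sample rows (an assumption).
data2 <- data.frame(
  CLNCL_TRIAL_ID = c(89794, 89794, 8613, 8613),
  SITE_ID        = c(12456, 12456, 100341, 30807)
)

# aggregate() passes each group's SITE_ID values to FUN as x,
# so count the unique values of x rather than of data2$SITE_ID.
z <- aggregate(SITE_ID ~ CLNCL_TRIAL_ID, data = data2,
               FUN = function(x) length(unique(x)))
print(z)
```

On the sample rows this gives 1 for trial 89794 and 2 for trial 8613. The same count could presumably also be written with tapply(data2$SITE_ID, data2$CLNCL_TRIAL_ID, function(x) length(unique(x))), which returns a named vector instead of a data frame.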