Get unique values of separate column after aggregate

Question

I'm using aggregate to get the mean of one column, grouped by three other columns. I'd also like a count of how many unique id values fall into each of those groups.

For example, I have df:

df <- data.frame(id = c(1,1,1,2,2,2,3,3,3,3),
             col1=sample(1:10, 10, replace = TRUE),
             col2 = c(1,2,1,2,1,1,2,2,2,1), 
             col3 = c("a","b","c", "b","a","a","b","a","c","a"),
             col4 = c("yes","no","no", "no","yes","yes","yes","no","no","yes"))

And I run aggregate to get the mean for each unique combination, like:

df_agg <- aggregate(col1~col2+col3+col4, df, FUN = mean)

In addition to the mean for the rows that have col2 = 1, col3 = a, col4 = yes, I want to know that 3 unique id values meet those criteria. Basically a sample size (n), but counting only unique id values. I tried df_agg <- aggregate(id~col2+col3+col4, df, FUN = length), but this gives me the total number of rows in each group, and I only want the number of unique id values.

An example output for a row that would have more than one unique id:

|---------|--------|---------|--------|--------|
|  col2   |  col3  |  col4   |  mean  | count  |
|---------|--------|---------|--------|--------|
|    1    |   a    |   yes   |   5    |   3    |
|---------|--------|---------|--------|--------|

Can you show your expected output? It is not clear.. Did you meant `aggregate(id~col2+col3+col4, df, FUN = function(x) length(unique(x)))` — akrun, Feb 07 '18 at 00:59
It is difficult to understand from the comments. Please update your post — akrun, Feb 07 '18 at 01:09
I get the mean for the row you showed as 6.25. Please use correct values instead of some guesses — akrun, Feb 07 '18 at 01:18
Isn't col1 randomly generated, so the mean will be different each time? — Ariel Kaputkin, Feb 07 '18 at 01:25
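
A minimal sketch of the approach suggested in the first comment above, combined with the mean from the question. It assumes that the goal is one row per col2/col3/col4 combination with both the mean of col1 and the number of distinct id values. The set.seed() call, the intermediate names df_mean and df_n, and the output column names mean and count are additions for illustration and are not part of the original post.

```r
# Sample data from the question. set.seed() is an added assumption so that
# the randomly drawn col1 (and therefore the means) can be reproduced.
set.seed(42)
df <- data.frame(id = c(1,1,1,2,2,2,3,3,3,3),
                 col1 = sample(1:10, 10, replace = TRUE),
                 col2 = c(1,2,1,2,1,1,2,2,2,1),
                 col3 = c("a","b","c","b","a","a","b","a","c","a"),
                 col4 = c("yes","no","no","no","yes","yes","yes","no","no","yes"))

# Mean of col1 for each combination of col2, col3 and col4 (as in the question).
df_mean <- aggregate(col1 ~ col2 + col3 + col4, data = df, FUN = mean)
names(df_mean)[names(df_mean) == "col1"] <- "mean"

# Number of distinct id values for each combination.
df_n <- aggregate(id ~ col2 + col3 + col4, data = df,
                  FUN = function(x) length(unique(x)))
names(df_n)[names(df_n) == "id"] <- "count"

# Join the two summaries on the grouping columns.
df_agg <- merge(df_mean, df_n, by = c("col2", "col3", "col4"))
df_agg
```

With this sample data, the row for col2 = 1, col3 = a, col4 = yes gets count 3: four rows match, but they come from only three distinct ids. The mean column depends on the random draw for col1, so it will differ from the illustrative 5 shown in the question.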

