count frequency of variable dependent on other variable in an R dataframe

Question

df <- data.frame(samples = c('45fe.K2','45fe.K2','45fe.K2','45hi.K1','45hi.K1'),source = c('f','f','o','o','f'))
df
  samples source
1 45fe.K2      f
2 45fe.K2      f
3 45fe.K2      o
4 45hi.K1      o
5 45hi.K1      f

For each sample, I want to count how many rows come from source f and how many from source o.

The result should look like this:

  samples source count
1 45fe.K2      f     2
3 45fe.K2      o     1
4 45hi.K1      o     1
5 45hi.K1      f     1

I have tried these:

library(dplyr)
df <- df  %>%
  group_by(source) %>%
  mutate(count = n_distinct(samples)) %>%
  ungroup()

df <- within(df, { count <- ave(source, samples, FUN=function(x) length(unique(x)))})

df$count <- ave(as.integer(df$samples), df$source, FUN = function(x) length(unique(x)))

df$count <- with(df, ave(samples, source, FUN = function(x) length(unique(x))))

All of these count only the unique samples (which is 2) or the unique sources (which is also 2). What I want is the number of rows for each combination of samples and source.

Duck · Accepted Answer · 2020-09-16T14:42:53.140

Try this dplyr solution with summarise() and n():

library(dplyr)
df %>% group_by(samples,source) %>% summarise(N=n())

Output:

# A tibble: 4 x 3
# Groups:   samples [2]
  samples source     N
  <chr>   <chr>  <int>
1 45fe.K2 f          2
2 45fe.K2 o          1
3 45hi.K1 f          1
4 45hi.K1 o          1
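dplyr's count() wraps this group_by() + summarise(n()) pattern in a single call, and add_count() is the mutate-style variant the question was reaching for: it keeps every original row and appends the group size. A sketch, assuming a recent dplyr where both functions accept a name argument to rename the default n column:

```r
library(dplyr)

# Example data from the question
df <- data.frame(
  samples = c('45fe.K2', '45fe.K2', '45fe.K2', '45hi.K1', '45hi.K1'),
  source  = c('f', 'f', 'o', 'o', 'f')
)

# One row per samples/source combination; shorthand for
# group_by(samples, source) %>% summarise(count = n())
counts <- df %>% count(samples, source, name = "count")
counts

# Keep all five original rows and append the group size instead; same as
# group_by(samples, source) %>% mutate(count = n()) %>% ungroup()
with_counts <- df %>% add_count(samples, source, name = "count")
with_counts
```

Unlike the summarise() version, count() on an ungrouped data frame returns an ungrouped result, so there is no leftover "Groups: samples" grouping to remove afterwards.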

A base R solution is to create an indicator variable N filled with ones and then sum it per group with aggregate():

# Indicator column: one 1 per row
df$N <- 1
# Sum the indicator within each samples/source combination
aggregate(N~samples+source,df,sum)

Output:

  samples source N
1 45fe.K2      f 2
2 45hi.K1      f 1
3 45fe.K2      o 1
4 45hi.K1      o 1
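The ave() attempts in the question were close: they grouped by only one column and counted unique values. A sketch of a base R variant (not part of the original answer) that groups by both columns, uses length as the function, and then drops repeated samples/source pairs, keeping the first occurrence and its original row name as in the desired output:

```r
# Example data from the question
df <- data.frame(
  samples = c('45fe.K2', '45fe.K2', '45fe.K2', '45hi.K1', '45hi.K1'),
  source  = c('f', 'f', 'o', 'o', 'f')
)

# Group size for every row: ave() splits seq_along(...) by the
# samples/source interaction and replaces each element with its group's length
df$count <- ave(seq_along(df$samples), df$samples, df$source, FUN = length)

# Drop repeated samples/source pairs, keeping the first row of each
res <- df[!duplicated(df[c("samples", "source")]), ]
res
```

Because duplicated() keeps the first row of each pair, the result retains row names 1, 3, 4 and 5 in the original row order.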
