To produce a cumulative plot using ggplot2 geom_stat(), I need a data.frame that holds the number of rows for each combination of factors. I know how to produce those counts using aggregate(), e.g.

print(aggregate(cbind(count = prop_cost) ~ tax_cnt + data_set, data = out_data, FUN = NROW))

Gives me:

  tax_cnt data_set count
1       3    5taxa  1936
2       4    5taxa  3907
3       5    5taxa  7205
4       3  5taxaRS  1446
5       4  5taxaRS  2896
6       5  5taxaRS  6168

But how can I put these values back into the data.frame I am using to plot things? I would like to set a new column, $nt_cnt, so that

df[df$data_set == '5taxa' & df$tax_cnt == 3, ]$nt_cnt <- 1936

and similarly for the other five counts.

This seems like it must be easy, but I need help.

  • Please make the question reproducible so others can easily run it by copying from the question and pasting into their R session. Include all `library` statements and inputs (using `dput`). See top of [tag:r] tag page on how to ask a question. – G. Grothendieck Jan 20 '22 at 19:56
  • If you're trying to summarise a dataframe by taking an average or sum, etc., use `group_by() %>% summarise()` from the dplyr package. If you're simply counting how many rows exist for different factors, use the `count()` function. – Macgregor Aubertin-Young Jan 20 '22 at 19:58
  • @MacgregorAubertin-Young OP says *"how can I put these values back into the data.frame I am using"*, so `mutate` not `summarise`. – Gregor Thomas Jan 20 '22 at 20:00
  • @GregorThomas Yes, thanks for the correction. – Macgregor Aubertin-Young Jan 20 '22 at 20:02
  • Here's the FAQ [on calculating summary statistics and adding them back to the original data](https://stackoverflow.com/q/6053620/903061). – Gregor Thomas Jan 20 '22 at 20:02
  • In base R you could `merge(out_data, your_aggregate, all.x = TRUE)`. In `dplyr` you would `out_data %>% group_by(tax_cnt, data_set) %>% mutate(nt_cnt = n())` – Gregor Thomas Jan 20 '22 at 20:05
  • Apparently `ave()` is my friend in this case, as it does the work of `aggregate()` and `merge()` in one step. It was certainly not something I would have thought of. – Bill Pearson Jan 20 '22 at 20:23
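
For anyone landing here, the two base R routes mentioned in the comments can be sketched roughly as follows. This is only an illustration: the small `out_data` below is made up (the question does not include the real data), and the column name `nt_cnt_merge` is a hypothetical name chosen to keep the two results side by side.

```r
# Hypothetical stand-in for out_data; the values are invented for illustration.
out_data <- data.frame(
  data_set  = c("5taxa", "5taxa", "5taxa", "5taxaRS", "5taxaRS", "5taxa"),
  tax_cnt   = c(3, 3, 4, 3, 5, 5),
  prop_cost = c(0.1, 0.4, 0.2, 0.9, 0.5, 0.3)
)

# Route 1: ave() applies FUN within each group and returns a vector aligned
# with the original rows, so the per-group count lands directly in a new column.
out_data$nt_cnt <- ave(out_data$prop_cost,
                       out_data$tax_cnt, out_data$data_set,
                       FUN = length)

# Route 2: build the summary table with aggregate(), then merge() it back
# onto the original rows using the grouping columns as keys.
counts <- aggregate(cbind(nt_cnt_merge = prop_cost) ~ tax_cnt + data_set,
                    data = out_data, FUN = length)
merged <- merge(out_data, counts,
                by = c("tax_cnt", "data_set"), all.x = TRUE)

print(out_data)
print(merged)
```

Note that `merge()` reorders the rows by the key columns, while `ave()` leaves the row order untouched, which may matter if later plotting code relies on the original order.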
