
I'm having trouble with a fairly simple ddply operation. I have the following data frame:

+----------+----------+
| Expenses | Category |
+----------+----------+
|      735 |        1 |
|      992 |        2 |
|      943 |        1 |
|      995 |        3 |
|      914 |        3 |
|      935 |        1 |
|      956 |        3 |
|      946 |        2 |
|      978 |        1 |
|      924 |        1 |
+----------+----------+

I'm trying to calculate N and the mean of Expenses for each Category by running the following:

ddply(df, .(Category), summarise, N = length(df$Expenses), mean = mean(df$Expenses))

However, I get:

  Category  N  mean
1        1 10 931.8
2        2 10 931.8
3        3 10 931.8

Could you help me figure out what I'm doing wrong here?

Here is the dput output for df:

structure(list(Expenses = c(735, 992, 943, 995, 914, 935, 956, 
946, 978, 924), Category = c(1L, 2L, 1L, 3L, 3L, 1L, 3L, 2L, 
1L, 1L)), .Names = c("Expenses", "Category"), class = "data.frame", row.names = c(NA, 
-10L))
orestisf
  • You don't need `df$` inside ddply, so try `ddply(df, .(Category), summarise, N = length(Expenses), mean = mean(Expenses))` – talat Feb 14 '16 at 21:49
  • Thanks, that did it. I completely missed the original question you pointed me to, though; I should look more carefully next time. – orestisf Feb 14 '16 at 22:04
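Why the original call returns the same row three times: inside `summarise`, a bare column name such as `Expenses` is looked up in the piece of the data frame that belongs to the current group, whereas `df$Expenses` always reaches back to the full `df`. Every group therefore sees all ten rows (N = 10, mean = 9318 / 10 = 931.8). The sketch below imitates that split-apply-combine pattern in base R with `split` and `lapply` (not plyr itself), so the two behaviours can be compared without installing anything; the names `pieces`, `per_group` and `whole_column` are hypothetical, chosen for illustration.

```r
# Rebuild the example data from the dput in the question
df <- data.frame(
  Expenses = c(735, 992, 943, 995, 914, 935, 956, 946, 978, 924),
  Category = c(1L, 2L, 1L, 3L, 3L, 1L, 3L, 2L, 1L, 1L)
)

# Split the rows by Category, roughly what ddply does before calling summarise
pieces <- split(df, df$Category)

# Correct: use each group's own column (like a bare `Expenses` in summarise)
per_group <- do.call(rbind, lapply(pieces, function(piece) {
  data.frame(Category = piece$Category[1],
             N        = length(piece$Expenses),
             mean     = mean(piece$Expenses))
}))

# Mistake from the question: refer to the full data frame inside every group
whole_column <- do.call(rbind, lapply(pieces, function(piece) {
  data.frame(Category = piece$Category[1],
             N        = length(df$Expenses),
             mean     = mean(df$Expenses))
}))

print(per_group)     # N = 5, 2, 3; mean = 903, 969, 955
print(whole_column)  # N = 10 and mean = 931.8 in every row
```

With plyr installed, the corrected call from the comment above should return the same per-category numbers as `per_group`.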

1 Answer


An alternative approach with dplyr:

library(dplyr)

# Group the rows by Category, then compute the row count and mean per group
grouped_df    <- group_by(df, Category)
summarized_df <- summarize(grouped_df, N    = n(),
                                       mean = mean(Expenses))
summarized_df
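For comparison, here is a minimal base R sketch that needs neither plyr nor dplyr. It swaps in `tapply`, which applies a function to `Expenses` separately within each `Category`; this is not part of the answer above, and the names `counts`, `means` and `base_summary` are hypothetical.

```r
# Example data from the question's dput
df <- data.frame(
  Expenses = c(735, 992, 943, 995, 914, 935, 956, 946, 978, 924),
  Category = c(1L, 2L, 1L, 3L, 3L, 1L, 3L, 2L, 1L, 1L)
)

# tapply() groups Expenses by Category and applies the function to each group;
# results are ordered by the sorted Category values ("1", "2", "3")
counts <- tapply(df$Expenses, df$Category, length)
means  <- tapply(df$Expenses, df$Category, mean)

# Assemble a data frame shaped like the ddply/dplyr output
base_summary <- data.frame(
  Category = as.integer(names(counts)),
  N        = as.vector(counts),
  mean     = as.vector(means)
)
print(base_summary)
```

Because both `tapply` calls group by the same vector, `counts` and `means` come back in the same order, so the columns line up row by row.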
Mekki MacAulay