
I have the dataframe below, which has multiple entries per date. I would like to calculate the average income per item for each day. The output needs to be in the dataframe, since I want to use it in ggplot. Whatever I try, I get the same value in every row of the dataframe, regardless of the date.

dataframe:
        quantity pricereal  tip   length  name     date      average
2           2        12.66 Typ-3      2m Typ-3 2m 2015-08-03  8.351814
3           1         6.87 Typ-3      2m Typ-3 2m 2015-08-03  8.351814
7           1        10.62 Typ-6      2m Typ-6 2m 2015-08-03  8.351814
49          1        12.61 Typ-4      2m Typ-4 2m 2015-08-04  8.351814
50          4        10.62 Typ-6      2m Typ-6 2m 2015-08-04  8.351814
61          2         9.14 Typ-1      2m Typ-1 2m 2015-08-05  8.351814
62          3         4.41 Typ-1      2m Typ-1 2m 2015-08-05  8.351814  

The average I got is clearly wrong. I wrote the following code:

data_alu$average <- NA
data_alu$average <- mean(data_alu$pricereal)

I think the solution involves tapply; however, I get an error message because there are several rows corresponding to each date.

data_alu$average  <-tapply(data_alu$date, data_alu$pricereal, mean)

Just to clarify: I would like the mean for each day, not the mean of all the data.

Hopefully there is a saviour out there...

1 Answer


Here's a base R solution. You almost had it with tapply. `by` is a wrapper for tapply that I find intuitive. Use it to compute the mean per date, put the result in a dataframe, and then match it back to the original rows by date.

df <- read.table(textConnection('       quantity pricereal  tip   length  name  length   date      average
2           2        12.66 Typ-3      2m Typ-3 2m 2015-08-03  8.351814
3           1         6.87 Typ-3      2m Typ-3 2m 2015-08-03  8.351814
7           1        10.62 Typ-6      2m Typ-6 2m 2015-08-03  8.351814
49          1        12.61 Typ-4      2m Typ-4 2m 2015-08-04  8.351814
50          4        10.62 Typ-6      2m Typ-6 2m 2015-08-04  8.351814
61          2         9.14 Typ-1      2m Typ-1 2m 2015-08-05  8.351814
62          3         4.41 Typ-1      2m Typ-1 2m 2015-08-05  8.351814  '),
                 stringsAsFactors=FALSE)
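# One mean of pricereal per distinct date
tmp <- by(df$pricereal, df$date, mean)
# Lookup table: one row per date with its mean
df2 <- data.frame(date=names(tmp),
                  mean=as.numeric(tmp),
                  stringsAsFactors=FALSE)
# Copy each date's mean back onto every row with that date
df$avg <- df2$mean[match(df$date, df2$date)]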
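
For comparison, here is a minimal sketch of the same idea using tapply directly. The attempt in the question passes the arguments in the opposite order; tapply expects the values first and the grouping variable second, and it returns one value per group rather than one per row. The small data frame below is a stand-in that keeps only the pricereal and date columns from the question.

```r
# Stand-in for the question's data: only the two columns needed here.
df <- data.frame(
  pricereal = c(12.66, 6.87, 10.62, 12.61, 10.62, 9.14, 4.41),
  date = c("2015-08-03", "2015-08-03", "2015-08-03",
           "2015-08-04", "2015-08-04", "2015-08-05", "2015-08-05"),
  stringsAsFactors = FALSE
)

# Values first, grouping variable second: one mean per distinct date,
# named by the date.
daily_mean <- tapply(df$pricereal, df$date, mean)

# Index the named result by each row's date to expand it to one value per row.
df$avg <- as.numeric(daily_mean[as.character(df$date)])
```

Indexing the named result by date does the same job as the match() call above, without the intermediate data frame.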
cory
    yay, base-R! But isn't something like `dat$average <- ave(dat$pricereal, dat$date)` easier? saves you from having to create two extra dataframes. – Heroka Feb 26 '16 at 15:11
    wow, didn't even know `ave` existed... Does it do the match correctly? If so, that's awesome. – cory Feb 26 '16 at 15:25
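
On the question of whether `ave` matches correctly, the following is a small sketch rather than a definitive test. `ave` returns a vector the same length as its input, in the original row order, with each element replaced by its group's value (the mean by default). The data frame `dat` below is a stand-in built from the question's values, with the rows deliberately interleaved by date to show that the alignment holds.

```r
# Stand-in data: the question's prices, with dates interleaved on purpose.
dat <- data.frame(
  pricereal = c(12.66, 12.61, 6.87, 9.14, 10.62, 10.62, 4.41),
  date = c("2015-08-03", "2015-08-04", "2015-08-03", "2015-08-05",
           "2015-08-03", "2015-08-04", "2015-08-05"),
  stringsAsFactors = FALSE
)

# ave() groups pricereal by date and returns the group mean for every row,
# keeping the original row order (FUN defaults to mean).
dat$average <- ave(dat$pricereal, dat$date)

# Cross-check against an explicit per-row lookup of the tapply result.
check <- as.numeric(tapply(dat$pricereal, dat$date, mean)[dat$date])
```

Here `dat$average` and `check` should agree row by row, so the result can be passed straight to ggplot as an ordinary column.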