0

I have a dataframe looking like this:

   Model            Month    Country Price
1  Audi TT          2016-03  NL      187
1  Audi TT          2017-03  NL      220
1  Audi TT          2016-03  DE      190
4  Volkswagen Golf  2016-08  NL      204

Now, I want to aggregate the data over the countries such that I only have one observation for each model at a specific date. To do this, I use:

DT = data.table(test_data)
test_data_total = DT[, lapply(.SD, mean), by = Month]

from Can dplyr summarise over several variables without listing each one?.

However, this only works when dealing with numeric variables. In my case the Model variable is a character so this gives me an error, while I want to print only the model name once then. So afterwards it should look like this:

Model            Month    Country Price
1  Audi TT          2016-03  NL      avg
1  Audi TT          2017-03  NL      220
4  Volkswagen Golf  2016-08  NL      204

Does someone know how to do this?

Activation
  • 93
  • 6

2 Answers2

1

Continuing with data.table, try out:

library(data.table)
setDT(test_data)[, .(Country = list(Country), Price = mean(Price)),
                 by = list(Model, Month)][]
# output
             Model   Month Country Price
1:         Audi TT 2016-03   NL,DE 188.5
2:         Audi TT 2017-03      NL 220.0
3: Volkswagen Golf 2016-08      NL 204.0

Data

test_data <- structure(list(Model = c("Audi TT", "Audi TT", "Audi TT", "Volkswagen Golf"
), Month = c("2016-03", "2017-03", "2016-03", "2016-08"), Country = c("NL", 
"NL", "DE", "NL"), Price = c(187L, 220L, 190L, 204L)), .Names = c("Model", 
"Month", "Country", "Price"), class = "data.frame", row.names = c(NA, 
-4L))
nghauran
  • 6,648
  • 2
  • 20
  • 29
  • In my real case I have over 30 variables which I want to average. Do you maybe also know a faster way to perform this action than using the command `Var = mean(Var)` for every variable? – Activation Nov 15 '18 at 09:55
  • You can try something like `cols <- c("Price", "other cols", "..."); setDT(test_data)[, lapply(.SD, mean), .SDcols = cols, by = list(Model, Month)][] ` – nghauran Nov 21 '18 at 15:39
0

Try:

test_data_total <- DT %>% 
  group_by(Model, Month) %>%
  summarise(Country = toString(unique(Country)),
            Price_avg = mean(Price))
Cleland
  • 349
  • 1
  • 6