Python guy new to R, so forgive the naive question.

I have an R dataframe named metrics with four columns:

I want to pass the level of aggregation (day or week) as a variable to dcast for aggregation.

agg_level <- c("week")

If I hard-code week in the function call, it aggregates the data for each week correctly:

  • met <- dcast(metrics, week ~ city, value.var = count, fun.aggregate = sum)
  • Output:

week        NYC  CHI  SF
2015-10-18    1    2   3
2015-10-25    4    5   6

If I replace week with the variable, it fails: instead of grouping by week, it collapses all weeks into a single row.

  • met <- dcast(metrics, agg_level ~ city, value.var = count, fun.aggregate = sum)

  • Output:

agg_level  NYC  CHI  SF
week         5    7   9

From what I've read, metrics[[agg_level]] extracts the column named by the variable, but using [[agg_level]] inside the formula fails:

  • met <- dcast(m, [[agg_level]] ~ city, value.var = metric, fun.aggregate = sum)

  • Error in (function ... unexpected '[['

What is the correct way to do this?

lmart999

1 Answer

The formula argument of dcast expects the names in it to be column names of the data frame you pass in. It does not look up agg_level as a variable and substitute its value; it treats agg_level as a literal name. As such, you have two options:

# Option 1
# Do some text operations to make the formula based on variables.
if(this==that) {agg_level <- 'week'} else {agg_level <- 'day'}
myFormula <- sprintf("%s ~ city", agg_level)
met <- dcast(metrics, as.formula(myFormula), sum, value.var = metric)

# Option 2 - Untested
# Take advantage of dcast's alternative to the formula notation and pass a list instead.
# No idea if this will work.
met <- dcast(metrics, list(.(agg_level),.(city)), sum, value.var=metric)
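Option 1 can be sketched end to end as below. This is a minimal example under assumptions: the `metrics` data frame here is a made-up stand-in (the question does not show the real columns), the value column is assumed to be called `count`, and `dcast` is assumed to come from `reshape2`. Note that `value.var` is given as a quoted column name.

```r
library(reshape2)

# Hypothetical stand-in for the asker's data; the real columns are not shown.
metrics <- data.frame(
  day   = c("2015-10-18", "2015-10-19", "2015-10-25", "2015-10-26"),
  week  = c("2015-10-18", "2015-10-18", "2015-10-25", "2015-10-25"),
  city  = c("NYC", "NYC", "NYC", "CHI"),
  count = c(1, 2, 3, 4),
  stringsAsFactors = FALSE
)

# The aggregation level arrives as a string, e.g. "day" or "week".
agg_level <- "week"

# Build the formula from the string so dcast sees `week ~ city`.
f <- as.formula(paste(agg_level, "~ city"))

# Base R alternative that produces the same formula without string pasting.
f_alt <- reformulate("city", response = agg_level)

# One row per value of the chosen column, one column per city.
met <- dcast(metrics, f, value.var = "count", fun.aggregate = sum)
print(met)
```

Switching `agg_level` to `"day"` should regroup by day without touching the `dcast` call.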
doicomehereoften1
  • #2 error: `Error in inherits(x, "formula") : could not find function "."` – lmart999 Nov 25 '15 at 21:02
  • #1 resolution: `met <- dcast(metrics, as.formula(sprintf("%s ~ city", agg_level)), sum, value.var = metric)` – lmart999 Nov 25 '15 at 21:03
  • Figures... I'm not going to bother tracing that down. You should also look into dplyr, it's a lot faster than plyr and is intended to operate via pipelines, which might be a bit more familiar for you, coming from Python. – doicomehereoften1 Nov 25 '15 at 21:05
  • Also, I just saw that you combined dcast, as.formula, and sprintf into a one-liner. Yes, it's more compact. However, there are cases in R where doing multiple operations in a single line is actually slower than doing them all separately. Weird, I know, but that's R for you. – doicomehereoften1 Nov 25 '15 at 21:07
  • Thanks. Ya, I've been looking at `dplyr`. Nice package. FWIW, `dcast` is from the `reshape` library; interested if `dplyr` has a function similar to `dcast` ... Will look into that. – lmart999 Nov 25 '15 at 21:08
  • Ah. Good to know about splitting operations into multiple lines :) – lmart999 Nov 25 '15 at 21:09
  • It doesn't exactly. It's a bit closer to working in something like SQL. An equivalent would use both the `dplyr` and `tidyr` packages: `met <- metrics %>% group_by(week, city) %>% summarize(metric=sum(metric)) %>% spread(city, metric)` – doicomehereoften1 Nov 25 '15 at 21:19
  • Nice. Just got `tidyr`. May change the logic, as you suggest, depending on performance. – lmart999 Nov 25 '15 at 21:28
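The `dplyr`/`tidyr` pipeline from the comments hard-codes `week`, so it has the same problem as the original question. A rough sketch of a variable-driven version follows. It assumes `dplyr` 1.0 or later and `tidyr` 1.0 or later, uses `pivot_wider` in place of the older `spread`, and reuses a made-up `metrics` data frame with an assumed `count` column.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for the asker's data; the real columns are not shown.
metrics <- data.frame(
  day   = c("2015-10-18", "2015-10-19", "2015-10-25", "2015-10-26"),
  week  = c("2015-10-18", "2015-10-18", "2015-10-25", "2015-10-25"),
  city  = c("NYC", "NYC", "NYC", "CHI"),
  count = c(1, 2, 3, 4),
  stringsAsFactors = FALSE
)

agg_level <- "week"

met <- metrics %>%
  # all_of() selects the grouping columns by the names held in the strings.
  group_by(across(all_of(c(agg_level, "city")))) %>%
  summarise(count = sum(count), .groups = "drop") %>%
  # Spread cities into columns; fill missing combinations with 0.
  pivot_wider(names_from = city, values_from = count, values_fill = 0)

print(met)
```

Here the grouping columns are selected by name at run time, so no formula has to be built.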