
This just popped into my head.

Take this example from a recent question:

data:

df1 <- structure(list(
  Year = c(2015L, 2015L, 2015L, 2015L, 2016L, 2016L, 2016L, 2016L),
  Category = c("a", "1", "2", "3", "1", "2", "3", "1"),
  Value = c(2L, 3L, 2L, 1L, 7L, 2L, 1L, 1L)
), row.names = c(NA, -8L), class = "data.frame")

code:

aggregate( Value ~ Year + c(MY_NAME = c("OneTwo", "three")[Category %in% 1:2 + 1]), data=df1, FUN=sum )

current output (note the long, ugly name of the new variable):

#  Year c(MY_NAME = c("OneTwo", "three")[Category %in% 1:2 + 1]) Value
#1 2015                                                   OneTwo     3
#2 2016                                                   OneTwo     1
#3 2015                                                    three     5
#4 2016                                                    three    10

desired output:

#  Year MY_NAME Value
#1 2015  OneTwo     3
#2 2016  OneTwo     1
#3 2015   three     5
#4 2016   three    10

please note:

  • One could (possibly should) declare a new variable.
  • This question is about how to set the name of the new variable DIRECTLY, by adding code to the one-liner in the code: section above.
Andre Elrico
  • Good question, I keep bumping into it. And I haven't solved it, [I defined a new variable](https://stackoverflow.com/questions/52991630/how-to-sum-values-in-rows-that-have-target-values-in-two-columns/52991793#52991793) – Rui Barradas Oct 25 '18 at 14:52

2 Answers


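Instead of c, we need cbind. cbind returns a one-column matrix whose column name is 'MY_NAME', and that column name is what ends up in the result. c, by contrast, returns a named vector in which every element gets its own name (MY_NAME1, MY_NAME2, ...), so there is no single name to use and the deparsed expression becomes the column name.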

aggregate( Value ~ Year +
   cbind(MY_NAME = c("OneTwo", "three")[Category %in% 1:2 + 1]), data=df1, FUN=sum )
#  Year MY_NAME Value
#1 2015  OneTwo     3
#2 2016  OneTwo     1
#3 2015   three     5
#4 2016   three    10
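
The difference between c and cbind described above can be checked directly at the console. This is a minimal sketch on a made-up two-element vector (lab_values is a hypothetical name, not part of the question's data):

```r
# Hypothetical toy vector, used only to inspect the names each call produces
lab_values <- c("OneTwo", "three")

v <- c(MY_NAME = lab_values)      # named vector: one name per element
m <- cbind(MY_NAME = lab_values)  # 2 x 1 character matrix with a column name

names(v)     # "MY_NAME1" "MY_NAME2"
colnames(m)  # "MY_NAME"
dim(m)       # 2 1
```

So only the cbind result carries a single name for the whole term.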

?aggregate mentions the use of cbind in the formula method:

formula - a formula, such as y ~ x or cbind(y1, y2) ~ x1 + x2, where the y variables are numeric data to be split into groups according to the grouping x variables (usually factors).


An option with tidyverse would be

library(dplyr)
df1 %>% 
      group_by(Year, MY_NAME = c("OneTwo", "three")[Category %in% 1:2 + 1]) %>%
      summarise(Value = sum(Value))
akrun

1) aggregate.data.frame. Use the aggregate.data.frame method rather than aggregate.formula. The grouping columns then take their names from the names of the by list:

by <- with(df1, 
  list(
    Year = Year, 
    MY_NAME = c("OneTwo", "three")[Category %in% 1:2 + 1]
  )
)
aggregate(df1["Value"], by, FUN = sum)

giving:

  Year MY_NAME Value
1 2015  OneTwo     3
2 2016  OneTwo     1
3 2015   three     5
4 2016   three    10
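
Since the column names come from the names of the by list, the same call can probably also be written as a single expression, which is closer to what the question asks for. This is a sketch only: df1 is rebuilt with data.frame() so the block runs on its own, and res is just an illustrative name.

```r
# Same data as in the question, rebuilt so the block is self-contained
df1 <- data.frame(
  Year = c(2015L, 2015L, 2015L, 2015L, 2016L, 2016L, 2016L, 2016L),
  Category = c("a", "1", "2", "3", "1", "2", "3", "1"),
  Value = c(2L, 3L, 2L, 1L, 7L, 2L, 1L, 1L),
  stringsAsFactors = FALSE
)

# The names given inside list() become the grouping column names
res <- aggregate(
  df1["Value"],
  by = with(df1, list(Year = Year,
                      MY_NAME = c("OneTwo", "three")[Category %in% 1:2 + 1])),
  FUN = sum
)

names(res)  # "Year" "MY_NAME" "Value"
```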

2) Two steps. It might be a bit cleaner to split this into two parts: (a) create a new data frame in which Category is transformed, and (b) perform the aggregation.

df2 <- transform(df1, MY_NAME = c("OneTwo", "three")[Category %in% 1:2 + 1])
aggregate(Value ~ Year + MY_NAME, df2, sum)

2a) or expressing (2) in terms of a magrittr pipeline:

library(magrittr)

df1 %>%
  transform(MY_NAME = c("OneTwo", "three")[Category %in% 1:2 + 1]) %>%
  aggregate(Value ~ Year + MY_NAME, ., sum)
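
All of the variants above share the grouping expression c("OneTwo", "three")[Category %in% 1:2 + 1]. As a sketch of how it evaluates: %in% binds more tightly than +, so the logical result is computed first and adding 1 turns it into an index of 1 or 2. The four-element Category vector and the names in_set, idx and grp are illustrative only.

```r
# Illustrative subset of the question's Category values
Category <- c("a", "1", "2", "3")

in_set <- Category %in% 1:2          # FALSE TRUE TRUE FALSE
idx <- in_set + 1                    # 1 2 2 1 (FALSE -> 1, TRUE -> 2)
grp <- c("OneTwo", "three")[idx]     # index into the two labels

grp  # "OneTwo" "three" "three" "OneTwo"
```

Note that with the labels in this order, rows whose Category is 1 or 2 receive the label "three", which is what the outputs shown in the question reflect.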
G. Grothendieck