Aggregate based on year and ID in R

Question

I would like to aggregate my costs by year and by ID.

Here is some example data:

   ID <- c(1,1,1,1,2,2,3,3)
   year <- c(1,2,2,2,3,3,3,3)
   cost <- c(1,1,2,3,2,2,2,2)

   data = cbind(ID, year, cost)

This information should be saved in additional column so costs_year1, costs_year2, costs_year3 by ID. Then, I would drop the other columns and remove the duplicate IDs so I have a wide dataframe.

Any suggestions to do this neatly?

moodymudskipper · Answer 1 · 2018-07-30T10:03:28.690

2

Using tidyverse:

library(tidyverse)
ID <- c(1,1,1,1,2,2,3,3)
year <- c(1,2,2,2,3,3,3,3)
cost <- c(1,1,2,3,2,2,2,2)

data = data.frame(ID, year, cost)

data %>%
  mutate(year = paste0("costs_year",year)) %>%
  group_by(year,ID) %>%
  summarize_at("cost",sum) %>%
  spread(year,cost)

# # A tibble: 3 x 4
#      ID costs_year1 costs_year2 costs_year3
# * <dbl>       <dbl>       <dbl>       <dbl>
# 1     1           1           6          NA
# 2     2          NA          NA           4
# 3     3          NA          NA           4

%>% is called the pipe operator, it's from package magrittr and you can use it (for example) after attaching tidyverse with library(tidyverse).

Using pipes you can use the output of the preceding instruction as a first parameter of your next call, but examples will teach you better. Here's how to make it work without pipes:

x <- mutate(data, year = paste0("costs_year",year))
x <- group_by(x,year,ID)
x <- summarize_at(x,"cost",sum)
spread(x,year,cost)

For more info : What does %>% mean in R

edited Jul 30 '18 at 10:03

answered Jul 30 '18 at 09:37

moodymudskipper

46,417
11
121
167

I think he wants to have sum rather than mean, which could be obtained by replacing `sum` with `mean` as you would also tell. Great `tidy` answer! – OzanStats Jul 30 '18 at 09:49
1

Je vous en prie – OzanStats Jul 30 '18 at 09:53
What does this %>% mean? And yes, I meant sum not mean – Student Jul 30 '18 at 09:54
I edited my answer, let me know if something is still unclear – moodymudskipper Jul 30 '18 at 10:03

score 0 · Answer 2 · answered Jul 30 '18 at 10:19

Using dcast() of reshape2 package.

library(reshape2)
df.wide <- dcast(df1, ID ~ year, sum)
names(df.wide) <- c("ID", paste0("costs.year.", 1:3))

Or in one step:

df.wide <- setNames(dcast(df1, ID ~ year, sum), c("ID", paste0("costs.year.", 1:3)))

Yielding

> df.wide
  ID costs.year.1 costs.year.2 costs.year.3
1  1            1            6            0
2  2            0            0            4
3  3            0            0            4

Data

df1 <- structure(list(ID = c(1, 1, 1, 1, 2, 2, 3, 3), year = c(1, 2, 
                                                                2, 2, 3, 3, 3, 3), cost = c(1, 1, 2, 3, 2, 2, 2, 2)), class = "data.frame", row.names = c(NA, 
                                                                                                                                                          -8L))

Aggregate based on year and ID in R

2 Answers2