-1

I would like to aggregate my costs by year and by ID.

Here is some example data:

   ID <- c(1,1,1,1,2,2,3,3)
   year <- c(1,2,2,2,3,3,3,3)
   cost <- c(1,1,2,3,2,2,2,2)

   data = cbind(ID, year, cost)

This information should be saved in additional column so costs_year1, costs_year2, costs_year3 by ID. Then, I would drop the other columns and remove the duplicate IDs so I have a wide dataframe.

Any suggestions to do this neatly?

recnac
  • 3,744
  • 6
  • 24
  • 46
Student
  • 73
  • 8

2 Answers2

2

Using tidyverse:

library(tidyverse)
ID <- c(1,1,1,1,2,2,3,3)
year <- c(1,2,2,2,3,3,3,3)
cost <- c(1,1,2,3,2,2,2,2)

data = data.frame(ID, year, cost)

data %>%
  mutate(year = paste0("costs_year",year)) %>%
  group_by(year,ID) %>%
  summarize_at("cost",sum) %>%
  spread(year,cost)

# # A tibble: 3 x 4
#      ID costs_year1 costs_year2 costs_year3
# * <dbl>       <dbl>       <dbl>       <dbl>
# 1     1           1           6          NA
# 2     2          NA          NA           4
# 3     3          NA          NA           4

%>% is called the pipe operator, it's from package magrittr and you can use it (for example) after attaching tidyverse with library(tidyverse).

Using pipes you can use the output of the preceding instruction as a first parameter of your next call, but examples will teach you better. Here's how to make it work without pipes:

x <- mutate(data, year = paste0("costs_year",year))
x <- group_by(x,year,ID)
x <- summarize_at(x,"cost",sum)
spread(x,year,cost)

For more info : What does %>% mean in R

moodymudskipper
  • 46,417
  • 11
  • 121
  • 167
0

Using dcast() of reshape2 package.

library(reshape2)
df.wide <- dcast(df1, ID ~ year, sum)
names(df.wide) <- c("ID", paste0("costs.year.", 1:3))

Or in one step:

df.wide <- setNames(dcast(df1, ID ~ year, sum), c("ID", paste0("costs.year.", 1:3)))

Yielding

> df.wide
  ID costs.year.1 costs.year.2 costs.year.3
1  1            1            6            0
2  2            0            0            4
3  3            0            0            4

Data

df1 <- structure(list(ID = c(1, 1, 1, 1, 2, 2, 3, 3), year = c(1, 2, 
                                                                2, 2, 3, 3, 3, 3), cost = c(1, 1, 2, 3, 2, 2, 2, 2)), class = "data.frame", row.names = c(NA, 
                                                                                                                                                          -8L))
jay.sf
  • 60,139
  • 8
  • 53
  • 110