0

I have a data frame as below

cust start-dt    end-dt      item item_type sales cost trans-dat
A    07-01-2019  07-01-2020  AA   xxxxxxxxx 1500  1400 08-01-2019
A    07-01-2019  07-01-2020  AA   xxxxxxxxx 2000  1600 09-01-2019
A    07-01-2019  07-01-2020  AA   xxxxxxxxx 2500  1000 07-12-2019
A    07-01-2019  07-01-2020  AA   xxxxxxxxx 1600  1300 05-01-2020
A    07-01-2019  07-01-2020  AA   xxxxxxxxx 2400  1700 02-01-2020
A    07-01-2019  07-01-2020  AA   xxxxxxxxx 2200  1300 04-01-2020

I need to convert to year wise data as below. based on the customer and the item they bought across every year

cust start-dt    end-dt      item item_type  year sales cost
A    07-01-2019  07-01-2020  AA   xxxxxxxxx  2019 6000  4000
A    07-01-2019  07-01-2020  AA   xxxxxxxxx  2020 6200  4300

i tried melt and cast but melt doesnt create new columns/rows.

this data is only an example. I have multiple customers and multiple items for every customer and the start and end also differs for every cust.

Please guide me on how to approach the issue.

  • Is the `year` column the year of the `start-dt`, `end-dt`, or `trans-dat`? And you want to summarize the `sales` and `cost` columns? By summing them? – Gregor Thomas May 05 '20 at 12:22
  • The year column is the years between start-dt and end-dt. Consider it like a membership for the customer. but the sales and cost are summed based on the trans-dt which is basically the transaction date. The trans-dt will always be between the start-dt and the end-dt. – Sruthy Sivakumar May 05 '20 at 15:01

1 Answers1

1

Looks like the year in the result is from the trans-dt column. So you need to extract that year and do a grouped sum. Here's a dplyr method:

library(dplyr)
df %>% 
  mutate(year = as.integer(substr(`trans-dt`, 7, 10))) %>% # chars 7 to 10 as the year
  group_by(cust, `start-dt`, `end-dt`, item, item_type, year) %>%
  summarize(sales = sum(sales), cost = sum(cost))

See the FAQ on summing by group for additional info.

Gregor Thomas
  • 136,190
  • 20
  • 167
  • 294