I want to sum a value by week. Sometimes the first or last week will have fewer than 7 days. In the example below the data starts on 2016-01-01, but the floor date for that week is 2015-12-27, so the weekly sum is based on two days instead of seven. I understand that this behaviour is completely logical, but I would like the first and last week (which might contain fewer than 7 days of data) not to show up as low values in the plot. How can I do this? Should I omit the first and last week? Should I use an average value here, and if so, how?

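library(dplyr)      # tibble(), mutate(), group_by(), summarize(), %>%
library(lubridate)  # floor_date()
library(ggplot2)

# data_frame() is deprecated; tibble() is its replacement
expenses <- tibble(
  date = seq(as.Date("2016-01-01"), as.Date("2016-12-31"), by = "day"),
  amount = rgamma(length(date), shape = 2, scale = 20))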

plot_df <-  expenses %>% 
  mutate(Week = floor_date(date, "week")) %>%  
  group_by(Week) %>% 
  summarize(exp_sum = sum(amount))

ggplot(data = plot_df, 
       aes(x = as.Date(Week), y = exp_sum)) + 
  geom_line() +
  geom_point() + 
  scale_x_date(date_breaks = "1 week", date_labels = "%W")

[Figure: plot example]

Niels
  • You can get the week number as in https://stackoverflow.com/questions/22439540/how-to-get-week-numbers-from-dates and then aggregate. – abhiieor Dec 20 '17 at 10:13
  • Wouldn't using the week number make the problem worse? Since different years are in the data.frame, this produces `"53" "01" "02" "03"..."48" "49" "50"`. – Tito Sanz Dec 20 '17 at 11:03

1 Answer

Since these periods do not cover the same number of days as the others, my first recommendation is to drop them: keep the summarized data frame minus its first and last rows. This takes a single line.

plot_df <- plot_df[-c(1,nrow(plot_df)),]
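
Dropping rows by position assumes that exactly the first and last weeks are incomplete. In the question's data, with lubridate's default Sunday week start, the last week (2016-12-25 to 2016-12-31) is in fact complete, so the line above would discard a full week. A minimal sketch of an alternative that counts the days in each week and keeps only the complete ones, assuming the `expenses` data from the question (`n_days` and `full_weeks` are names introduced here for illustration):

```r
library(dplyr)
library(lubridate)

# Example data as in the question
expenses <- tibble(
  date = seq(as.Date("2016-01-01"), as.Date("2016-12-31"), by = "day"),
  amount = rgamma(length(date), shape = 2, scale = 20)
)

full_weeks <- expenses %>%
  mutate(Week = floor_date(date, "week")) %>%
  group_by(Week) %>%
  summarize(exp_sum = sum(amount), n_days = n()) %>%
  filter(n_days == 7)  # keep only weeks with all seven days present
```

`full_weeks` can then be passed to `ggplot()` in place of `plot_df`.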

The second option is to replace them with the mean of all the weekly sums. If you do this, you should say so when you present the results.

plot_df[c(1,nrow(plot_df)),"exp_sum"] <- mean(plot_df$exp_sum)
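
Another way to use an average, offered here as an assumption rather than as part of the answer above, is to scale each week's daily mean to seven days so that partial weeks are put on the same footing as full ones. The sketch below assumes the `expenses` data from the question; `plot_df_scaled`, `n_days` and `exp_week_equiv` are illustrative names:

```r
library(dplyr)
library(lubridate)

# Example data as in the question
expenses <- tibble(
  date = seq(as.Date("2016-01-01"), as.Date("2016-12-31"), by = "day"),
  amount = rgamma(length(date), shape = 2, scale = 20)
)

plot_df_scaled <- expenses %>%
  mutate(Week = floor_date(date, "week")) %>%
  group_by(Week) %>%
  summarize(
    n_days = n(),                       # days of data in this week
    exp_sum = sum(amount),              # raw weekly sum
    exp_week_equiv = mean(amount) * 7   # daily mean scaled to a 7-day week
  )
```

For complete weeks `exp_week_equiv` equals `exp_sum`. For partial weeks it is an estimate based on only a few days, so it is noisier than the other points.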

A third option is to give each of them the value of its neighbouring week, that is, the week after the first one and the week before the last one:

plot_df[1,"exp_sum"] <- plot_df[2, "exp_sum"]
plot_df[nrow(plot_df), "exp_sum"] <- plot_df[nrow(plot_df)-1, "exp_sum"]

As I said, I would simply drop them.

Tito Sanz