2

I am trying to convert some data from daily level to a weekly sum in order to build some time series models etc. I am able to extract the week number from the date and groupby to get a weekly sum. However, when I try to convert back to a datetime object in my aggregate dataframe (so I can build a line chart that aligns with some other data I have for the same timeframe) R gives the correct year number and today's date?

Here's some code:

#Example data. The actual data is daily for 5 years
dates = c("2011-01-29", "2011-01-30", "2011-01-31", "2011-02-01", "2011-02-02", "2011-02-03", "2011-02-04", "2011-02-05", "2011-02-06", "2011-02-07")
sales = c(53, 46, 42, 45, 28, 48, 32, 53, 52, 26)

# Example data frame
df = tibble(date = dates, sales = sales)
#format date, appears to work just fine
df$date = as.Date(df$date)
print(class(df$date))
print(df$date)

# Ok so now the tricky bit. First, create a string column that has week number 
#and year to perform the groupby on
df$week = strftime(df$date, format="%U-%Y")
print(df$week)
# This appears to work fine. Week numbers are correct, 
#I don't really care about Sunday/Monday stuff, so long as its consistent

#Groupby the new column
df_weekly = df %>% 
  group_by(week) %>% 
  summarise(sales = sum(sales))

#Now to convert the week back to a date format so R will recognize it when I plug it into ggplot. 
#neither of these approaches appear to work. Both get the correct year, but give today's date? 
# not only am I confused why the exact formatting used earlier doesn't work, but I have not asked for today's date anywhere I don't think...
df_weekly$week2 = strptime(df_weekly$week, format = "%U-%Y")
df_weekly$week3 = as.Date(df_weekly$week, format = "%U-%Y")

print(df_weekly$week2)
print(df_weekly$week3)

Any explanation of what R is doing here and why I'm being foolish would be most appreciated!

Doug Ziman
  • 21
  • 1
  • 1
    You can't directly convert a week to a date because a week contains multiple dates. I'm surprised you're getting these confusing results instead of an error. For a solution, I'd suggest this duplicate: [Transform year/week to Date object](https://stackoverflow.com/q/45549449/903061). – Gregor Thomas Oct 02 '22 at 16:48
  • (...which would propose a solution such as `as.Date(paste0(df_weekly$week, "-1"), format = "%U-%Y-%u")`, adding a day-of-week number to the input.) – Gregor Thomas Oct 02 '22 at 16:52
  • Brilliant. Makes perfect sense now that you say it that way. I wish there was an error instead, because honestly I probably could have found that for myself if there wasn't a functional result that made me think I had the format wrong... Thanks – Doug Ziman Oct 02 '22 at 17:04

0 Answers0