I am trying to convert some data from daily level to a weekly sum in order to build some time series models etc. I am able to extract the week number from the date and groupby to get a weekly sum. However, when I try to convert back to a datetime object in my aggregate dataframe (so I can build a line chart that aligns with some other data I have for the same timeframe) R gives the correct year number and today's date?
Here's some code:
#Example data. The actual data is daily for 5 years
dates = c("2011-01-29", "2011-01-30", "2011-01-31", "2011-02-01", "2011-02-02", "2011-02-03", "2011-02-04", "2011-02-05", "2011-02-06", "2011-02-07")
sales = c(53, 46, 42, 45, 28, 48, 32, 53, 52, 26)
# Example data frame
df = tibble(date = dates, sales = sales)
#format date, appears to work just fine
df$date = as.Date(df$date)
print(class(df$date))
print(df$date)
# Ok so now the tricky bit. First, create a string column that has week number
#and year to perform the groupby on
df$week = strftime(df$date, format="%U-%Y")
print(df$week)
# This appears to work fine. Week numbers are correct,
#I don't really care about Sunday/Monday stuff, so long as its consistent
#Groupby the new column
df_weekly = df %>%
group_by(week) %>%
summarise(sales = sum(sales))
#Now to convert the week back to a date format so R will recognize it when I plug it into ggplot.
#neither of these approaches appear to work. Both get the correct year, but give today's date?
# not only am I confused why the exact formatting used earlier doesn't work, but I have not asked for today's date anywhere I don't think...
df_weekly$week2 = strptime(df_weekly$week, format = "%U-%Y")
df_weekly$week3 = as.Date(df_weekly$week, format = "%U-%Y")
print(df_weekly$week2)
print(df_weekly$week3)
Any explanation of what R is doing here and why I'm being foolish would be most appreciated!