1

I have a daily time series containing weekdays, excluding weekends. I would like to reorder it so that each column represents a week and the subsequent five rows show the data for Mon, Tue, Wed, Thur and Fri of that week.

I tried using cast (package: reshape) but am having problems in obtaining the above.

Thank you for your help.

example:

Date        Day Value
06/01/2020  mon 15
07/01/2020  tue 16
08/01/2020  wed 17
09/01/2020  thu 18
10/01/2020  fri 19
13/01/2020  mon 20
14/01/2020  tue 21
15/01/2020  wed 22
16/01/2020  thu 23
17/01/2020  fri 24

to be reformated to:

Start of week   mon tue wed thu fri
06/01/2020      15  16  17  18  19
13/01/2020      20  21  22  23  24
Jaap
  • 81,064
  • 34
  • 182
  • 193
adam.888
  • 7,686
  • 17
  • 70
  • 105

2 Answers2

2

Here is an example using the tidyverse package. I also used lubridate to convert the Date from character to date class.

The key is to use pivot_wider from the tidyr package to convert the data to the wide format.

library(tidyverse)
library(lubridate)

dat2 <- dat %>%
  # Convert W to factor for ordering
  mutate(Day = factor(Day, levels = c("mon", "tue", "wed", "thu", "fri"))) %>%
  # Create a goruping variable to show the week number
  group_by(Day) %>%
  mutate(Group = 1:n()) %>%
  ungroup() %>%
  # Change the Date based on Group
  group_by(Group) %>%
  mutate(Date = min(dmy(Date))) %>%
  # Convert to wide format
  pivot_wider(names_from = Day, values_from = Value) %>%
  # Remove Group
  ungroup() %>%
  select(-Group)

dat2
# # A tibble: 2 x 6
#   Date         mon   tue   wed   thu   fri
#   <date>     <int> <int> <int> <int> <int>
# 1 2020-01-06    15    16    17    18    19
# 2 2020-01-13    20    21    22    23    24

DATA

# Create example data frame
dat <- read.table(text = "Date Day Value

'06/01/2020' mon 15

'07/01/2020' tue 16

'08/01/2020' wed 17

'09/01/2020' thu 18

'10/01/2020' fri 19

'13/01/2020' mon 20

'14/01/2020' tue 21

'15/01/2020' wed 22

'16/01/2020' thu 23

'17/01/2020' fri 24",
                  header = TRUE, stringsAsFactors = FALSE)
www
  • 38,575
  • 12
  • 48
  • 84
2

Another option with the -package:

library(data.table)

# convert to a 'data.table'
# set the 'Date' and 'Day' columns in the right format
setDT(mydf)[, `:=` (Date = as.Date(Date, format = "%d/%m/%Y"),
                    Day = factor(Day, levels = c("mon","tue","wed","thu","fri")))]

# create a 'start_of_week' column
# transform from long to wide format
res <- mydf[, start_of_week := Date[1], by = cumsum(Day == "mon")
            ][, dcast(.SD, start_of_week ~ Day, value.var = "Value")]

which gives:

> res
   start_of_week mon tue wed thu fri
1:    06/01/2020  15  16  17  18  19
2:    13/01/2020  20  21  22  23  24

Used data:

mydf <- read.table(text="Date        Day Value
06/01/2020  mon 15
07/01/2020  tue 16
08/01/2020  wed 17
09/01/2020  thu 18
10/01/2020  fri 19
13/01/2020  mon 20
14/01/2020  tue 21
15/01/2020  wed 22
16/01/2020  thu 23
17/01/2020  fri 24", header=TRUE, stringsAsFactors=FALSE)
Jaap
  • 81,064
  • 34
  • 182
  • 193
  • Thank you for your solution. I have data that has missing values and I get an error message: Aggregate function missing, defaulting to 'length'. Is there any way to deal with missing values? – adam.888 Feb 16 '20 at 17:59
  • 1
    @adam.888 That's a result of having more than one observation (=row) per day (and is not caused by missing data); [see here for an explanation](https://stackoverflow.com/a/33051521/2204410). You could use `fun.aggregate = sum` inside `dcast` to sum the values of a day. – Jaap Feb 16 '20 at 18:28