I have data in the following format (excerpt):
# install.packages("data.table")
# install.packages("tidyverse")
library("data.table")
library("tidyverse")
dt <- data.table(
date = lubridate::as_date(c("2021-01-01", "2021-01-02", "2021-01-03", "2021-01-04")),
location = c("Westpark", "Northpark", "Estpark", "Southpark"),
'receiver_a: max' = c(20, 30, 25, 15),
'receiver_a: min' = c(10, 15, 20, 5),
'receiver_b: max' = c(15, 45, 10, 50),
'receiver_b: min' = c(15, 45, 10, 50)
)
> dt
date location receiver_a: max receiver_a: min receiver_b: max receiver_b: min
1: 2021-01-01 Westpark 20 10 15 15
2: 2021-01-02 Northpark 30 15 45 45
3: 2021-01-03 Estpark 25 20 10 10
4: 2021-01-04 Southpark 15 5 50 50
I now want to split all columns starting with receiver_...
in the column name and convert them to a long format:
- Split all columns with
receiver_...
in the column name into new columnsreceiver
,max
andmin
. Everything after the:
in the "old" column names starting withreceiver...
is used for the new column names, heremax
andmin
. - The new column
receiver
contains e.g.receiver_a
,receiver_b
etc as values. The new columnsmax
andmin
contain the respective numerical values.
This can be implemented with tidyr::pivot_longer()
:
# dt <- dt %>%
dt %>%
tidyr::pivot_longer(
cols = dplyr::contains(":"),
names_to = c("receiver", ".value"),
names_sep = ": ",
names_repair = "minimal"
)
# A tibble: 8 x 5
date location receiver max min
<date> <chr> <chr> <dbl> <dbl>
1 2021-01-01 Westpark receiver_a 20 10
2 2021-01-01 Westpark receiver_b 15 15
3 2021-01-02 Northpark receiver_a 30 15
4 2021-01-02 Northpark receiver_b 45 45
5 2021-01-03 Estpark receiver_a 25 20
6 2021-01-03 Estpark receiver_b 10 10
7 2021-01-04 Southpark receiver_a 15 5
8 2021-01-04 Southpark receiver_b 50 50
cols = dplyr::contains(":")
: select all columns with:
inside the name, for examplereceiver_a: max
names_to = c("receiver", ".value")
: Split selected columns intoreceiver
and.value
..value
indicates that component of the name defines the name of the column containing the cell valuesnames_sep = ": "
Ifnames_to
contains multiple values, these arguments control how the column name is broken up, here by:
(whitespace after colon)
My question: Can this also be done with a (faster) data.table
solution (e.g. with melt()
)?