2

I have a dataframe including around 300 columns. These columns are grouped. Here is the example:

id  name fixed_2020 fixed_2019 fixed_2018 fixed_2017 fixed_2016 current_2020  current_2019 current_2018 current_2017 current_2016
 1     A       2300       2100       2600       2600       1900         3000          3100         3200         3300         3400

I have around 20000 rows in total in the dataframe. And I would like to reshape this dataframe from wide to long in R. I tried to use function melt:

fixed <- melt(mydata, id.vars = c('id', 'name'), measure.vars = 3:7, variable.name = 'year', value.name = 'fixed')

Then I use gsub to get the year column

fixed$year <- as.character(gsub("[^0-9-]", "", debtors$year))

Here is what I want:

id   name  year  fixed  current
 1   A     2020  2030    3000
 2   A     2019  2100    3100
 3   A     2018  2600    3200
 4   A     2017  2600    3300
 5   A     2016  1900    3400

While it does give me what I want, the process is time-consuming when I have more than 300 columns. Since my dataframe is grouped according to the variables and years (10 years for each variable), I wonder whether there are other functions to do it more efficiently.

Thanks a lot in advance!

Darren Tsai
  • 32,117
  • 5
  • 21
  • 51
cara
  • 79
  • 4
  • 1
    Package [`data.table`](https://CRAN.R-project.org/package=data.table) is meant for performance. See the first answer to [this SO post](https://stackoverflow.com/questions/2185252/reshaping-data-frame-from-wide-to-long-format) and also [this other answer](https://stackoverflow.com/a/57939670/8245406) with a [`tidyr::pivot_longer`](https://tidyr.tidyverse.org/reference/pivot_longer.html) solution. – Rui Barradas May 06 '22 at 07:32
  • 1
    `mydata |> pivot_longer(3:12, names_to = "temp") |> separate(temp,into = c("statut","year"), sep = "_") |> pivot_wider(names_from = "statut", values_from = "value")` using `dplyr` and `tidyr` – MonJeanJean May 06 '22 at 07:44

3 Answers3

4

You can use pivot_longer() from tidyr:

library(dplyr)
library(tidyr)

df %>%
  pivot_longer(contains("_"), names_to = c(".value", "year"), names_sep = "_") %>%
  group_by(name) %>%
  mutate(id = 1:n()) %>%
  ungroup()

# # A tibble: 5 x 5
#      id name  year  fixed current
#   <int> <chr> <chr> <int>   <int>
# 1     1 A     2020   2300    3000
# 2     2 A     2019   2100    3100
# 3     3 A     2018   2600    3200
# 4     4 A     2017   2600    3300
# 5     5 A     2016   1900    3400

Data
df <- structure(list(id = 1L, name = "A", fixed_2020 = 2300L, fixed_2019 = 2100L, 
    fixed_2018 = 2600L, fixed_2017 = 2600L, fixed_2016 = 1900L,
    current_2020 = 3000L, current_2019 = 3100L, current_2018 = 3200L,
    current_2017 = 3300L, current_2016 = 3400L), class = "data.frame", row.names = c(NA, -1L))
Darren Tsai
  • 32,117
  • 5
  • 21
  • 51
2

Using data.table:

library(data.table)
setDT(mydata)
result <- melt(mydata, id=c('id', 'name'), 
                 measure.vars = patterns(fixed='fixed_', current='current_'), 
                 variable.name = 'year')
years <- as.numeric(gsub('.+_(\\d+)', '\\1', grep('fixed', names(mydata), value = TRUE)))
result[, year:=years[year]]
result[, id:=seq(.N), by=.(name)]
result
##    id name year fixed current
## 1:  1    A 2020  2300    3000
## 2:  2    A 2019  2100    3100
## 3:  3    A 2018  2600    3200
## 4:  4    A 2017  2600    3300
## 5:  5    A 2016  1900    3400

This should be very fast but your data-set is not very big tbh.

Note that this assumes the fixed and current columns are in the same order and associated with the same year(s). So if there is a fixed_2020 as the first fixed_* column, there is also a current_2020 as the first current_* column, and so on. Otherwise, the year column will correctly associate with fixed but not current

jlhoward
  • 58,004
  • 7
  • 97
  • 140
1

A base R option using reshape

transform(
    reshape(
        df,
        direction = "long",
        idvar = c("id", "name"),
        sep = "_",
        varying = -c(1:2)
    ),
    id = seq_along(id)
)

gives

         id name time fixed current
1.A.2020  1    A 2020  2300    3000
1.A.2019  2    A 2019  2100    3100
1.A.2018  3    A 2018  2600    3200
1.A.2017  4    A 2017  2600    3300
1.A.2016  5    A 2016  1900    3400
ThomasIsCoding
  • 96,636
  • 9
  • 24
  • 81