0

I have 15 data tables for the years 2005 - 2020 like this:

DT_2005 = data.table(
  ID = c("1","2","3","4","5","6"),
  year = c("2005,"2005","2005","2005","2005","2005")
  score = c("98","89","101","78","97","86")
)

# Data tables for every year...

DT_2020 = data.table(
  ID = c("1","2","4","6","7","8"),
  year = c("2020,"2020","2020","2020","2020","2020")
  score = c("89","79","110","98","74","88")
)

# DT_2020 output
ID, year, score
1, 2020, 89
2, 2020, 79
4, 2020, 110
6, 2020, 98
7, 2020, 74
8, 2020, 88

i.e. there are ID's that don't appear in some years.

I want to combine the tables into a "long" format like this:

ID, year, score
1, 2005, 98
1, 2006, 95
1, 2007, 97
...
1, 2019, 90
1, 2020, 89
2, 2005, 79
2, 2006, 81
...
2, 2019, 83
2, 2020, 79

Is there a way to do this in data.table so that each row is an ID with the years in ascending order and there are no NA rows for ID's that weren't in a certain year?

codemachino
  • 103
  • 9

1 Answers1

0

You can combine all the dataframes in one combined dataframes from the global environment and order the results.

library(data.table)
dt <- rbindlist(mget(paste0('DT_', c(2005:2020))))
dt <- dt[order(ID)]

The equivalent dplyr and base R alternatives are -

#dplyr
library(dplyr)
res <- bind_rows(mget(paste0('DT_', c(2005:2020)))) %>% arrange(ID)


#Base R
res <- do.call(rbind, mget(paste0('DT_', c(2005:2020))))
res <- res[order(res$ID), ]
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213