0

I have data in long format (one line has a specific date, ID and several variables - see code below) and I would like to build an array in R from it.

test_df <- data.frame("dates"=c(19801230,19801231,19801231,19810101), "ID"=c(101,101,102,102), "var1"=0:3, "var2"=5:8)

If I focus on a single variable only, I can always create a wide table, having a row for each date and a column for each ID reporting the relative value; but I would like to build automatically an array out of it, so to have all variables in one object where I can work with an ordered time dimension.

In the example of test_df, I would like to obtain two tables binded together (an array), where the first table has values of var1, the second one of var2 but both tables have dates 19801230 19801231 and 19810101 as row indeces and 101 and 102 column indeces, which allows them to be binded together in an array (with NAs in missing values).

I could run a lapply by the ID indeces or by the date indeces and marge the output lists into an array, but it seems complicated to make dimensions match (different IDs are present in different dates). Do you have suggestions?

The only other close question I have seen around is the other way around here, but it did not help me much.

oibaFox
  • 75
  • 6

1 Answers1

1
nm1 <- grep("var", names(test_df), value = TRUE)
test_df[nm1][test_df[nm1] == 0] <- -999
out <- simplify2array(lapply(nm1, \(x) xtabs(test_df[[x]]~ dates + ID, 
   data = test_df[c("dates", "ID")])))
out[out ==0] <- NA
out[out == -999] <- 0
out

-output

> out
, , 1

          ID
dates      101 102
  19801230   1  NA
  19801231   2   3
  19810101  NA   4

, , 2

          ID
dates      101 102
  19801230   5  NA
  19801231   6   7
  19810101  NA   8

Or with tidyverse

library(dplyr)
library(tibble)
library(tidyr)
library(purrr)
test_df %>%
  pivot_longer(cols = starts_with('var')) %>% 
  pivot_wider(names_from = ID, values_from = value) %>%
  {split(.[setdiff(names(.), "name")], .$name)} %>% 
  map(~ .x %>% 
    column_to_rownames('dates') %>% 
    as.matrix) %>% 
  simplify2array

-output

, , var1

         101 102
19801230   1  NA
19801231   2   3
19810101  NA   4

, , var2

         101 102
19801230   5  NA
19801231   6   7
19810101  NA   8
akrun
  • 874,273
  • 37
  • 540
  • 662
  • beautiful, thanks. made my question look pretty silly :D – oibaFox Nov 04 '22 at 15:25
  • is there a way to get NAs instead of zero, which might be a problem? with `sparse = T` inside `xtabs` I get close, but not quite there. – oibaFox Nov 04 '22 at 15:51
  • I am sorry, but `out[out ==0] <- NA` can only work if I have not 0s, can't it? I will modify my question to take this into account – oibaFox Nov 04 '22 at 15:57
  • This is alright since I can always find a value that is not among the original values to substitute the 0s at the beginning. However, having dozens of milions of lines, it can get clumsy: do you have a straighter/alternative way to do that, taking even this into consideration? Much appreciated! – oibaFox Nov 04 '22 at 16:05
  • what if values are characters? shold I open a new question for this or should I modify this one? – oibaFox Nov 05 '22 at 12:38