3

While answering this question I tried to use the patterns functionality together with data.table::melt:

df1 <- structure(list(Material_code = 111:112, 
                      actual_202009 = c(30L, 19L), 
                      actual_202010 = c(44L, 70L), 
                      actual_202011 = c(24L, 93L), 
                      pred_202009 = c(25L, 23L), 
                      pred_202010 = c(52L, 68L), 
                      pred_202011 = c(27L, 100L)), 
                 class = c("data.table", "data.frame"), 
                 row.names = c(NA, -2L))

I wanted to convert that wide table to a long one, separating the actual and pred values. I thought the date part was going to be conserved in the variable, but it was changed to factor numbers (1, 2, etc):

melt(df1, 1, measure = patterns(actual = "actual_", pred = "pred_"))[1, ]
>    Material_code variable actual pred
> 1:           111        1     30   25

I wanted "202009" instead of "1" in the variable field.

I am fully aware I can achieve it the long way, using tstrsplit:

melt(df1, 1)[, 
           c("type", "variable") := tstrsplit(variable, "_", fixed = TRUE)][,
                 dcast(.SD, Material_code + date ~ type)]

but I was expecting that with patterns I could produce a less verbose way.

What I've tried:

  • I changed variable.factor = FALSE (default is true), but it produces the same values in the variable column (1, 2, ...), only this time they are strings.

Expected output:

I expect the following code to produce the shown output (of course replacing argument1 and argument2 with whatever is appropriate):

melt(df1, 1, measure = patterns(actual = "actual_", pred = "pred_"), argument1, argument2)[1, ]
>    Material_code variable actual pred
> 1:           111   202009     30   25

Is it possible, or definitely one has to go the long way?

PavoDive
  • 6,322
  • 2
  • 29
  • 55
  • 1
    https://github.com/Rdatatable/data.table/issues/4056#issuecomment-700410374 and https://github.com/Rdatatable/data.table/issues/2551 – r2evans Jan 28 '21 at 20:38
  • 1
    Looks like there's a branch [`melt-custom-variable`](https://github.com/Rdatatable/data.table/tree/melt-custom-variable/man) (mentioned here https://github.com/Rdatatable/data.table/issues/2551#issuecomment-704674133) that might resolve this, unsure of its status or timeline. – r2evans Jan 28 '21 at 20:44
  • Related: [Convert numeric representation of 'variable' column to original string following melt using patterns](https://stackoverflow.com/questions/41883573/convert-numeric-representation-of-variable-column-to-original-string-following) – Henrik Jan 30 '21 at 13:36

1 Answers1

7

With data.table 1.14.1 (dev version as of 2021-05-18) it is possible to solve it using the newly incorporated measure function. Like this:

melt(df1, measure.vars= measure(value.name, date, pattern="(actual|pred)_(.*)"))

   Material_code   date actual pred
1:           111 202009     30   25
2:           112 202009     19   23
3:           111 202010     44   52
4:           112 202010     70   68
5:           111 202011     24   27
6:           112 202011     93  100

Check ?measure as well as the release news for more information.

PavoDive
  • 6,322
  • 2
  • 29
  • 55
  • 2
    I like this answer and the solution. I am not a fan of the fact that almost two years later, it is still in dev (currently tagged as 1.14.9, several hotfix releases in the interim). I'm grateful to the `data.table` team for their efforts, I wonder when they will be comfortable with the new features (`measure()`, `env=`, `DT()`, and countless fixes) to release to CRAN. – r2evans Mar 08 '23 at 16:25