I have a program that uses reshape2's melt function to melt a 5-dimensional array with named and labelled dimensions to a long-form data frame, which by definition has only two dimensions. Each dimension of the input array corresponds to a column in the output data frame, and there is one more column that holds the values that were stored in the 5D array.

I understand reshape2 is deprecated and will soon break, so I am switching to tidyr. However, tidyr's pivot_longer function, which replaces melt, only accepts 2D data frames as input.

Is there a non-deprecated function, in tidyr or elsewhere, that will melt an array with 3 or more named and labelled dimensions to a long form data frame?

I could write my own function to do it easily enough. But I'd rather use an existing function if there is one.

Thank you

Here's an example of a 2x3x4 array:

df <- expand.grid(w = 1:2,
                  x = 1:3,
                  y = 1:4)
df$z <- runif(nrow(df))

tmp <- tapply(df$z, list(df$w, df$x, df$y), sum)
tmp
, , 1

           1          2         3
1 0.40276418 0.13111652 0.4473557
2 0.08945365 0.03139184 0.1556355

, , 2

          1          2         3
1 0.1413763 0.02106974 0.1103559
2 0.7302435 0.46302772 0.7924580

, , 3

          1         2         3
1 0.2793435 0.4244807 0.7955351
2 0.9828739 0.7740189 0.6436733

, , 4

          1          2         3
1 0.9852345 0.20508490 0.8744829
2 0.2812744 0.06272449 0.0936831
Ben Toh
Andrew Kirk
  • Not sure under what circumstances you would need to squash a 5d array, wouldn't `as.data.frame(5d_array)` convert the array to a 2d data.frame and then you can use `tidyr` (which is the successor of `reshape2` I think) to manipulate it? – Ben Toh Aug 08 '20 at 03:24
  • Thank you Ben. I tested using as.data.frame and it accepts a 5D input but it hashes together the labels from the removed dims and uses them as column labels, rather than putting them in as extra columns. So a table with dim 2 x 3 x 4 is converted to a data frame with 2 rows and 12 (= 3 x 4) columns. What I want is a table with 24 rows and four columns - three values in a row identifying the cell of the (in this case) 3D table from which the value in the fourth column comes. BTW I need to convert it to a df because I want to use it for graphing in ggplot, which requires a data frame as input – Andrew Kirk Aug 08 '20 at 04:27
  • It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. – MrFlick Aug 08 '20 at 04:33
  • Have you tried `data.table`? `as.data.table(5d_array)`? – A5C1D2H2I1M1N2O1R2T1 Aug 08 '20 at 04:34
  • Also `reshape2` isn't deprecated -- it's retired. So it's not going to disappear, but it's not going to be further updated. If you still need array support that may be your best option. – MrFlick Aug 08 '20 at 04:34
  • Or maybe `as.data.frame(ftable(5d_array))`, though as @MrFlick suggested, a reproducible example would be helpful. – A5C1D2H2I1M1N2O1R2T1 Aug 08 '20 at 04:38
  • I can confirm that `data.table::as.data.table(3d_array)` as suggested by @A5C1D2H2I1M1N2O1R2T1 works (and added an example of 2x3x4 in the question). – Ben Toh Aug 08 '20 at 04:42

1 Answer

Sticking with base R, you can wrap your array in ftable before using as.data.frame:

set.seed(1)
a <- array(sample(100, 2*3*4, TRUE), dim = c(2, 3, 4))
b <- provideDimnames(a)
b
# , , A
# 
#    A  B  C
# A 27 58 21
# B 38 91 90
# 
# , , B
# 
#    A  B  C
# A 95 63 21
# B 67  7 18
# 
# , , C
# 
#    A  B   C
# A 69 77  72
# B 39 50 100
# 
# , , D
# 
#    A  B  C
# A 39 94 66
# B 78 22 13

as.data.frame(ftable(b))
#    Var1 Var2 Var3 Freq
# 1     A    A    A   27
# 2     B    A    A   38
# 3     A    B    A   58
# 4     B    B    A   91
# 5     A    C    A   21
# 6     B    C    A   90
# 7     A    A    B   95
# 8     B    A    B   67
# 9     A    B    B   63
# 10    B    B    B    7
# 11    A    C    B   21
# 12    B    C    B   18
# 13    A    A    C   69
# 14    B    A    C   39
# 15    A    B    C   77
# 16    B    B    C   50
# 17    A    C    C   72
# 18    B    C    C  100
# 19    A    A    D   39
# 20    B    A    D   78
# 21    A    B    D   94
# 22    B    B    D   22
# 23    A    C    D   66
# 24    B    C    D   13
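
If your array has named dimensions, those names should carry through as column names. Here is a minimal sketch of the same base R idea, using `as.table()` in place of `ftable()` so that the value column can be renamed with `responseName`. The dimension names `w`, `x`, `y` and the value name `z` are borrowed from the question's example, and the seed is arbitrary:

```r
# Rebuild the question's 2x3x4 example, but name the grouping list
# so that the resulting array has named dimensions (w, x, y).
set.seed(42)  # arbitrary seed, only for reproducibility
df <- expand.grid(w = 1:2, x = 1:3, y = 1:4)
df$z <- runif(nrow(df))
arr <- tapply(df$z, list(w = df$w, x = df$x, y = df$y), sum)

# as.table() classes the array as a table, so as.data.frame()
# dispatches to the table method, which returns one row per cell.
# responseName replaces the default "Freq" column name.
long <- as.data.frame(as.table(arr), responseName = "z")

str(long)
```

The dimension columns come back as factors. If you need the original integers (for a continuous axis in ggplot, say), something like `as.integer(as.character(long$w))` should recover them.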

You can also use as.data.table from the "data.table" package. The following should work:

library(data.table)
as.data.table(b)
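
As a rough sketch of how that might look with named dimensions: the array method of `as.data.table` takes a `value.name` argument (this assumes a reasonably recent version of "data.table"). The array contents and dimension labels below are made up for illustration:

```r
library(data.table)

# Hypothetical 2x3x4 array with named and labelled dimensions.
arr <- array(1:24, dim = c(2, 3, 4),
             dimnames = list(w = c("a", "b"),
                             x = c("p", "q", "r"),
                             y = c("k", "l", "m", "n")))

# One row per cell: a column per dimension plus the value column.
long <- as.data.table(arr, value.name = "value")

dim(long)
names(long)
```

Note that the row order may differ from the base R result, so it is safer to match rows by the dimension columns than by position.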
A5C1D2H2I1M1N2O1R2T1