
Is this the most straightforward way to convert an array into a data.table?

require(data.table)
require(ggplot2)

# this returns a data.table with both array's dimensions and values
aaa <- array(rnorm(3*4*2), dim = c(3,4,2))
DT1 <- as.data.table(as.data.frame.table(aaa))

# the following does not work properly, because it only returns the array values
DT2 <- as.data.table(aaa)


# plot values aggregated by 3rd array dim
ggplot(DT1, aes(Var1, Freq, fill = Var3)) + geom_boxplot()
# sum values by 2nd array dim
DT1[ , sum(Freq), Var2]

EDIT1: Sorry, by "properly" I mean that I get a data frame with only one column, so I cannot tell which position in the original array a value came from. The idea is to transform the array into a flat table, so that it is easier to, e.g., plot the variables using the dimensions as factors, or to aggregate values by factor. Would that still be possible with DT2?

EDIT2: Another useful thing would be to convert the data.table back into the original array. Do you know of a function that coerces a data.table to an array, letting you define which columns to use as dimensions?

aaa <- array(rnorm(3*4*2), dim = c(3,4,2), list(Var1 = LETTERS[1:3], Var2 = LETTERS[1:4], Var3 = LETTERS[1:2] ))

# call reshape2 explicitly: data.table's melt() redirect for arrays is deprecated
DT1 <- setDT(reshape2::melt(aaa))

# convert DT1 back to aaa
array(data = DT1[ ,value],
      dim = c(length(unique(DT1[ ,Var1])),
              length(unique(DT1[ ,Var2])),
              length(unique(DT1[ ,Var3]))),
      dimnames = list(Var1 = unique(DT1[ ,Var1]),
                      Var2 = unique(DT1[ ,Var2]),
                      Var3 = unique(DT1[ ,Var3])))

thanks!

Sara
  • both approaches essentially return the same `data.table` but with `A=1`, `B=2`, `C=3` in your second approach, and rows ordered in different ways. so the second approach is more concise – stas g Jul 09 '18 at 14:26
  • define "properly" – s_baldur Jul 09 '18 at 14:32
  • I clarified in the post, thank you! – Sara Jul 09 '18 at 15:23
  • `res = setDT(melt(aaa))`? – Frank Jul 09 '18 at 15:57
  • Note: @Frank unfortunately, `setDT(melt(aaa))` is discouraged now. It returns the following: `Warning: The melt generic in data.table has been passed a matrix and will attempt to redirect to the relevant reshape2 method; please note that reshape2 is deprecated, and this redirection is now deprecated as well. To continue using melt methods from reshape2 while both libraries are attached, e.g. melt.list, you can prepend the namespace like reshape2::melt(croparea.ls.32[["cftarea32"]]). In the next version, this warning will become an error.` – Sara Dec 31 '20 at 13:07
  • See also https://stackoverflow.com/questions/62213639/fast-melt-large-2d-matrix-to-3-column-data-table – Sara Dec 31 '20 at 13:08
  • Thanks, @Sara good to know. Fwiw, I might still use `reshape2::melt` per that message despite the package being deprecated (since it is deprecated in favor of tidyr which drops functionality for working with arrays; and despite the deprecation, it is apparently still being maintained for compatibility) – Frank Jan 01 '21 at 21:21
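
Given the deprecation warning quoted in the comments above, one way to avoid `melt` altogether is to stay with base R's `as.data.frame.table`, as in the question, and rename the value column through its `responseName` argument. The following is only a sketch under those assumptions; the names `DT` and `by_var2` are made up for illustration.

```r
library(data.table)

set.seed(42)
aaa <- array(rnorm(3 * 4 * 2), dim = c(3, 4, 2),
             dimnames = list(Var1 = LETTERS[1:3],
                             Var2 = LETTERS[1:4],
                             Var3 = LETTERS[1:2]))

# as.data.frame.table() is base R; responseName replaces the default "Freq"
DT <- setDT(as.data.frame.table(aaa, responseName = "value"))

# sum values by the 2nd array dimension (illustrative result name)
by_var2 <- DT[, .(total = sum(value)), by = Var2]
```

The dimension columns come back as factors whose levels follow the original dimnames, which is convenient for grouping and plotting.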

3 Answers


Note: this only works with data.table versions 1.11.2 and 1.11.4, not with some earlier versions (1.10.4-3, for example, returns a single column; see the comments).

Both approaches essentially return the same data.table, but with A=1, B=2, C=3 in your second approach and the rows in a different order, so the second approach is the way to go.

DT2 <- as.data.table(aaa)
head(DT2)
#   V1 V2 V3       value
#1:  1  1  1  0.32337516
#2:  1  1  2  1.59189589
#3:  1  2  1 -1.48751756
#4:  1  2  2 -0.86749305
#5:  1  3  1  0.01017255
#6:  1  3  2  2.66571093

#compare
DT1[order(Freq), ]
#and 
DT2[order(value), ]
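
Since the behaviour depends on the installed data.table version, a defensive variant could check whether an array method is available and otherwise fall back to base R. This is a rough sketch, not a recommendation from the package authors. It assumes the method is registered as an S3 method named `as.data.table.array`, and the variable `has_array_method` is made up for illustration.

```r
library(data.table)

aaa <- array(rnorm(3 * 4 * 2), dim = c(3, 4, 2))

# Assumption: versions that support this conversion register an S3 method
# called "as.data.table.array"; getS3method() returns NULL if it is missing.
has_array_method <- !is.null(getS3method("as.data.table", "array", optional = TRUE))

DT2 <- if (has_array_method) {
  as.data.table(aaa)   # index columns V1, V2, V3 plus `value`
} else {
  # base R fallback that also keeps the array positions (columns Var1..Var3)
  as.data.table(as.data.frame.table(aaa, responseName = "value"))
}
```

The two branches name the dimension columns differently (`V1`.. versus `Var1`..), so downstream code should not hard-code those names.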
stas g
  • Sorry, it does not seem to work for me. I cannot see where "value" comes from. DT2 seems to have only one column called aaa. I've edited the post for better explanation. Thanks. – Sara Jul 09 '18 at 15:22
  • @Sarah check `head(DT2)`, `value` is one of the columns created in the conversion. my `DT2` is a data.table with four columns! – stas g Jul 09 '18 at 15:34
  • Your code doesn't work for me either, fwiw. As OP says "I get a data frame with one column only" with data.table 1.10.4-3 – Frank Jul 09 '18 at 16:00
  • it works for both versions 1.11.4 and 1.11.2 on my two machines, so i can only assume `as.data.table` behaves differently in earlier versions – stas g Jul 09 '18 at 16:11

Depending on your desired output (since you are trying to convert multiple dimensions into a 'flat' table), here is a possible solution using the plyr package:

plyr's adply takes an array and returns a data.frame that you can easily convert to a data.table

library(plyr)
dt <- setDT( adply( aaa, c(1,2) ) )

    X1 X2         V1          V2
 1:  1  1 -0.5869804  1.30996405
 2:  2  1  1.3398003  1.87641841
 3:  3  1 -0.3268114 -0.12771971
 4:  1  2  0.8966523 -1.38669407
 5:  2  2 -0.4612773 -1.48036434
 6:  3  2 -0.6798351 -0.09369933
 7:  1  3  0.1311092  0.40458169
 8:  2  3 -1.7098850  0.39616792
 9:  3  3 -0.4589561 -1.14020015
10:  1  4  0.5348955 -0.25779528
11:  2  4  0.7099319  0.19067120
12:  3  4 -0.1545822 -0.75378610
Wimpel
  • thanks @Wimpel for pointing at setDT! It seems to have big advantages in terms of memory use (no copy is made). – Sara Jul 10 '18 at 07:12
  • Plyr is pretty old and slow compared to data.table and dplyr functions. Even Hadley (the developer who wrote both plyr and dplyr) says dplyr is intended to succeed plyr. The data.table approach in Frank's comment is a better and faster way to go. – Neal Barsch Jul 16 '18 at 02:43

convert the data.table back into the original array

Here is a quick and dirty solution:

DT2 = as.data.table(aaa)
aaa2= array(dim = dim(aaa))
invisible(DT2[, aaa2[V1, V2, V3] <<- value, .(V1,V2,V3)])
all.equal(aaa,aaa2) # TRUE
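
If the array has dimnames, or the rows have been reordered, a vectorised alternative is to index the target array with a matrix of positions instead of assigning cell by cell with `<<-`. The sketch below uses a hypothetical helper, `dt_to_array` (not part of data.table), that takes the names of the columns to use as dimensions. It assumes each combination of dimension values appears at most once.

```r
library(data.table)

# Hypothetical helper: rebuild an array from a long table.
# dim_cols  - names of the columns to use as array dimensions
# value_col - name of the column holding the cell values
dt_to_array <- function(dt, dim_cols, value_col = "value") {
  # factor levels give both the position along a dimension and its label;
  # existing factors keep their level order, other columns use first appearance
  f <- lapply(dim_cols, function(col) {
    x <- dt[[col]]
    if (is.factor(x)) x else factor(x, levels = unique(x))
  })
  names(f) <- dim_cols
  out <- array(NA_real_,
               dim = unname(vapply(f, nlevels, integer(1))),
               dimnames = lapply(f, levels))
  # one row per cell, one column per dimension: matrix indexing fills all cells
  idx <- do.call(cbind, lapply(f, as.integer))
  out[idx] <- dt[[value_col]]
  out
}

set.seed(1)
aaa <- array(rnorm(3 * 4 * 2), dim = c(3, 4, 2),
             dimnames = list(Var1 = LETTERS[1:3],
                             Var2 = LETTERS[1:4],
                             Var3 = LETTERS[1:2]))
DT <- as.data.table(as.data.frame.table(aaa, responseName = "value"))
DT <- DT[sample(.N)]   # shuffle rows to show that row order does not matter
back <- dt_to_array(DT, c("Var1", "Var2", "Var3"))
all.equal(aaa, back)
```

Because the dimension columns here are factors, their level order restores the original layout even after shuffling. For plain integer or character columns, the order of first appearance decides the layout instead.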
Jinglestar