8

I have a large matrix num [1:62410, 1:48010]

I want this in a long format data.table

e.g.

   Var1 Var2     value
1:    1    1 -4227.786
2:    2    1 -4211.908
3:    3    1 -4197.034
4:    4    1 -4183.645
5:    5    1 -4171.692
6:    6    1 -4161.634

minimal example

m = matrix(1:5, nrow = 1000, ncol = 1000)
x = data.table(reshape2::melt(m))

ideally I'd want the columns names x, y and value at the same time.

Previously I've been using data.table(melt(mymatrix)). But judging by the warnings that reshape2::melt is deprecated, this is probably not optimal in terms of speed, what would be the best "data.table" way of solving this?

the following do not answer my question: Fast melted data.table operations Proper/fastest way to reshape a data.table

Other answers refer to the deprecated reshape2 package

BetaScoo8
  • 429
  • 4
  • 9

4 Answers4

7

Here's an example:

# example matrix
m = matrix(1:12, nrow = 4)

# load data table
library(data.table)

We can extract the data, row and column info directly and it should be pretty fast:

dt = data.table(
  row = rep(seq_len(nrow(m)), ncol(m)), 
  col = rep(seq_len(ncol(m)), each = nrow(m)), 
  value = c(m)
)

The result is:

    row col value
 1:   1   1     1
 2:   2   1     2
 3:   3   1     3
 4:   4   1     4
 5:   1   2     5
 6:   2   2     6
 7:   3   2     7
 8:   4   2     8
 9:   1   3     9
10:   2   3    10
11:   3   3    11
12:   4   3    12
talat
  • 68,970
  • 21
  • 126
  • 157
3

There is a as.data.table method for array which will do the trick for you.

dim(m) <- c(dim(m), 1L)
as.data.table(m)

In future, when posting questions on SO, please provide minimal example.

I now looked at the source of it and I see it may not be very efficient, because it materializes all NA values, and then removes them.

jangorecki
  • 16,384
  • 4
  • 79
  • 160
  • 1
    Isn't melting a `data.table` faster? `d = as.data.table(m)`; `d[ , ri := 1:.N]`; `melt(d, id.vars = "ri")`. Test on e.g. `nr = 1e4`; `nc = 1e4`; `m = matrix(1:(nr*nc), nrow = nr)` – Henrik Jun 05 '20 at 12:02
  • 1
    Henrik, that results in character column called variable, not the output I'm after. (I'd need to strip out the "V" and do a type conversion) – BetaScoo8 Jun 05 '20 at 13:44
  • 1
    jangorecki, thank you but at first glance this is about 2-3 times slower than the reshape2::melt approach. – BetaScoo8 Jun 05 '20 at 13:50
  • Good to know, I am aware it is not really optimized for speed. This does not come up often. If you wish, feel free to submit FR in data.table repository to speed that up. – jangorecki Jun 05 '20 at 14:38
  • 1
    @BetaScoo8 Sorry, I was sloppy when reading your desired output :) Try: `dimnames(m) <- list(NULL, 1:ncol(m))`; `d = as.data.table(m)`; `d[ , ri := 1:.N]`; `d2 = melt(d, id.vars = "ri", variable.name = "ci")`; `d2[ , ci := as.integer(ci)]`. Despite the two additional steps (`dimnames` & `as.integer`), I think it's still faster. – Henrik Jun 05 '20 at 18:18
1

2 Years Later...

Since reshape2 is retired and I didn't want to use tidyr (or anything other than data.table and base functions), I ended up with this solutions:

# binds row and column names from matrix with their values
cbind(
  # We want a data.table as result
  as.data.table(
    # Cartesian product of matrix row and column names
    # returns a data.frame with 2 column
    expand.grid(rownames(m), colnames(m))
  ),
  # Store matrix value into a vector:
  # first all values from 1st column, then 2nd, and so on
  value = as.vector(m)
)

NOTE: if your matrix column or rows are not named or if you just don't care about it, you should replace expand.grid(...) with this line:

expand.grid(c(1:nrow(m)), c(1:ncol(m)))

names are relevant when you have, for example, a correlation matrix.

Yan Foto
  • 10,850
  • 6
  • 57
  • 88
0

A while ago I've run into the same problem as @BetaScoo8 and asked a similar question (see here). As pointed out by @jangorecki as.data.table "melts" array but not matrix (2D).

# 2D matrix
> AR <- array(1:12, dim = c(3,4))
> DT <- as.data.table(AR)
> print(DT) # Note: no "value" column, matrix str is preserved!
   V1 V2 V3 V4
1:  1  4  7 10
2:  2  5  8 11
3:  3  6  9 12

# 3D array
> AR <- array(1:24, dim=c(3,4,2))
> DT <- as.data.table(AR)
> print(DT)
    V1 V2 V3 value
 1:  1  1  1     1
 2:  1  1  2    13
 3:  1  2  1     4
 4:  1  2  2    16
[...]
21:  3  3  1     9
22:  3  3  2    21
23:  3  4  1    12
24:  3  4  2    24
    V1 V2 V3 value
> 

So, I have written a function to flexibly convert either matrix or array to data.table in the same fashion. Maybe of help for others.

# Melt matrix or array to data.table 
array2dataTable <- function(x) {
  
  # if is matrix, add third dimension (as.data.table does not melt matrices)
  x.is.matrix <- FALSE
  if (length(dim(x))==2) {
    x.is.matrix <- TRUE
    cat("\nNote: x is a matrix, converting it to array with 3rd dim==3 ..")
    dim(x) <- c(dim(x), 1L)
  }
  # add dimnames
  if (is.null(dimnames(x))) {
    cat("\nNote: Array has no dimnames, using seq of integers ..\n")
    dimnames(x) <- lapply(dim(x), function(X) as.character(seq.int(1, X)))
  }
  DT <- as.data.table(x, na.rm = TRUE)
  if (x.is.matrix==TRUE) DT[,V3:=NULL] # remove third column if converting from 2D matrix
  print(str(DT))
  return(DT)
}

Happy to get feedback if you notice any issue with this. Thanks!

Sara
  • 465
  • 5
  • 15