
Whenever I try to melt my data, I get the error below. Please let me know how to fix it.

dd1 <- melt.data.table(abc_comments, id.vars = c('user_id', 'date', 'source', 'value', 'value_clean', 'ids'))

#Error

Error: cannot allocate vector of size 2.9 Gb
asked by Deepak (edited by zx8754)
  • This means R can't get enough memory from your computer. How big is your data? How many rows and columns? Without some basic idea of what your data is like, we can't help. – Spacedman Feb 09 '22 at 08:23
  • The CSV file I am trying to melt is around 1.5 GB. – Deepak Feb 09 '22 at 09:35
  • CSV size is not a good measure of the number of rows and columns in the data frame. – Spacedman Feb 09 '22 at 10:13
  • These are the dimensions: 2,190,943 rows and 182 columns. – Deepak Feb 09 '22 at 10:23
  • I reopened this (from being a dupe of https://stackoverflow.com/questions/5171593/r-memory-management-cannot-allocate-vector-of-size-n-mb) not because that dupe link was not applicable, but because that's the symptom, and I think my answer provides a way to resolve it. Perhaps the question should instead be: *"How to melt really large data"*, not focusing on the allocation error itself. – r2evans Feb 09 '22 at 15:58

1 Answer


I'm un-duping this because while the symptom is a duplicate, there is a method not discussed (in detail) there that can resolve the issue.

Split, melt, recombine

I'll demonstrate using iris.

library(data.table)
irisDT <- as.data.table(iris)

# this is our "control"
melt1 <- melt(irisDT, id.vars = "Species")
head(melt1, 3)
#    Species     variable value
#     <fctr>       <fctr> <num>
# 1:  setosa Sepal.Length   4.3
# 2:  setosa Sepal.Length   4.4
# 3:  setosa Sepal.Length   4.4

The split-melt-combine solution:

melt2 <- rbindlist(lapply(
  split(irisDT, seq_len(nrow(irisDT)) %/% 51),
  melt, id.vars = "Species"
))

Verification:

setorder(melt1, Species, variable, value)
setorder(melt2, Species, variable, value)
identical(melt1, melt2)
# [1] TRUE

Breakdown:

  1. Split. split(irisDT, 1:150 %/% 51) breaks the table into frames of up to 51 rows each. (I chose 51 arbitrarily.) In this case, it yields a 3-long list of 49-51 rows each. The chunks don't need to be equal in size, just each small enough to melt on its own.

  2. Melt. melt each one individually using lapply(., melt, id.vars="Species"). This produces (again) a 3-long list, now of melted data.

  3. Combine. rbindlist speaks for itself.
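Applied to the data in the question (the id.vars are taken from the question's melt call; the 100,000-row chunk size is an arbitrary guess that you should tune against your available memory), the same pattern would look like this sketch:

```r
library(data.table)

# Chunk abc_comments by row number -- 1e5 rows per chunk is an
# arbitrary starting point; shrink it if allocation still fails.
chunks <- split(abc_comments, seq_len(nrow(abc_comments)) %/% 1e5)

dd1 <- rbindlist(lapply(
  chunks, melt.data.table,
  id.vars = c("user_id", "date", "source", "value", "value_clean", "ids")
))
```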


Without exhaustive testing (I'm not in the practice of trying to fill memory, and with 64GB here it would take a lot to do so), it's possible that running this as-is will still fail, since split keeps every chunk in memory at once. If that is the case, and you cannot find a chunk size that makes it work, consider this alternative, which discards each chunk as soon as it has been melted:

irisDTspl <- split(irisDT, seq_len(nrow(irisDT)) %/% 51)
out <- list()
for (i in seq_along(irisDTspl)) {
  out[[i]] <- melt(irisDTspl[[i]], id.vars = "Species")
  irisDTspl[i] <- NA      # anything small, but need to keep the index filled
  gc()
}
melt3 <- rbindlist(out)
setorder(melt3, Species, variable, value)
identical(melt1, melt3)
# [1] TRUE
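As a rough sanity check on the error message, the dimensions given in the question's comments (2,190,943 rows, 182 columns, 6 id variables) predict an allocation very close to the reported 2.9 Gb for the melted value column alone:

```r
n_rows <- 2190943   # from the comments
n_cols <- 182
n_id   <- 6

# melt produces one output row per non-id cell
melted_rows <- n_rows * (n_cols - n_id)
melted_rows
# [1] 385605968

# the numeric value column alone, at 8 bytes per double:
melted_rows * 8 / 1024^3
# ~2.873 GiB, i.e. the "2.9 Gb" in the error message
```

This is only the value column; the six repeated id columns come on top of it, which is why melting in chunks (and, if the result is still too large, writing each melted chunk to disk instead of rbindlist-ing) is the way out.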
answered by r2evans