
Whenever I try to melt my data, I get the error below. Please let me know how to fix it.

dd1 <- melt.data.table(abc_comments, id.vars = c('user_id', 'date', 'source', 'value', 'value_clean', 'ids'))

#Error

Error: cannot allocate vector of size 2.9 Gb
asked by Deepak (edited by zx8754)
  • This means R can't get enough memory from your computer. How big is your data? How many rows and columns? Without some basic idea of what your data is like, we can't help. – Spacedman Feb 09 '22 at 08:23
  • The CSV file I am trying to melt is around 1.5 GB. – Deepak Feb 09 '22 at 09:35
  • CSV size is not a good measure of the number of rows and columns in the data frame. – Spacedman Feb 09 '22 at 10:13
  • These are the dimensions: 2,190,943 rows and 182 columns. – Deepak Feb 09 '22 at 10:23
  • I reopened this (from being a dupe of https://stackoverflow.com/questions/5171593/r-memory-management-cannot-allocate-vector-of-size-n-mb) not because that dupe link was not applicable, but because that's the symptom, and I think my answer provides a way to resolve it. Perhaps the question should instead be: *"How to melt really large data"*, not focusing on the allocation error itself. – r2evans Feb 09 '22 at 15:58

1 Answer


I'm un-duping this because while the symptom is a duplicate, there is a method not discussed (in detail) there that can resolve the issue.

Split, melt, recombine

I'll demonstrate using iris.

library(data.table)
irisDT <- as.data.table(iris)

# this is our "control"
melt1 <- melt(irisDT, id.vars = "Species")
head(melt1, 3)
#    Species     variable value
#     <fctr>       <fctr> <num>
# 1:  setosa Sepal.Length   4.3
# 2:  setosa Sepal.Length   4.4
# 3:  setosa Sepal.Length   4.4

The split-melt-combine solution:

melt2 <- rbindlist(lapply(
  split(irisDT, seq_len(nrow(irisDT)) %/% 51),
  melt, id.vars = "Species"
))

Verification:

setorder(melt1, Species, variable, value)
setorder(melt2, Species, variable, value)
identical(melt1, melt2)
# [1] TRUE

Breakdown:

  1. Split. split(irisDT, 1:150 %/% 51) breaks the table into frames of up to 51 rows each. (I chose 51 arbitrarily.) In this case, it yields a 3-long list of 49-51 rows each. The chunks don't need to be equal in size, just each small enough to melt on its own.

  2. Melt. melt each one individually using lapply(., melt, id.vars="Species"). This produces (again) a 3-long list, now of melted data.

  3. Combine. rbindlist speaks for itself.
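Applied to the data in the question (the id.vars are taken from the question's melt call; the 100,000-row chunk size is an arbitrary guess that you should tune against your available memory), the same pattern would look like this sketch:

```r
library(data.table)

# Chunk abc_comments by row number -- 1e5 rows per chunk is an
# arbitrary starting point; shrink it if allocation still fails.
chunks <- split(abc_comments, seq_len(nrow(abc_comments)) %/% 1e5)

dd1 <- rbindlist(lapply(
  chunks, melt.data.table,
  id.vars = c("user_id", "date", "source", "value", "value_clean", "ids")
))
```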


Without exhaustive testing (I'm not in the practice of trying to fill memory, and with 64GB here it would take a lot to do so), it's possible that running this as-is will still fail, since split keeps every chunk in memory at once. If that is the case, and you cannot find a chunk size that makes it work, consider this alternative, which discards each chunk as soon as it has been melted:

irisDTspl <- split(irisDT, seq_len(nrow(irisDT)) %/% 51)
out <- list()
for (i in seq_along(irisDTspl)) {
  out[[i]] <- melt(irisDTspl[[i]], id.vars = "Species")
  irisDTspl[i] <- NA      # anything small, but need to keep the index filled
  gc()
}
melt3 <- rbindlist(out)
setorder(melt3, Species, variable, value)
identical(melt1, melt3)
# [1] TRUE
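As a rough sanity check on the error message, the dimensions given in the question's comments (2,190,943 rows, 182 columns, 6 id variables) predict an allocation very close to the reported 2.9 Gb for the melted value column alone:

```r
n_rows <- 2190943   # from the comments
n_cols <- 182
n_id   <- 6

# melt produces one output row per non-id cell
melted_rows <- n_rows * (n_cols - n_id)
melted_rows
# [1] 385605968

# the numeric value column alone, at 8 bytes per double:
melted_rows * 8 / 1024^3
# ~2.873 GiB, i.e. the "2.9 Gb" in the error message
```

This is only the value column; the six repeated id columns come on top of it, which is why melting in chunks (and, if the result is still too large, writing each melted chunk to disk instead of rbindlist-ing) is the way out.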
answered by r2evans