I am trying to use melt on a data.table, but it fails with an error:

modern_gt_melted <- data.table::melt(modern_gt, id.vars = c('chrom', 'pos', 'ref', 'alt'))
    Error in melt.data.table(gtData, c("chrom", "pos", "ref", "alt")) : negative length vectors are not allowed

My data table looks like this:

# modern_gt
          chrom      pos ref alt Nea HG01566 NA18593 NA19795 HG01105 HG03225
       1: chr20    10723   T   .  67      66      66      66      66      66
       2: chr20    10724   G   .  67      66      66      66      66      66
       3: chr20    10725   C   .  67      66      66      66      66      66
       4: chr20    10726   C   .  67      66      66      66      66      66
       5: chr20    10727   T   .  67      66      66      66      66      66

I have tried:

(1) Using a subset of the data:

# this works
data.table::melt(modern_gt[1:10000, ], id.vars = c('chrom', 'pos', 'ref', 'alt'))
# this fails
data.table::melt(modern_gt, id.vars = c('chrom', 'pos', 'ref', 'alt'))

(2) Checking for duplicated rows:

# https://stackoverflow.com/questions/42479854/merge-error-negative-length-vectors-are-not-allowed
# there are no duplicated rows
modern_gt <- unique(modern_gt)

My data (modern_gt.rds) can be obtained from: https://mega.nz/file/SHhnQCYZ#I7dl625XKreIBc3TYn7nYc_L4TTPcsQFZEwnEwD3qu0

zhang
  • It seems you have 21324420 unique rows according to your id variables, which is the same as the number of rows in your table. When melting, the resulting table will have 21324420 x 100 rows (the number of cells in your table), i.e. 2132442000, which is dangerously close to the maximum size of a vector (2^31 = 2147483648). With the overhead of the data structure, you may be hitting the ceiling. – Billy34 Jun 28 '23 at 07:18
  • Yeah, that seems right. I noticed that this issue doesn't always occur: over multiple attempts, it would occasionally run. After splitting the table into smaller parts, it now runs. – zhang Jun 28 '23 at 08:37

1 Answer

You may be running into a memory issue.

modern_gt <- readRDS("/home/sapi/Downloads/modern_gt.rds")
data.table::melt(modern_gt, id.vars = c('chrom', 'pos', 'ref', 'alt'))
#> Error in melt.data.table(modern_gt, id.vars = c("chrom", "pos", "ref", : negative length vectors are not allowed

However, when you sample it (here, down to 2 million rows), it works:

modern_gt <- modern_gt |>
  dplyr::slice_sample(n = 2000000)

data.table::melt(modern_gt, id.vars = c('chrom', 'pos', 'ref', 'alt'))
#>            chrom      pos ref alt variable value
#>         1: chr20 14204394   T   .      Nea    76
#>         2: chr20 16182408   G   .      Nea    76
#>         3: chr20 19657430   A   .      Nea    77
#>         4: chr20 20949457   A   G      Nea    77
#>         5: chr20  9800784   A   .      Nea    77
#>        ---                                      
#> 201999996: chr20  9188501   C   A  NA19035    76
#> 201999997: chr20  1547591   T   .  NA19035    66
#> 201999998: chr20 19909239   T   .  NA19035    66
#> 201999999: chr20  1396424   A   .  NA19035    66
#> 202000000: chr20 20094721   C   G  NA19035    16
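
As a rough check of why the full table fails (this is my reading, not something the error message confirms): the sampled output above has 202,000,000 rows for 2,000,000 input rows, which implies 101 measure columns. With the 21324420 rows reported in the comments, the full melt would need more rows than a 32-bit integer can index, which would be consistent with a "negative length" error.

```r
# Row count of the full table, as reported in the comment under the question
n_rows <- 21324420
# Measure columns implied by the sampled run:
# 202,000,000 melted rows / 2,000,000 input rows
n_measure <- 202000000 / 2000000
# Use doubles here; integer arithmetic (21324420L * 101L) would itself overflow to NA
melted_rows <- n_rows * n_measure
# TRUE means the melted table would not fit in a 32-bit row index
exceeds <- melted_rows > .Machine$integer.max
print(c(n_measure = n_measure, melted_rows = melted_rows))
print(exceeds)
```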

I would suggest dividing your observations into smaller chunks, melting each chunk separately, and then rbind-ing the results together.
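
A minimal sketch of the chunked approach, using a small toy table in place of modern_gt. Note that `melt_in_chunks` and `chunk_rows` are hypothetical names made up for this illustration; they are not part of data.table.

```r
library(data.table)

# Hypothetical helper (not a data.table function): melt a wide table
# in blocks of `chunk_rows` rows and return the melted pieces as a list.
melt_in_chunks <- function(dt, id_vars, chunk_rows = 1e6) {
  starts <- seq(1, nrow(dt), by = chunk_rows)
  lapply(starts, function(s) {
    rows <- s:min(s + chunk_rows - 1, nrow(dt))
    melt(dt[rows], id.vars = id_vars)
  })
}

# Toy stand-in for modern_gt: 10 rows, 4 id columns, 2 measure columns
toy <- data.table(chrom = "chr20", pos = 1:10, ref = "T", alt = ".",
                  Nea = 67L, HG01566 = 66L)

pieces <- melt_in_chunks(toy, c("chrom", "pos", "ref", "alt"), chunk_rows = 4)

# Binding is fine for a small result like this one
combined <- rbindlist(pieces)
```

On the full data the combined result would have more than `.Machine$integer.max` rows, so, as far as I know, `rbindlist()` could not hold it in a single data.table either. In that case it is probably safer to keep the pieces separate, or to write each one to disk, instead of binding them.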

EDIT:

That is what @Billy34 suggested in a comment in the meantime.

Created on 2023-06-28 with reprex v2.0.2

Grzegorz Sapijaszko