
I've come across something a bit weird, especially because the code may give different output each time it's run. In a nutshell, I was mistakenly using set to assign a value to a row beyond the last one, but instead of doing nothing, set created a negative-length data.table.

library(data.table)

dt<-data.table(id=1:5, var=rnorm(5)) # normal example

set(dt, 6L, 1L, 3L) # row 6 doesn't exist, so nothing is set (as expected)
dt
#
# now my real data, after I found the error in my code (incorrect row number in set)
#
dt1 <- data.table(ID = "29502509", FY = 2012, VAR = 61067.5442975645, 
                      startDate = structure(15062L, class = c("IDate", "Date")), 
                      endDate = structure(15429L, class = c("IDate", "Date")), 
                      start = "1750", end = "2404",
                      date = structure(15461L,class = c("IDate", "Date")),
                      DESCR = "JOB", NOTE = "NEW")

set(dt1, 12L, 3L, 62385.6516144086)
str(dt1)
Classes ‘data.table’ and 'data.frame':  1 obs. of  10 variables:
 $ ID       : chr "29502509"
 $ FY       : num 2012
 $ VAR      : num 61068
 $ startDate: IDate, format: "2011-03-29"
 $ endDate  :
Error in do.call(str, c(list(object = obj), aList, list(...)), quote = TRUE) : 
  negative length vectors are not allowed
> sapply(dt1, length)
        ID         FY        VAR  startDate    endDate      start        end       date 
         1          1          1          1 -637110831          1          1          1 
     DESCR       NOTE 
         1          1 
> dput(dt1)
structure(list(ID = "29502509", FY = 2012, VAR = 61067.5442975645, 
    startDate = structure(15062L, class = c("IDate", "Date")), 
    endDate = structure(, class = c("IDate", "Date")), start = "1750", # HERE
    end = "2404", date = structure(15461L, class = c("IDate", 
    "Date")), DESCR = "JOB", NOTE = "NEW"), .Names = c("ID", 
"FY", "VAR", "startDate", "endDate", "start", "end", "date", 
"DESCR", "NOTE"), row.names = c(NA, -1L), class = c("data.table", 
"data.frame"), .internal.selfref = <pointer: 0x0000000000130788>)

As I said above, you may need to run the whole block several times (from the creation of the data.table, dt1 <- data.table(..., to set(dt1, ...) to see this. I noticed that if it doesn't happen the first time, it won't happen at all unless I re-run dt1 <- data.table(... . Any idea?

EDIT:

To be specific, by "different results" I mean that sometimes it does nothing (as expected), most of the time it creates a negative-length column (always a Date column, endDate in the output above), and sometimes it turns the entire data.table into one with a negative number of rows. In both of the last two cases (single column or entire data.table) the negative length is always -637110831.

eddi
Michele
  • @Vivi thanks very much for corrections, I was very tired. I posted the question yesterday night after a 15 hour session :-( – Michele Jun 02 '13 at 10:17
  • I've linked your question on `data.table` forum: http://lists.r-forge.r-project.org/pipermail/datatable-help/2013-June/001840.html – Arun Jun 02 '13 at 11:44
  • @Arun you are always more than useful, even when you don't have the answer! Thanks! – Michele Jun 02 '13 at 11:48

1 Answer


Looks like memory corruption due to writing beyond the memory allocated for the column.

set() ends up calling the C function assign in assign.c. In version 1.8.8, assign.c:434:

434             default :
435                 for (r=0; r<targetlen; r++)
436                     memcpy((char *)DATAPTR(targetcol) + (INTEGER(rows)[r]-1)*size, 
437                            (char *)DATAPTR(RHS) + (r%vlen) * size,
438                            size);

This code is reached, which should not happen here: dt1 has only one row, yet row 12 was requested. At this point:

(gdb) p INTEGER(rows)[0]
$21 = 12
(gdb) p size
$23 = 8
Matthew Lundberg