weird error with R when using data.table

Question

I'm doing some small calculations and decided to store the results in a data.table, since it's much faster than a data.frame with rbind.

My code looks roughly like this:

df is a data.frame used in the calculation; a sample of its contents is shown under EDIT below.

l=12000
dti = 1
dt = data.table(ni = 0, nj = 0, regerr = 0)
for (i in seq(1,12000,200)) {
    for (j in seq(1, 12000, 200)) {
        for (ind in 1:nrow(df)) {
            if( i+j >= l/2 ){
                df[ind,]$X =  df[ind,]$pos * 2
            } else {
                df[ind,]$X = df[ind,]$pos/l
            }
        }
        for (i in 1:100) { # 100 sample
            sample(df$X,nrow(df), replace=FALSE) 
            fit=lm(X ~ gx, df)   #linear regression calculation
            regerror=sum(residuals(fit)^2)

            print(paste(i,j,regerror))
            set(dt,dti,1L,as.double(i))             
            set(dt,dti,2L,as.double(j))             
            set(dt,dti,3L,regerror)             
            dti=dti+1

        }
     }
 }

The code prints the first few rounds of print(paste(i,j,regerror)) and then it quits with this error:

 *** caught segfault ***
address 0x3ff00008, cause 'memory not mapped'
Segmentation fault (core dumped)

EDIT

structure(list(ax = c(-0.0242214, 0.19770304, 0.01587302, -0.0374415, 
0.05079826, 0.12209738), gx = c(-0.3913043, -0.0242214, -0.4259067, 
-0.725, -0.0374415, 0.01587302), pos = c(11222, 13564, 16532, 
12543, 12534, 14354)), .Names = c("ax", "gx", "pos"), row.names = c(NA, 
-6L), class = "data.frame")

Any ideas are appreciated.

@Arun The two i indices were a mistake of mine when writing the question. Basically, the function calculates something inside this loop (where it says I calculate something), then I shuffle it, apply a regression to the shuffled data, and save the result in the data.table. — ifreak, Feb 08 '13 at 14:41
If you want to supply [reproducible code](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) I think you may get an answer. I'm sure that three for loops is not the most efficient technique for what you're doing. As it stands, I can't quite follow what is going on in there without knowing what `df` is and knowing at least a little bit about what happens in your innermost for loop. — Justin, Feb 08 '13 at 15:06
@Justin I've updated my question with the df data frame and the code inside the innermost loop. — ifreak, Feb 08 '13 at 15:21

Justin · Answer 1 · 2013-02-08T15:58:54.773

Without meaning to sound rude, I think you may benefit from reading a few R tutorials before going forward. This question is also very likely to be closed as too localized. Seg faults are almost always a bug somewhere, but you can avoid a lot of this headache by understanding what each piece of your code is doing. Since it's Friday, let's walk through some of it:

if( i+j >= l/2 ){
   data[ind,]$X =  df[ind,]$pos * 2
}
else{
   data[ind,]$X = df[ind,]$pos/l
}

I'll assume data is meant to be df and go from there. We're inside two loops over i and j that both run from 1 to 12000 in steps of 200. They will never sum to less than 1/2, so you will always execute the first statement. Also, if you ever expected the FALSE case to occur, you would want else on the same line as your closing brace (R requires this at top level; inside an enclosing { } block either form parses, but the same-line form is the safe habit):

if (i + j >= 1/2) {
   df$X <- df$pos * 2
} else {
   df$X <- df$pos
}

R is vectorized, so the above is the same as looping through every value and multiplying by 2. I also removed the / 1 since it doesn't do anything. This whole section can be moved outside of the loop, since it's a constant operation: it adds a column X that is double the column pos.
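A small check of the vectorization point, using three rows of the sample data from the question's EDIT. The values of i and j are arbitrary illustrations, and the branch here is written against l/2 with l = 12000 as in the question's code, which is an assumption about the intended condition rather than the simplified version above:

```r
# Three rows from the question's EDIT (columns gx and pos only)
df <- data.frame(gx  = c(-0.3913043, -0.0242214, -0.4259067),
                 pos = c(11222, 13564, 16532))
l <- 12000          # the letter l from the question's code
i <- 1; j <- 201    # illustrative loop values, chosen arbitrarily

# Row-by-row version, in the style of the question
loop_X <- numeric(nrow(df))
for (ind in seq_len(nrow(df))) {
  if (i + j >= l / 2) {
    loop_X[ind] <- df$pos[ind] * 2
  } else {
    loop_X[ind] <- df$pos[ind] / l
  }
}

# Vectorized version: the condition is a single value, so one
# assignment covers every row at once
df$X <- if (i + j >= l / 2) df$pos * 2 else df$pos / l

# Both approaches give the same column
stopifnot(isTRUE(all.equal(loop_X, df$X)))
```

Because the condition depends only on i and j, not on the row, it needs to be evaluated once per (i, j) pair rather than once per row.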

Next, your loop where you do a fit:

for (i in 1:100) { # 100 sample
   sample(df$X,nrow(df), replace=FALSE) 
   fit=lm(X ~ gx, df)   #linear regression calculation
   regerror=sum(residuals(fit)^2)

   print(paste(i,j,regerror))
   set(dt,dti,1L,as.double(i))             
   set(dt,dti,2L,as.double(j))             
   set(dt,dti,3L,regerror)             
   dti=dti+1
}

Calling sample(df$X, nrow(df), replace=FALSE) on its own only returns the values in a new order. It doesn't actually assign them. Use df$X <- sample(df$X, nrow(df), replace=FALSE) instead.

Now, it looks like you're going to assign into dt (which is a function, much like df, and should be avoided as a variable name) at row dti the result of this fit error as well as your indices? As far as I can tell, nothing depends on i or j. Instead, you're going to perform a randomly ordered fit 60 * 60 * 100 times... If that is what you want to do, by all means go for it! But do it in an efficient way:
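To see the difference, here is a minimal sketch with a made-up six-value column (the data and the seed are illustrative assumptions, not taken from the question):

```r
set.seed(1)  # arbitrary seed, only so the run is repeatable
df <- data.frame(X = c(10, 20, 30, 40, 50, 60))
before <- df$X

# The shuffled vector is returned but never stored
sample(df$X, nrow(df), replace = FALSE)
unchanged <- identical(df$X, before)      # df$X is still in its original order

# Assigning the result is what actually shuffles the column
df$X <- sample(df$X, nrow(df), replace = FALSE)
same_values <- identical(sort(df$X), sort(before))  # same values, new order

stopifnot(unchanged, same_values)
```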

df$X <- df$pos * 2
fit.fun <- function(n, dat) {
   jumble <- sample(nrow(dat))
   dat$X <- dat$X[jumble]
   sum(residuals(lm(X ~ gx, dat))^2)
}

sapply(1:10, fit.fun, dat=df)
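If the results do need to land in a data.table via set(), one thing worth checking in the question's code is that dt is created with a single row while dti is incremented on every iteration, so from the second write onward set() targets a row that does not exist. Whether that is what triggers the seg fault is an assumption the thread does not confirm. A hedged sketch that allocates all rows up front, with a shortened grid, a result table named res, and an inner counter k (all three are illustrative choices, not from the question):

```r
library(data.table)

set.seed(42)  # arbitrary seed, only so the run is repeatable
df <- data.frame(gx  = c(-0.3913043, -0.0242214, -0.4259067,
                         -0.725, -0.0374415, 0.01587302),
                 pos = c(11222, 13564, 16532, 12543, 12534, 14354))
df$X <- df$pos * 2

i_vals <- seq(1, 401, 200)   # shortened grid for illustration: 1, 201, 401
j_vals <- seq(1, 401, 200)
n_rep  <- 2L                 # the question uses 100 samples
n_rows <- length(i_vals) * length(j_vals) * n_rep

# Allocate every row before the loops so set() always has a row to write to
res <- data.table(ni = numeric(n_rows),
                  nj = numeric(n_rows),
                  regerr = numeric(n_rows))

row <- 1L
for (i in i_vals) {
  for (j in j_vals) {
    for (k in seq_len(n_rep)) {          # k, so the outer i is not overwritten
      df$X <- sample(df$X)               # shuffle and store
      fit <- lm(X ~ gx, df)
      set(res, row, 1L, as.double(i))
      set(res, row, 2L, as.double(j))
      set(res, row, 3L, sum(residuals(fit)^2))
      row <- row + 1L
    }
  }
}
```

With the full grid from the question the same pattern would allocate 60 * 60 * 100 rows.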

Thanks for your reply. First of all, the 1 that you are referring to is not `1`, it's `l`. Secondly, this is just a test script; the actual idea I want to implement is more complicated and can't be vectorized, because it includes other functions that will assign `X` to `df`. Concerning the sampling, yes, what you mentioned is what I want to do. My problem is that the 60 * 60 * 100 results are not being filled into the `data.table`; instead I'm getting the error that I've copied. — ifreak, Feb 08 '13 at 16:00
I suggest you break your problem into manageable chunks, specifically the portion that causes your error. Then work toward a minimal reproducible example that will still give the error and post that working example to a new question. Using the code you provided, there is no seg fault, but instead an error in the `sample` line. — Justin, Feb 08 '13 at 16:11
