
I have an R script that contains a function, which I received in an answer to this question: R: For loop nested in for loop.

The script has been working fine on the first part of my data set. I am now trying to use it on another part which, as far as I can tell, has exactly the same format as the first, but I get an error when I run the script. I cannot figure out what causes the error.

This is the script I am using:

require(data.table)

MappingTable_Calibrated = read.csv2(file.choose(), header=TRUE)
head(MappingTable_Calibrated)

# The data is sorted primarily by Scaffold number in ascending order, and secondarily by Cal_Startgen in ascending order.

MappingTable_Calibratedord = MappingTable_Calibrated[order(MappingTable_Calibrated$Scaffold, MappingTable_Calibrated$Cal_Startgen),]
head(MappingTable_Calibratedord)

dt <- data.table(MappingTable_Calibratedord, key = "Scaffold,Cal_Startgen")
head(dt)

# The following function creates pairs of loci for each scaffold.
# The function is a modified version of a function retrieved from http://www.stackoverflow.com

fn = function(dtIn, id){

# Creates the object dtHead containing all lines of dtIn except the last line.

dtHead = head(dtIn, n = nrow(dtIn) - 1)     

# The names of dtHead are appended with _a. paste0 short for: paste(x, sep="")

setnames(dtHead, paste0(colnames(dtHead), "_a")) 

# Creates the object dtTail containing all lines of dtIn except the first line.

dtTail = tail(dtIn, n = nrow(dtIn) - 1)     

# The names of dtTail are appended with _b.

setnames(dtTail, paste0(colnames(dtTail), "_b")) 

# dtHead and dtTail are combined. Scaffold is defined as id. The blank column "Pairwise_Distance" is added to the table.

cbind(dtHead, dtTail, Scaffold = id, Pairwise_Distance = 0) 

}

# The function is run on the data. .SDcols defines the columns to be included in the output.

output = dt[, fn(.SD, Scaffold), by = Scaffold, .SDcols = c("Name", "Startpos", "Endpos", "Rev", "Startgen", "Endgen", "Cal_Startgen", "Cal_Endgen", "Length")]
output = as.data.frame(output[, with = FALSE])

But when trying to create "output" I get the following error: Error in data.table(..., key = key(..1)) : Item 1 has no length. Provide at least one item (such as NA, NA_integer_ etc) to be repeated to match the 2 rows in the longest column. Or, all columns can be 0 length, for insert()ing rows into.

dt looks like this:

Name          Length Startpos Endpos Scaffold Startgen Endgen Rev Match Cal_Startgen Cal_Endgen
1: Locus_7173    144        0    144       34   101196 101340   1     1       101196     101340
2: Locus_133     110        0    110       34   223659 223776   1     1       223659     223776
3: Locus_2746    161        0     89       65   101415 101504   1     1       101415     101576

A full dput of "dt" can be found here: https://www.dropbox.com/sh/3j4i04s2rg6b63h/AADkWG3OcsutTiSsyTl8L2Vda?dl=0

Hjalte

1 Answer


Start by tracking down the data that causes the error. Wrap the cbind call in tryCatch and drop into browser() when it fails:

fn = function(dtIn, id){
    dtHead = head(dtIn, n = nrow(dtIn) - 1)     
    setnames(dtHead, paste0(colnames(dtHead), "_a")) 
    dtTail = tail(dtIn, n = nrow(dtIn) - 1)     
    setnames(dtTail, paste0(colnames(dtTail), "_b")) 
    r <- tryCatch(cbind(dtHead, dtTail, Scaffold = id, Pairwise_Distance = 0), error = function(e) NULL)
    if(is.null(r)) browser()
    r
}

Then you can see you are trying to cbind elements of different nrow/length:

Browse[1]> dtHead
Empty data.table (0 rows) of 9 cols: Name_a,Startpos_a,Endpos_a,Rev_a,Startgen_a,Endgen_a...
Browse[1]> dtTail
Empty data.table (0 rows) of 9 cols: Name_b,Startpos_b,Endpos_b,Rev_b,Startgen_b,Endgen_b...
Browse[1]> id
[1] 76
Browse[1]> 0
[1] 0

This is not allowed: cbind cannot combine zero-row tables with length-one values such as id and 0.
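Since head() and tail() are both called with n = nrow(dtIn) - 1, a group with a single row yields two empty tables. One way to find such groups is to count rows per scaffold. The sketch below uses a small made-up table; `dt_demo`, `single_row`, and the values in it are illustrative assumptions, not taken from the question's data:

```r
library(data.table)

# Toy data (hypothetical values): scaffold 76 has only one locus
dt_demo <- data.table(
  Name         = c("Locus_1", "Locus_2", "Locus_3"),
  Scaffold     = c(34L, 34L, 76L),
  Cal_Startgen = c(101196L, 223659L, 5000L)
)

# Count rows per scaffold and keep the groups with fewer than two rows;
# these are the groups for which dtHead and dtTail end up empty
single_row <- dt_demo[, .N, by = Scaffold][N < 2]
print(single_row)
```

Running the same count on the real `dt` should list the scaffolds that trigger the error, assuming single-row groups are the only cause.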
I recommend adding an if(nrow(dtIn) ...) check or something similar, and for the nrow = 0 cases adding the extra columns as zero-length vectors: id = integer(), Pairwise_Distance = numeric().
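A minimal sketch of such a guard follows. It is a simpler variant of the suggestion above: instead of building zero-length columns, it returns NULL for groups with fewer than two rows, which data.table skips when grouping with `by`. This assumes that scaffolds with a single locus can simply be left out of the output. The names `fn_safe`, `dt_demo`, and `pairs` are hypothetical, and the toy values are not from the question's data:

```r
library(data.table)

fn_safe <- function(dtIn, id) {
  # A group with fewer than two rows has no consecutive pairs, so skip it
  if (nrow(dtIn) < 2L) return(NULL)

  # All rows except the last, with "_a" appended to the column names
  dtHead <- head(dtIn, n = nrow(dtIn) - 1L)
  setnames(dtHead, paste0(colnames(dtHead), "_a"))

  # All rows except the first, with "_b" appended to the column names
  dtTail <- tail(dtIn, n = nrow(dtIn) - 1L)
  setnames(dtTail, paste0(colnames(dtTail), "_b"))

  cbind(dtHead, dtTail, Scaffold = id, Pairwise_Distance = 0)
}

# Toy data (hypothetical values): scaffold 76 has only one locus
dt_demo <- data.table(
  Name         = c("Locus_1", "Locus_2", "Locus_3"),
  Scaffold     = c(34L, 34L, 76L),
  Cal_Startgen = c(101196L, 223659L, 5000L)
)

# Scaffold 76 contributes no rows; scaffold 34 contributes one pair
pairs <- dt_demo[, fn_safe(.SD, Scaffold), by = Scaffold,
                 .SDcols = c("Name", "Cal_Startgen")]
print(pairs)
```

If the single-locus scaffolds must still appear in the result, the zero-length-column approach (or a row filled with NA) would be needed instead.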

jangorecki
  • I am not completely sure what the above shows me. What does the id of 76 etc. actually tell me? I guess the ultimate question is: what in my actual data causes the problem? – Hjalte May 05 '15 at 07:48
  • The value 76 is not important. What matters is that it is a non-zero-length value and you are trying to `cbind` it to a zero-row data.table. You cannot have a data.table where some columns have length 0 (all columns in your data.tables) while others have length 1 (the id variable). – jangorecki May 05 '15 at 08:24
  • Hello Hjalte, I am getting the same error. Was it caused by the data table or by something else? – Abhishek Singh Apr 25 '17 at 09:10
  • @AbhishekSingh because your code did not handle the case when one data.table has 0 columns while the other has non-zero columns – jangorecki Jun 22 '17 at 13:47