
When running this script on just one file in a folder:

emboss<-read.table("emboss_012.ss",header=T)
x<-table(emboss[,2],emboss[,3])/NROW(emboss[,3])
y<-as.vector(t(x))
nms <- expand.grid(colnames(x), rownames(x))
names(y) <- paste( nms[,2],nms[,1],sep="")
write.table(t(y), file = "test3.csv",append=TRUE)

I get the desired result.

However, doing this in one go for all files in the folder results in seemingly random NAs appearing. I am doing this with:

runForAll <- function(x) {
  emboss <- read.table(x,header=T)
  x <- table(emboss[,2],emboss[,3])/NROW(emboss[,3])
  y <- as.vector(t(x))
  nms <- expand.grid(colnames(x), rownames(x))
  names(y) <- paste( nms[,2],nms[,1],sep="")
  return(t(y))
}

my.files <- list.files(pattern = "emboss_\\d+\\.ss")
outputs <- lapply(my.files, FUN = runForAll)   

library(plyr)
one.header.output <- rbind.fill.matrix(outputs)
write.table(one.header.output, file = "nontpsec.csv")

My files are located here:

https://drive.google.com/folderview?id=0B0iDswLYaZ0zWjQ4RjdnMEUzUW8&usp=sharing

This is very weird and I can't see why it is happening, especially as all the other data is correct, even when looping through all files in one go.

brucezepplin
  • Interesting that calling as.matrix(outputs) shows the length decrease for the offending files, even though eyeballing each output shows no missing data. – brucezepplin Jun 12 '13 at 15:17
  • In my experience, I get the best results when I try to construct a minimal, self-contained example. The problem becomes evident in 95% of cases (for me). Consider posting such an example (see http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) in the future. – Roman Luštrik Jun 12 '13 at 16:04

1 Answer


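Your data tables have different lengths: for example, the first one has 20 rows and the last one only 19. This is where the problem comes from. When a label is absent from a file, the named vector for that file has no entries for it, and rbind.fill.matrix fills the corresponding columns with NA.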

Here's a little test:

tmp <- c("A", "C", "D", "E", "F", "G", "H", "I", "K", "L", "M", "N", "P", "Q", "R", "S", "T", "V", "W", "Y")

which(rownames(x) %in% tmp)

In the case of files 12 and 13, the second row (label B) is missing.
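A related check, sketched here with made-up data rather than the real files, is to list the labels a given table lacks with setdiff(). The `expected` vector and the toy table are assumptions for illustration only; substitute the labels you expect in your own data.

```r
# Hypothetical set of labels we expect to see as row names
expected <- c("A", "B", "C")

# Toy table standing in for table(emboss[,2], emboss[,3]); it has no "B" row
x <- table(c("A", "C", "A"), c("H", "H", "E"))

# Labels that are expected but absent from this table
missing.labels <- setdiff(expected, rownames(x))
print(missing.labels)
```

Running this per file (for instance inside the lapply call) would show which files are short of which labels.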

Have a look at this post:

Compare two data.frames to find the rows in data.frame 1 that are not present in data.frame 2

This might work for you:

Fastest way to add rows for missing values in a data.frame?
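One possible way to avoid the missing rows altogether, offered as a sketch and not as a tested fix for the linked files, is to convert both columns to factors with a fixed set of levels before calling table(). The table then contains a cell (possibly zero) for every combination, so every file yields a vector of the same length and a plain rbind suffices. The function name `freqVector`, the level vectors, and the toy data frames below are all hypothetical; the real level sets have to come from your data.

```r
# Hypothetical level sets; replace with the labels that occur in your files
row.levels <- c("A", "B", "C")   # assumed labels for column 2
col.levels <- c("H", "E")        # assumed labels for column 3

freqVector <- function(emboss, row.levels, col.levels) {
  # Fixed levels make table() keep a cell for every combination,
  # even when a label does not occur in this particular input
  rows <- factor(emboss[, 2], levels = row.levels)
  cols <- factor(emboss[, 3], levels = col.levels)
  x <- table(rows, cols) / nrow(emboss)
  y <- as.vector(t(x))
  nms <- expand.grid(colnames(x), rownames(x))
  names(y) <- paste(nms[, 2], nms[, 1], sep = "")
  y
}

# Two toy inputs standing in for two files; the second has no "B" rows
full    <- data.frame(id = 1:4, aa = c("A", "B", "C", "A"),
                      ss = c("H", "E", "H", "E"))
partial <- data.frame(id = 1:2, aa = c("A", "C"), ss = c("H", "H"))

outputs <- lapply(list(full, partial), freqVector,
                  row.levels = row.levels, col.levels = col.levels)

# All vectors share the same names, so rbind lines the columns up
result <- do.call(rbind, outputs)
print(result)
```

With real files, the data frames would come from read.table(file, header = TRUE) inside the function applied over my.files. Combinations that never occur show up as 0 instead of NA.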

Joanne Demmler