I'm writing a R code to count/select the rows with absolute difference of two values in two columns less than a certain value (say 0.1). It reads two files and two column numbers, based on which the calculation is done.
args<-commandArgs(TRUE)
f1<-args[1]
f2<-args[2]
col1<-args[3]
col2<-args[4]
if (length(args) <4) {
stop("\n\nUsage: file1 file2 column1 column2 >output\n\n")
}
pearsoncor<-function(in1,in2,x,y){
# Read in data
file1<-read.table(in1,header=F)
file2<-read.table(in2,header=F)
col1<-as.integer(x)
col2<-as.integer(y)
# Calculate pearson correlation
cor_r <- which(file1[,col1] != '.' & file2[,col2] != '.')
file1.filter<-file1[cor_r,col1]
file2.filter<-file2[cor_r,col2]
x=cor.test(c(file1.filter),c(file2.filter),method="pearson")
fname=paste(basename(in1), " and ", basename(in2),sep="")
print(fname)
print(x)
conc_c <- 0
for (i in 1:length(cor_r))
{
dif<-abs(file1.filter[i,] - file2.filter[i,])
if (dif <0.1){
conc_c = conc_c +1
}
}
print(conc_c)
}
pearsoncor(f1,f2,col1,col2)
Test files: two test files that are tab delimited.
==> t1.txt <==
chr1 10468 10470 . 1 + chr1 10468 10470 0.762 2
chr1 10470 10472 . 2 + chr1 10470 10472 0.738 2
chr1 10483 10485 . 3 + chr1 10483 10485 0.865 2
chr1 10488 10490 . 4 + chr1 10488 10490 0.825 2
chr1 10492 10494 . 5 + chr1 10492 10494 0.894 2
chr1 10496 10498 . 6 + chr1 10496 10498 0.859 2
chr1 10524 10526 . 7 + chr1 10524 10526 0.954 2
chr1 10541 10543 . 8 + chr1 10541 10543 0.876 2
chr1 10562 10564 . 9 + chr1 10562 10564 0.829 2
chr1 10570 10572 . 10 + chr1 10570 10572 . 2
==> t2.txt <==
chr1 10468 10470 . 1 + chr1 10468 10470 0.69 2
chr1 10470 10472 . 2 + chr1 10470 10472 0.7 2
chr1 10483 10485 . 3 + chr1 10483 10485 0.911 2
chr1 10488 10490 . 4 + chr1 10488 10490 0.894 2
chr1 10492 10494 . 5 + chr1 10492 10494 0.714 2
chr1 10496 10498 . 6 + chr1 10496 10498 0.857 2
chr1 10524 10526 . 7 + chr1 10524 10526 0.984 2
chr1 10541 10543 . 8 + chr1 10541 10543 0.955 2
chr1 10562 10564 . 9 + chr1 10562 10564 0.981 2
chr1 10570 10572 . 10 + chr1 10570 10572 0.979 2
To run:
Rscript script.r t1.txt t2.txt 10 10
I couldn't get the absolute value calculation done in the code.. it gave this error:
Error in `[.default`(file1.filter, i, ) : incorrect number of dimensions
Calls: pearsoncor.jgu -> [ -> [.factor -> NextMethod
Execution halted
Also the first line of the error message came from here:
> file1.filter[1,]
Error in `[.default`(file1.filter, 1, ) : incorrect number of dimensions
> dput(head(file1.filter))
c(0.762, 0.738, 0.865, 0.825, 0.894, 0.859)