I have two tab-separated files with 1708 rows each and different numbers of columns. My goal is to compare the values in all rows, but only in some specific columns. I have two lists containing the numbers of the columns that I want to compare; here is an example:
- FileA ➝ col_ind_A = [12,20,24,55]
- FileB ➝ col_ind_B = [14,28,35,79]
Here, column 12 of file A should be compared with column 14 of file B, column 20 of file A with column 28 of file B, and so on. If file A has the value 0 and file B doesn't, I want to modify file C (a copy of file A) at that position and store the value from file B (which is not 0) there:
    # FileA                # FileB                # FileC
    col11  col12  col13    col13  col14  col15    col11  col12  col13
    A      C      G        A      C      G        A      C      G
    G      0      T        G      T      T        G      T      T
I've seen that comparing columns is usually done with awk, but I'm quite new to bash and I don't know how to iterate over the rows of the two files while also iterating over the col_ind lists to indicate the column positions that I want to compare. Any suggestions are welcome.
In case it helps, here is some R code that does exactly this (it is just too slow):
    for (i in 1:1708) {        # rows
      for (j in 1:31946) {     # cols
        if (fileA[i, col_ind_A[j]] == '0' && fileA[i, col_ind_A[j]] != fileB[i, col_ind_B[j]]) {
          fileC[i, col_ind_A[j]] <- fileB[i, col_ind_B[j]]  # write value from fileB in file C
        }
      }
    }
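For comparison, the same rule might look roughly like this in awk. This is only a sketch under assumptions: the file names fileA.tsv, fileB.tsv and fileC.tsv and the tiny stand-in data are made up for illustration, the index lists are passed as comma-separated strings, and both files are assumed to have the same number of rows in the same order.

```shell
#!/bin/sh
# Hypothetical file names and stand-in data (not the real files).
tmp=$(mktemp -d)
printf 'A\tC\tG\nG\t0\tT\n'       > "$tmp/fileA.tsv"   # 3 columns
printf 'X\tA\tC\tG\nX\tG\tT\tT\n' > "$tmp/fileB.tsv"   # 4 columns

# Paired 1-based column indices: A col 2 <-> B col 3, A col 3 <-> B col 4.
col_ind_A="2,3"
col_ind_B="3,4"

awk -F'\t' -v OFS='\t' \
    -v ca="$col_ind_A" -v cb="$col_ind_B" -v fileB="$tmp/fileB.tsv" '
BEGIN {
    # Turn the comma-separated lists into arrays a[1..n] and b[1..n].
    n = split(ca, a, ",")
    split(cb, b, ",")
}
{
    # Read the matching row of file B; if B runs out, copy the A row as is.
    if ((getline line < fileB) <= 0) { print; next }
    split(line, fb, "\t")

    # Replace a "0" in A with the value from B when B is not "0".
    for (j = 1; j <= n; j++)
        if ($(a[j]) == "0" && fb[b[j]] != "0")
            $(a[j]) = fb[b[j]]
    print
}' "$tmp/fileA.tsv" > "$tmp/fileC.tsv"

cat "$tmp/fileC.tsv"
```

On the stand-in data, the second row of fileC.tsv ends up as G, T, T, matching the example table. The row from file B is read with getline so that both files are walked once, line by line, instead of being loaded whole.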
Any help would be great. Thanks!!