I am going to describe my question with this minimal, reproducible example (reprex). I have two tab separated value files df1.map
and df2.map
. df1.map
looks like this:
1 BICF2G630707759 0 3014448
1 BICF2S2358127 0 3068620
1 BICF2P1173580 0 3079928
1 BICF2G630707846 0 3082514
1 BICF2G630707893 0 3176980
1 TIGRP2P175_RS8671583 0 3198886
1 BICF2P1383091 0 3212349
1 TIGRP2P259_RS8993730 0 3249189
1 BICF2P186608 0 3265742
1 BICF2G630707908 0 3273096
df2.map
is this:
1 BICF2P413765 0 85620576
1 BICF2P413768 0 85621395
1 BICF2P860004 0 85633349
1 BICF2G630707846 0 85660017
1 BICF2G630707893 0 85684560
1 CHR1_85660017 0 85660017
1 BICF2P91636 0 85684560
1 CHR1_85685260 0 85685260
1 BICF2P172347 0 85700399
1 BICF2P1125163 0 85707031
The second column in each file holds an ID information. The question is: Check for every ID in df1.map
if it is present in df2.map
. If it is present, take the information of that ID, of that row from the columns 1 and 4 from df2.map
and over-write the columns 1 and 4 in df1.map
with the information from df2.map
.
I tried one solution with python (after converting from TSV to CSV):
import pandas as pd
CF2 = pd.read_csv('df1.csv', low_memory=False)
CF3 = pd.read_csv('df2.csv', low_memory=False)
i = 0
while i < CF2.shape[0]:
if (CF2[CF2.columns[1]].values[i]) == (CF3[CF3.columns[1]]):
print (CF3[CF3.columns[0]])
i = i + 1
Yet, that resulted only in errors.
Then I tried to solve it in R:
CFTwo <- read.table("df1.map", sep="\t", header=FALSE)
CFThree <- read.table("df2.map", sep="\t", header=FALSE)
for (i in CFThree [,2]) {
if (i == CFTwo [,2]) {
print(CFTwo [,c(1,2,4)])}}
That gave 50 warnings and no result. Since I was already not able to get the right values printed, I am far away from over-writing and changing files. Besides Python and R I am also open to a solution in bash.