I have two large CSV files with thousands of entries each. Each file contains two columns of IDs of the form:
BRADI5G01462.1_1 NCRNA_34654_1853
BRADI5G01462.1_1 NCRNA_34398_1942
BRADI5G01462.1_1 NCRNA_2871_1959
I've tried this, but it's not giving the expected results:
import csv

files = ["#Left(Brachypodium_Japonica).csv", "#Right(Brachypodium_Japonica).csv"]
for i in range(len(files)):
    name = files[i][files[i].find("#")+1:files[i].find(".")]
    with open(files[i], "r", newline='') as source:
        rdr = csv.reader(source, delimiter="\t", skipinitialspace=True)
        with open("@"+name+".csv", "w", newline='') as result:
            wtr = csv.writer(result, delimiter="\t", skipinitialspace=True)
            for r in rdr:
                wtr.writerow((r[1], r[2]))

l1 = set(open('@Left(Brachypodium_Japonica).csv'))
l2 = set(open('@Right(Brachypodium_Japonica).csv'))
open('Intersection(Brachypodium_Japonica).csv', 'w').writelines(l1 & l2)
What is the most efficient, Pythonic way to find the intersection of the two files? By intersection I mean the rows in which both columns match in both files.
I've asked this question before, but no one bothered to help.
I'm really stuck on this, and any help would be highly appreciated.
Edit:
File 1 (Left) input sample:
BRADI5G16060.1_36 OS08T0547100-02_5715
BRADI3G00440.1_243 OS03T0274400-01_2650
BRADI3G58610.1_438 OS01T0112500-01_899
BRADI1G73670.1_850 OS11T0481500-01_6621
BRADI1G78150.1_870 OS02T0543300-00_2055
File 2 (Right) input sample:
BRADI5G16060.1_36 OS08T0547100-02_5715
BRADI4G45180.1_240 OS03T0103800-01_2473
BRADI2G12470.2_487 OS04T0470600-00_3504
BRADI1G73670.1_850 OS11T0481500-01_6621
BRADI1G78330.1_878 OS06T0155600-01_4411
Expected intersection file of Left & Right:
BRADI5G16060.1_36 OS08T0547100-02_5715
BRADI1G73670.1_850 OS11T0481500-01_6621
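
To make the expected behaviour concrete, here is a minimal sketch of the row-level intersection described above. It compares parsed (column 1, column 2) pairs rather than raw lines, since raw-line comparison can miss a match when the two files differ only in whitespace or in a trailing newline on the last line (this may or may not be the cause of the problem here). The function names `read_pairs` and `intersect_rows` and their path arguments are hypothetical, and the sketch assumes the two ID columns are the last two whitespace-separated fields on each line.

```python
def read_pairs(path):
    """Return the set of (first_id, second_id) pairs found in a file."""
    pairs = set()
    with open(path, "r", newline="") as source:
        for line in source:
            fields = line.split()  # splits on tabs or runs of spaces
            if len(fields) >= 2:
                # Assumption: the ID columns are the last two fields,
                # so an optional leading index column is ignored.
                pairs.add((fields[-2], fields[-1]))
    return pairs


def intersect_rows(left_path, right_path, out_path):
    """Write the rows present in both files to out_path and return them."""
    common = read_pairs(left_path) & read_pairs(right_path)
    with open(out_path, "w", newline="") as result:
        for first_id, second_id in sorted(common):
            result.write(f"{first_id}\t{second_id}\n")
    return common
```

With the Left and Right samples above, calling `intersect_rows` on the two files should return the two pairs listed in the expected intersection. Building each set takes time roughly linear in the file size, so thousands of rows should not be an issue.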