I have two files (for example, A.txt and B.txt), where A.txt is very large. I would like to avoid reading the full file into memory, and instead read it line by line, merging each line with the matching rows from B.txt. Both files have headers.
My current code looks like this:
import pandas as pd

# Read both files fully into memory as DataFrames
contigs = pd.read_csv("A.txt", header=0, sep="\t")
coverages = pd.read_csv("B.txt", header=0, sep="\t")

# Inner-join on the shared 'contig' column and write the result
merged = pd.merge(contigs, coverages, on='contig')
merged.to_csv("merged_file.txt", sep="\t", index=False)
The code works, but as mentioned above I would like to read A.txt line by line instead of loading it fully into memory, merging each line with B.txt as I go before writing the output.
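Conceptually, I imagine loading the small file B.txt into a dictionary keyed on contig and then streaming A.txt one row at a time. Below is a rough, untested sketch of what I mean (the csv module is just my guess at the right tool, the column names are taken from the example files further down, and the membership check is meant to mimic the inner-join behaviour of pd.merge):

import csv

# Build a lookup table from the small file: contig -> coverage
with open("B.txt", newline="") as f_b:
    reader = csv.DictReader(f_b, delimiter="\t")
    coverage_by_contig = {row["contig"]: row["coverage"] for row in reader}

# Stream the large file one line at a time and write matches out
with open("A.txt", newline="") as f_a, open("merged_file.txt", "w", newline="") as f_out:
    reader = csv.DictReader(f_a, delimiter="\t")
    fieldnames = reader.fieldnames + ["coverage"]
    writer = csv.DictWriter(f_out, fieldnames=fieldnames, delimiter="\t")
    writer.writeheader()
    for row in reader:
        # Keep only rows whose contig appears in B.txt (inner-join behaviour)
        if row["contig"] in coverage_by_contig:
            row["coverage"] = coverage_by_contig[row["contig"]]
            writer.writerow(row)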
Thanks a lot for your help!
(Updating original post with example files)
head A.txt
clusterID kegg_contig contig
Cluster_10700 Unassigned_ERR1801630_792963 ERR1801630_contig_792963
Cluster_10700 Unassigned_ERR1801633_537686 ERR1801633_contig_537686
Cluster_10700 Unassigned_ERR505054_53474 ERR505054_contig_53474
Cluster_10700 Unassigned_ERR505054_31574 ERR505054_contig_31574
head B.txt
contig coverage
ERR1726751_contig_1 28.82716
ERR1726751_contig_2 12.265934
ERR1726751_contig_3 17.733767
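Alternatively, if it is easier to stay within pandas, I understand pd.read_csv has a chunksize parameter that returns an iterator of DataFrames instead of reading everything at once. A sketch of that approach (the chunk size of 100,000 rows is an arbitrary choice on my part):

import pandas as pd

# The small file still fits in memory
coverages = pd.read_csv("B.txt", header=0, sep="\t")

# Process A.txt in fixed-size chunks instead of all at once
first_chunk = True
for chunk in pd.read_csv("A.txt", header=0, sep="\t", chunksize=100_000):
    merged = pd.merge(chunk, coverages, on='contig')
    # Write the header only once, then append subsequent chunks
    merged.to_csv("merged_file.txt", sep="\t", index=False,
                  mode="w" if first_chunk else "a",
                  header=first_chunk)
    first_chunk = False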