Search Large file for text and write result to file

Question

I have file one, which is 2.4 million lines (256 MB), and file two, which is 32 thousand lines (1.5 MB).

I need to go through file two line by line and print the matching line from file one.

Pseudocode:

open file 1, read
open file 2, read
open results, write

for line2 in file 2:
    for line1 in file 1:
        if line2 in line1:
            write line1 to results
            stop inner loop

My Code:

p = open("file1.txt", "r")
d = open("file2.txt", "r")
o = open("results.txt", "w")

for hash1 in p:
    hash1 = hash1.strip('\n')
    for data in d:
        hash2 = data.split(',')[1].strip('\n')
        if hash1 in hash2:
            o.write(data)

o.close()
d.close()
p.close()

I am expecting 32k results.

Please Follow the code here : https://stackoverflow.com/a/15174569/6126313 — Syenix, May 17 '19 at 20:11
Have you tried just running your code? Also, if you're running linux, there's a command called "diff", that may help you. — Julio P.C., May 17 '19 at 20:32

Sergey Nudnov · Answer 1 · 2019-05-17T21:00:05.083

Your file2 is not too large, so it is perfectly fine to load it into memory.

1. Load file2.txt into a set to speed up the search and remove duplicates.
2. Remove the empty line from the set.
3. Scan file1.txt line by line and write the matches found to results.txt.

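# Load the smaller file into a set: membership tests are fast and duplicates collapse.
# Trailing newlines are stripped so a final line without "\n" still compares equal.
with open("file2.txt", "r") as f:
    lines = {line.rstrip("\n") for line in f}

# Drop the entry left behind by blank lines, if any.
lines.discard("")

# Stream the large file once and copy every line that also appears in file2.txt.
with open("results.txt", "w") as o:
    with open("file1.txt", "r") as f:
        for line in f:
            if line.rstrip("\n") in lines:
                o.write(line)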
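The code above matches whole lines, while the code in the question compares against the second comma-separated field of each record. A minimal sketch of that variant follows. It assumes one file holds one hash per line and the other holds comma-separated records with the hash in the second field. The function name `filter_by_field` and its parameters are hypothetical, and it uses exact equality on the field rather than the substring test from the question.

```python
def filter_by_field(keys_path, data_path, out_path, field=1, sep=","):
    """Copy lines of data_path whose field at index `field` appears in keys_path.

    Hypothetical helper: assumes keys_path has one key per line and
    data_path has `sep`-separated records.
    """
    # Load the keys into a set for fast membership tests.
    with open(keys_path, "r") as f:
        keys = {line.strip() for line in f}
    keys.discard("")  # ignore blank lines

    written = 0
    with open(data_path, "r") as data, open(out_path, "w") as out:
        for line in data:
            parts = line.rstrip("\n").split(sep)
            # Skip records that are too short to contain the field.
            if len(parts) > field and parts[field] in keys:
                out.write(line)
                written += 1
    return written
```

The return value is the number of lines written, which gives a quick check against the expected number of results.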

If file2 were larger, we could split it into chunks and repeat the same scan for every chunk, but in that case it would be harder to combine the results.
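As a rough illustration of the chunked approach, the sketch below reads the smaller file a fixed number of lines at a time and rescans the large file once per chunk. The function name `match_in_chunks` and the `chunk_size` default are assumptions made for the example, not part of the answer's code.

```python
from itertools import islice

def match_in_chunks(small_path, large_path, out_path, chunk_size=10000):
    """Scan large_path once per chunk of small_path; return the match count.

    Hypothetical helper: whole-line matching, trailing newlines ignored.
    """
    matched = 0
    with open(small_path, "r") as small, open(out_path, "w") as out:
        while True:
            raw = list(islice(small, chunk_size))
            if not raw:
                break  # reached the end of the small file
            keys = {line.rstrip("\n") for line in raw} - {""}
            if not keys:
                continue  # chunk contained only blank lines
            # Reopen the large file so every chunk sees it from the start.
            with open(large_path, "r") as large:
                for line in large:
                    if line.rstrip("\n") in keys:
                        out.write(line)
                        matched += 1
    return matched
```

This shows why combining the results is harder: the output is grouped by chunk rather than following the order of the large file, and a key that appears in two different chunks produces its matches twice.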
