I have a text file of 1 billion lines and 60 GB. I have to extract the data corresponding to specified line numbers, which are read from another text file (e.g. 1, 4, 70, 100, ...). Due to the size I can't load the whole file into memory and then extract the lines, and line-by-line matching and extraction takes many days. Does a solution to this problem exist?
Two methods I have tried:
1. First method
f = open('line_numbers.txt')
lines = f.readlines()
numbers = [int(e.strip()) for e in lines]
r = max(numbers)
file = open('OUTPUT_RESULT.txt', 'w')
with open('Large_File.txt') as infile:
    for num, line in enumerate(infile, 1):
        if num <= r:
            if num in numbers:  # list membership: a linear scan of `numbers` on every line
                file.write(line)
            else:
                pass
        print(num)  # printing every line number adds noticeable overhead
file.close()
It will take many days to get the result
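For what it's worth, a variant of that loop with a set (O(1) membership instead of a linear list scan) and an early break after the largest wanted line should reduce the per-line cost to roughly the cost of reading the file, though it is still a sequential scan. A minimal sketch, using the same file names as above:

with open('line_numbers.txt') as f:
    numbers = {int(line) for line in f if line.strip()}  # a set gives O(1) membership tests
last = max(numbers)
with open('Large_File.txt') as infile, open('OUTPUT_RESULT.txt', 'w') as out:
    for num, line in enumerate(infile, 1):
        if num in numbers:
            out.write(line)
        if num == last:
            break  # nothing wanted beyond the largest line number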
2. Second method
import pandas as pd
data = pd.read_csv('Large_File.txt', header=None)  # tries to parse the entire 60 GB file at once
file = open('OUTPUT_RESULT.txt', 'w')
f = open('line_numbers.txt')
lines = f.readlines()
numbers = [int(e.strip()) for e in lines]
x = data.loc[[n - 1 for n in numbers], :]  # the default index is 0-based, so shift the 1-based line numbers
x.to_csv(file, header=False, index=False)  # write() cannot take a DataFrame directly
file.close()
This fails at the read_csv step: pandas cannot load the 60 GB file into memory.
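If a pandas route is wanted, one possible variant is a sketch like the following, assuming the file parses with read_csv as in the attempt above: skiprows accepts a callable, so pandas keeps only the selected rows in memory, although it still has to scan the whole file sequentially.

import pandas as pd

with open('line_numbers.txt') as f:
    wanted = {int(line) for line in f if line.strip()}

# skiprows is called with the 0-based row number and returns True to skip,
# so only the wanted lines are ever materialised in memory
df = pd.read_csv('Large_File.txt', header=None,
                 skiprows=lambda i: (i + 1) not in wanted)
df.to_csv('OUTPUT_RESULT.txt', header=False, index=False)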
Is there any solution that resolves this?