I have a huge text file (larger than 16 GB) in which each line has the form:
- 22_0F3, 33_0F4, 0.87
- 28_0F3, 37_0F4, 0.79
- .................... . . .
- 21_0F2, 32_2F1, 0.86
I have to extract every line from this file whose first two fields match, in either order, a pair listed in another file. The pairs file looks like this:
- 22_0F3, 33_0F4
- 32_0F1, 21_2F2
- .............. . .
The code below does the job, but it takes a very long time to finish:
huge = open('huge.txt')
lines = open('lines.txt')
output = open('output', 'w')
X = []
l = []
for line1 in lines:
    x1 = line1.split(',')[0].strip()
    x2 = line1.split(',')[1].strip()
    XX = [x1, x2]
    X.append(XX)
for line3 in huge:
    z1 = line3.split(',')[0].strip()
    z2 = line3.split(',')[1].strip()
    z3 = line3.split(',')[2].strip()
    ZX = [z1, z2]
    ZY = [z2, z1]
    if ZX in X or ZY in X:
        ZX.append(z3)
        l.append(ZX)
        print(ZX)
for i in l:
    output.write(str(i)[1:-1]+'\n')
output.close()
Expected output:
1. 22_0F3, 33_0F4, 0.87
2. 32_2F1, 21_0F2, 0.86
I'm a beginner in Python. Can anybody help me optimize this code so that it produces the result faster, or suggest a quicker way to get the same output?
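For reference, here is a minimal sketch of one direction that may help. It is not a measured or definitive solution. It assumes the slow step is the `ZX in X` test, which scans the whole list for every line of the big file. Storing the pairs in a `set` of tuples makes each lookup roughly constant time on average. The function names `load_pairs` and `filter_lines` are hypothetical, and the sketch writes matches as plain comma-separated text (as in the expected output) instead of using `str(i)[1:-1]`.

```python
def load_pairs(path):
    """Read the pairs file into a set holding both orderings of each pair."""
    pairs = set()
    with open(path) as f:
        for line in f:
            parts = [p.strip() for p in line.split(',')]
            if len(parts) >= 2:
                a, b = parts[0], parts[1]
                pairs.add((a, b))
                pairs.add((b, a))  # lets a line match in either order
    return pairs


def filter_lines(huge_path, pairs, out_path):
    """Stream the big file once and write matching lines immediately."""
    matched = 0
    with open(huge_path) as huge, open(out_path, 'w') as out:
        for line in huge:
            parts = [p.strip() for p in line.split(',')]
            # Skip malformed lines that lack the three expected fields.
            if len(parts) >= 3 and (parts[0], parts[1]) in pairs:
                out.write(', '.join(parts[:3]) + '\n')
                matched += 1
    return matched
```

It could be called as `filter_lines('huge.txt', load_pairs('lines.txt'), 'output')`. Writing each match as soon as it is found also avoids holding every result in memory, and dropping the per-line `print` may save further time.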