1

I am trying to do something fairly simple, but I have run into an issue I do not understand. I have a file filled with text of this form:

Text Accuracy: 0.568221 F1 = 22 recall=0.54
with paramters A=xxx B=11 C=222...
=============================
Text Accuracy: 0.568221 F1 = 22 recall=0.54
with paramters A=xxx B=11 C=222...
=============================
Text Accuracy: 0.568221 F1 = 22 recall=0.54
with paramters A=xxx B=11 C=222...
=============================
Text Accuracy: 0.568221 F1 = 22 recall=0.54
with paramters A=xxx B=11 C=222...
=============================
Text Accuracy: 0.568221 F1 = 22 recall=0.54
with paramters A=xxx B=11 C=222...
=============================

What I want to do is write every 3-line block whose accuracy is above 0.90 to another file. To navigate through the lines I used the solution proposed here. My code is the following:

with open('G:\Mayeul\Distribution images\Features_importance\LogDecisionTree.txt') as oldfile, open('G:\Mayeul\Distribution images\Features_importance\LogDecisionTree2.txt', 'w') as newfile:
     #print(len(oldfile.readlines()))
     for line in range(1,int(len(oldfile.readlines()))):
         print(line)
         if line%3==0:
             f=oldfile.readlines()[line-2]
             f=f.split(' ')[3]
             if int(f)>0.90:
                 newfile.write(oldfile.readlines()[line-2])
                 newfile.write(oldfile.readlines()[line-1])
                 newfile.write(oldfile.readlines()[line])

From here I have two issues I do not understand. The first one is:

f=oldfile.readlines()[line-2] IndexError: list index out of range

I don't understand this: the printed length is 13599, and the modulo check works, so the first index is 3-2=1, which is not a negative line number.

The second issue, which I have never seen before: when I uncomment the print(len(oldfile.readlines())) line, there is no error and the value is printed, but then the program stops without doing anything. It is as if the print kills the program, since it never enters the for loop.
Thanks

Mayeul sgc

4 Answers

2

readlines() advances the file pointer to the end of the file, so any further calls return an empty list unless the file has grown in the meantime. Instead, read all the lines into memory once (using the iterator protocol) and then index into that list. In addition, your parsing is incorrect: the accuracy is the third space-separated field (index 2, not 3), and it must be converted with float(), not int().

# Make sure to correctly escape backslashes!
old_fn = 'G:\\Mayeul\\Distribution images\\Features_importance\\LogDecisionTree.txt'
new_fn = 'G:\\Mayeul\\Distribution images\\Features_importance\\LogDecisionTree2.txt'

with open(old_fn) as oldfile:
    old_lines = list(oldfile)

with open(new_fn, 'w') as newfile:
    print(len(old_lines))
    # Records start at list indices 0, 3, 6, ...: metrics, parameters, separator.
    for start in range(0, len(old_lines) - 2, 3):
        accuracy = old_lines[start].split(' ')[2]
        if float(accuracy) > 0.90:
            newfile.write(old_lines[start])
            newfile.write(old_lines[start + 1])
            newfile.write(old_lines[start + 2])
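
As a rough illustration of the parsing fix on one of the sample lines from the question (the helper name parse_accuracy is hypothetical, chosen only for this sketch):

```python
def parse_accuracy(line):
    """Return the accuracy from a line like
    'Text Accuracy: 0.568221 F1 = 22 recall=0.54'."""
    fields = line.split()  # split on any run of whitespace
    # fields is ['Text', 'Accuracy:', '0.568221', 'F1', '=', '22', 'recall=0.54'],
    # so index 3 would be 'F1' and index 2 is the accuracy.
    return float(fields[2])  # int('0.568221') would raise ValueError


sample = "Text Accuracy: 0.568221 F1 = 22 recall=0.54\n"
print(parse_accuracy(sample) > 0.90)  # False
```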
phihag
2

There is no reason here to load the whole file into memory. If you later want to process huge files, doing so could use a lot of memory needlessly. You only need to keep 3 lines at a time:

# Raw strings (r'...') keep the backslashes in the Windows paths literal.
with open(r'G:\Mayeul\Distribution images\Features_importance\LogDecisionTree.txt') as oldfile, open(r'G:\Mayeul\Distribution images\Features_importance\LogDecisionTree2.txt', 'w') as newfile:
    oldlines = [None] * 3   # reserve storage for 3 lines
    for linenum, line in enumerate(oldfile, 1):
        oldlines[linenum % 3] = line    # slots: 1 = metrics, 2 = parameters, 0 = separator
        if linenum % 3 == 0:            # a full 3-line block has been read
            accuracy = oldlines[1].split(' ')[2]
            if float(accuracy) > 0.90:
                newfile.write(oldlines[1])
                newfile.write(oldlines[2])
                newfile.write(oldlines[0])
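
A possible variant of the same streaming idea, sketched here with in-memory files so it runs anywhere. The function names and the 0.95 accuracy value are made up for the demo; only the line format comes from the question.

```python
import io


def blocks_of_three(lines):
    """Return an iterator of 3-tuples of consecutive lines."""
    it = iter(lines)
    # zip pulls from the same iterator three times per tuple, so each tuple
    # holds three consecutive lines; an incomplete trailing group is dropped.
    return zip(it, it, it)


def copy_good_blocks(src, dst, threshold=0.90):
    """Copy every 3-line block whose accuracy exceeds threshold."""
    for metrics, params, separator in blocks_of_three(src):
        if float(metrics.split()[2]) > threshold:
            dst.writelines((metrics, params, separator))


# Demo data: the first block uses an invented accuracy above the threshold.
src = io.StringIO(
    "Text Accuracy: 0.95 F1 = 22 recall=0.54\n"
    "with paramters A=xxx B=11 C=222...\n"
    "=============================\n"
    "Text Accuracy: 0.568221 F1 = 22 recall=0.54\n"
    "with paramters A=xxx B=11 C=222...\n"
    "=============================\n"
)
dst = io.StringIO()
copy_good_blocks(src, dst)
print(dst.getvalue(), end="")
```

With real files, src and dst would be the two objects returned by open(); only three lines are held in memory at any time.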
Serge Ballesta
  • This works (I corrected the float issue, thanks @PM 2Ring). I accepted this one because its design saves resources. – Mayeul sgc Mar 29 '17 at 09:01
0

You can't use oldfile.readlines() multiple times.

Instead, assign the content of the file to a variable as follows:

contentOfTheFile = oldfile.readlines()

and use this variable instead of oldfile.readlines() in the rest of the code.
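
To see what repeated calls actually do, here is a minimal sketch that uses an in-memory file (io.StringIO) as a stand-in for the real log. This is the likely cause of the IndexError in the question: the call inside range() consumes the file, so the next readlines() returns an empty list.

```python
import io

f = io.StringIO("line 1\nline 2\nline 3\n")  # stand-in for an opened file

first = f.readlines()    # reads everything up to the end of the file
second = f.readlines()   # the position is already at the end: empty list
f.seek(0)                # rewinding makes the content readable again
third = f.readlines()

print(len(first), len(second), len(third))  # 3 0 3
```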

Václav Struhár
  • Of course you can (for instance when reading logfiles). It's just that the semantics of using `oldfile.readlines()` multiple times don't match what @Mayeul sgc expects. – phihag Mar 29 '17 at 08:40
0

Try this.

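# "Input file path" is a placeholder for the real path.
with open("Input file path") as f:
    records = f.read().split("=============================")

for record in records:
    if record.strip():              # skip the empty piece after the last separator
        print(record.split()[2])    # the third field is the accuracy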
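
The snippet above only prints the accuracies. One way it might be extended to keep the blocks above the threshold is sketched below; filter_records and SEPARATOR are hypothetical names, and the 0.95 value is invented for the demo. Note that this approach reads the whole file into memory.

```python
SEPARATOR = "=============================\n"


def filter_records(text, threshold=0.90):
    """Return the records (separator included) whose accuracy exceeds threshold."""
    kept = []
    for record in text.split(SEPARATOR):
        if not record.strip():
            continue  # skip the empty piece after the last separator
        if float(record.split()[2]) > threshold:
            kept.append(record + SEPARATOR)
    return "".join(kept)


# Demo data: the first record uses an invented accuracy above the threshold.
text = (
    "Text Accuracy: 0.95 F1 = 22 recall=0.54\n"
    "with paramters A=xxx B=11 C=222...\n"
    "=============================\n"
    "Text Accuracy: 0.568221 F1 = 22 recall=0.54\n"
    "with paramters A=xxx B=11 C=222...\n"
    "=============================\n"
)
print(filter_records(text), end="")
```

The returned string can then be written to the output file with a single write() call.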
Technocrat