
I have difficulty understanding why my for-loops are skipped when the code goes through the while-loop a second time. On the first pass everything works fine.

k = 0
with open("Result.txt", "r") as ParseOutput:
    while k < num_lines:                                # num lines in test file = 12
        print("k: " + str(k))
        for line in islice(ParseOutput, k, k+100000):
            print("k in islice: " + str(k))
            field_list = []
            fields = line.strip("\n")
            fields = fields.split("\t")
            field_list.append(fields)
            query_plasmid = field_list[0][0]
            print("Query Plasmid: " + str(query_plasmid))

        l = k

        for m, line in enumerate(islice(ParseOutput, l, l+100000)):
            if l < 100000:
                print("if l: " + str(m))
                field_list = []
                fields = line.strip("\n")
                fields = fields.split("\t")
                field_list.append(fields)

                next_plasmid = field_list[0][0]
                print("Next Plasmid: " + str(next_plasmid) + "  l: " + str(l))

                if not str(query_plasmid) == str(next_plasmid):
                    query_plasmid_index = UniqueID_List.index(query_plasmid)
                    Start_line_list[query_plasmid_index] = k
                    End_line_list[query_plasmid_index] = m
                    New_start_line = m+1
                    print("Start line: " + str(k))
                    print("End line: " + str(m))
                    print("New_start_line: " + str(New_start_line))

                    l = 999999
                    print("l: " + str(l))

        k = k+1

The print statements are only there for debugging. Here is the script's output:

k: 0
k in islice: 0
Query Plasmid: DA_000001
if l: 0
Next Plasmid: DA_000001 l: 0
if l: 1
Next Plasmid: DA_000001 l: 0
if l: 2
Next Plasmid: DA_000001 l: 0
if l: 3
Next Plasmid: DA_000002 l: 0
Start line: 0
End line: 3
New_start_line: 4
l: 999999
k: 1
k: 2
k: 3
k: 4
k: 5
k: 6
k: 7
k: 8
k: 9
k: 10
k: 11

As the title says, I don't understand why the for-loops are skipped for k > 0. I'd appreciate any help or other ideas!

Best, Philipp

PS: I would prefer a solution using lists, but unfortunately my file is so large that I cannot even enumerate its lines without running into memory errors. Hence the workaround with the hard-coded numbers.

Philipp
  • Why is k = k+1 indented outside the while loop? – farghal Mar 02 '17 at 15:43
  • Thanks farghal, but that wasn't the problem; it just got formatted incorrectly when I posted it on Stack Overflow. – Philipp Mar 02 '17 at 15:56
  • Can you use the file as an iterator to read it line by line? That should work even if it's really big: http://stackoverflow.com/questions/6475328/read-large-text-files-in-python-line-by-line-without-loading-it-in-to-memory Out of curiosity, how big is the actual file? – Roadmaster Mar 02 '17 at 16:00
  • It looks like your script somehow isn't recognizing that there is more than one line in your file, as if it reaches the end of the file in one pass and then just increments `k` without actually parsing anything. – UnseenSpecter Mar 02 '17 at 16:04
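Along the lines of Roadmaster's suggestion, one way to avoid both the rewinding and the hard-coded numbers is a single pass over the file that groups consecutive lines by their first tab-separated field with `itertools.groupby`. This is a swapped-in technique, not the asker's code: it assumes all lines for one plasmid are contiguous (which the sample output appears to show), and the function name `plasmid_line_ranges` and the sample data are hypothetical. Only one line is held in memory at a time.

```python
from io import StringIO
from itertools import groupby


def plasmid_line_ranges(lines):
    """Yield (plasmid_id, start_line, end_line) for each run of consecutive
    lines whose first tab-separated field is the same.

    Hypothetical helper. Line numbers are 0-based and inclusive. `lines` can
    be any iterable of strings, e.g. an open file, so the file is streamed
    rather than loaded into memory.
    """
    numbered = enumerate(lines)  # (line_number, line_text) pairs, lazily

    def first_field(item):
        return item[1].rstrip("\n").split("\t", 1)[0]

    for plasmid, group in groupby(numbered, key=first_field):
        first = last = next(group)[0]  # line number of the run's first line
        for last, _ in group:          # advance to the run's last line
            pass
        yield plasmid, first, last


# Hypothetical stand-in for Result.txt. With the real file you would write:
#   with open("Result.txt") as f:
#       for plasmid, start, end in plasmid_line_ranges(f): ...
sample = StringIO("DA_000001\tx\nDA_000001\ty\nDA_000002\tz\n")
ranges = list(plasmid_line_ranges(sample))
print(ranges)  # [('DA_000001', 0, 1), ('DA_000002', 2, 2)]
```

Each yielded tuple could then be written into `Start_line_list` and `End_line_list` through `UniqueID_List.index(plasmid)`, as in the question.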

1 Answer


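I think the first pass of your while-loop already reads through the whole file. Once the file object 'ParseOutput' has reached EOF, neither of your for-loops has any lines left to read: islice(ParseOutput, k, ...) skips k lines from the file's current position, not from its beginning, so on later passes it returns nothing.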
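A minimal sketch of the effect in isolation, using `io.StringIO` as a stand-in for `Result.txt` (the data is made up). `islice` just pulls items from whatever iterator it is given, so after the first pass has reached EOF, a second `islice` call returns nothing, whatever start index it gets.

```python
from io import StringIO
from itertools import islice

# Hypothetical stand-in for Result.txt: three tab-separated lines.
f = StringIO("a\tx\nb\ty\nc\tz\n")

# The first islice reads every line, leaving the file position at EOF.
first_pass = list(islice(f, 0, 100000))

# islice(f, 1, ...) skips 1 line from the *current* position, not from
# the start of the file, so on an exhausted file it yields nothing.
second_pass = list(islice(f, 1, 100000))

print(len(first_pass))  # 3
print(second_pass)      # []
```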

At the end of your while-loop body (or at the beginning, if you prefer), add this:

ParseOutput.seek(0)

This moves the position of the file object back to the beginning of the file, so the next islice call counts its k lines from the top again.
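A minimal sketch of the fix, assuming a small `io.StringIO` stand-in for `Result.txt` (hypothetical data): calling `seek(0)` before each `islice` makes the start index `k` count from the top of the file, so pass `k` sees line `k`.

```python
from io import StringIO
from itertools import islice

# Hypothetical stand-in for Result.txt: four tab-separated lines.
f = StringIO("DA_1\ta\nDA_1\tb\nDA_2\tc\nDA_2\td\n")

first_fields = []
for k in range(4):
    f.seek(0)  # rewind, so islice skips k lines from the start of the file
    for line in islice(f, k, k + 1):  # read only line k
        first_fields.append(line.rstrip("\n").split("\t")[0])

print(first_fields)  # ['DA_1', 'DA_1', 'DA_2', 'DA_2']
```

Note that every rewind re-reads the first k lines, so over the whole loop the work grows roughly quadratically with the number of lines; on a very large file, a single pass over the file avoids that cost.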

lelabo_m