python skipping a for loop

Question

I'm writing code that reads latitudes and longitudes, checks whether each record lies within a certain radius of a point, and splits the records into two separate files.

For the first 5 iterations the program runs fine, but after that it does not run through the inner for loop. I have stepped through the code, and it just steps over the for loop. This seems to depend on what I set the variable radius to: if radius is smaller, the inner for loop runs for fewer iterations.

I'm afraid this might be a problem with how I'm reading in the file. I believe that after the 5th iteration infile_2 is blank, but I can't figure out how to fix it.

def main(): 

    global infile_1, infile_2

    ## import geocoded college dataset
    infile_2 = open("university_FIPS.csv", "r")

    ## import great_lakes_sample
    infile_1 = open("great_lakes_sample.csv", "r")
    outfile_within_bound = open("great_lakes_blood_college", "w")
    outfile_outside_bound = open("great_lakes_blood_NOcollege", "w")
    inside_buffer_count = 0
    outside_buffer_count = 0

    global lat_1_index, long_1_index, lat_2_index, long_2_index

    ## set radius to desired length (in miles)
    radius = 100



    ## when it is generalized, use this:
    ##    radius = input_buffer_radius()


    # create two subsets of blood drive data, one within
    # radius of college, one outside

    # skip header
    n_1 = 0
    for infile_1_line in infile_1:
        infile_1_line = infile_1_line.strip().replace("\"", "").split(",")

        record_stored = False

        # find index of lat_2, long_2
        if( n_1 == 0 ):
            lat_2_index = infile_1_line.index( "lat" )
            long_2_index = infile_1_line.index( "long" )
            infile_1_header_list = infile_1_line

        # assign lat_2, long_2 latitude and longitude values
        lat_2 = infile_1_line[ lat_2_index ]
        long_2 = infile_1_line[ long_2_index ]

        # skip header
        if n_1 > 0:
            print( "\n\nExamining Record:", n_1 )

            try:
                lat_2 = float( lat_2 )
                long_2 = float( long_2 )
            except ValueError:
                print( "Value error, skipping record" )
                continue                        
            except TypeError:
                print("Type error, skipping record" )
                continue
            print( "Coordinates for record:", lat_2, long_2)


            # skip header
            n_2 = 0


            # WILL NOT ENTER LOOP ON THIRD ITERATION, it's dependent on radius, when radius is 100, it stops at n_1 = 6
            if ( n_1 > 0):
                print("\n\n\nbefore loop")            

            for infile_2_line in infile_2:
                infile_2_line = infile_2_line.strip().split(",")                                                        

                if ( n_2 == 0):
                    print( "in" )


                # find index of lat_1, long_1, create header list
                if( n_2 == 0 and n_1 == 1):
                    lat_1_index = infile_2_line.index("lat")
                    long_1_index = infile_2_line.index("long")
                    infile_2_header_list = infile_2_line

                    # create headers for each outfile
                    write_to_csv( infile_1_header_list, outfile_within_bound )
                    write_to_csv( infile_1_header_list, outfile_outside_bound )


                # assign values for lat_1, long_1, after header
                if( n_2 > 0 ):
                    lat_1 = infile_2_line[ lat_1_index ]
                    long_1 = infile_2_line[ long_1_index ] 

                    try:
                        lat_1 = float( lat_1 )
                        long_1 = float( long_1 )
                        value_error = False
                    except ValueError:
                        continue
                    except TypeError:
                        continue

                    dist = haversine_distance(lat_1, long_1, lat_2, long_2)

                    if( dist <= radius ):
                        print( "\nRecord", n_1, "is",
                               dist, "miles from", lat_1, long_1)
                        write_to_csv( infile_1_line, outfile_within_bound )
                        record_stored = True
                        print( "Record stored in outfile_inside_bound." )
                        print( "Moving to next record." )
                        inside_buffer_count += 1
                        break

                n_2 += 1

            if( record_stored == False):

                print( "\nOutside buffer." )
                write_to_csv( infile_1_line, outfile_outside_bound )
                outside_buffer_count += 1
                print( "Record stored in outfile_outside_bound." )


        n_1 += 1

    print("\nRecords within buffer:", inside_buffer_count, "\nRecords outside buffer:", outside_buffer_count)
    infile_1.close()
    infile_2.close()
    outfile_within_bound.close()
    outfile_outside_bound.close()

score 4 · Answer 1 · edited May 23 '17 at 12:21

The direct answer is that when you iterate through a file in a for x in f style loop, Python keeps track of how far into the file you have read. So if you do 10 iterations of the inner for loop before reaching the break, the next time you try to iterate through the file using infile_2 you will start 10 lines into the file!

It sounds like in your case, after a few iterations of the outer loop you have read the entire file, so the infile_2 iterator just sits at the end of the file on all subsequent iterations of the outer for loop. That also explains the dependence on radius: a smaller radius means fewer early matches, so each pass reads further before hitting the break and the file runs out sooner. The easy fix is to call infile_2.seek(0) before the inner for loop runs. This repositions infile_2 at the beginning of the file again. Whew...
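Here is a minimal sketch of that behavior, separate from your code. The temporary file and its three lines are made up for illustration and only stand in for university_FIPS.csv:

```python
import os
import tempfile

# Hypothetical three-line file standing in for university_FIPS.csv.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
    tmp.write("name,lat,long\nA,1.0,2.0\nB,3.0,4.0\n")
    path = tmp.name

with open(path, "r") as f:
    first_pass = [line.strip() for line in f]   # reads every line
    second_pass = [line.strip() for line in f]  # file is exhausted, so nothing is read
    f.seek(0)                                   # rewind to the start of the file
    third_pass = [line.strip() for line in f]   # reads every line again

os.remove(path)

print(len(first_pass), len(second_pass), len(third_pass))  # prints: 3 0 3
```

The second pass is the "skipped" for loop from the question: the loop body never runs because the file object has no lines left to yield.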

That is all fine and dandy, but I would like to suggest a couple of things to you:

1. When you open files, use with open("test.txt","r") as f, as seen in this SO post. This gives you the benefit of not having to remember to close the file explicitly, as it is closed implicitly at the end of the block.
2. Often it is a better idea to read the file into a list, do your computations, then write the results all in one shot. This makes your code more organized (and easier to read), and also lets you avoid errors like the one you are running into.

To illustrate these strategies here is how I would read the files in your code example:

def main(): 
    global infile_1, infile_2

    with open("great_lakes_sample.csv", "r") as infile_1:
        #List comprehension to format all of the lines correctly
        infile1_lines = [line.strip().replace("\"", "").split(",") for line in infile_1] 

    with open("university_FIPS.csv", "r") as infile_2:
        #List comprehension to format all of the lines correctly
        infile2_lines = [line.strip().split(",") for line in infile_2]

    #Both files are automatically closed when their respective with blocks end.
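Once both files are in lists, the nested loop can be run as many times as needed, because a list can be iterated repeatedly without rewinding. The following is only a sketch under assumptions: haversine_miles is my own stand-in for your haversine_distance (which isn't shown in the question), split_by_radius is a hypothetical helper, and the coordinates are made-up sample data rather than rows from your files.

```python
import math

def haversine_miles(lat_1, long_1, lat_2, long_2):
    """Great-circle distance in miles (stand-in for the asker's haversine_distance)."""
    earth_radius_miles = 3958.8
    phi_1, phi_2 = math.radians(lat_1), math.radians(lat_2)
    d_phi = phi_2 - phi_1
    d_lambda = math.radians(long_2 - long_1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi_1) * math.cos(phi_2) * math.sin(d_lambda / 2) ** 2)
    return 2 * earth_radius_miles * math.asin(math.sqrt(a))

def split_by_radius(records, colleges, radius):
    """Partition (lat, long) records by whether any college is within radius miles."""
    inside, outside = [], []
    for record in records:
        # colleges is a list, so every record scans it from the beginning.
        near = any(
            haversine_miles(c[0], c[1], record[0], record[1]) <= radius
            for c in colleges
        )
        (inside if near else outside).append(record)
    return inside, outside

# Made-up sample coordinates, not taken from the real CSV files.
colleges = [(42.28, -83.74)]
records = [(42.33, -83.05), (46.55, -87.40)]
inside, outside = split_by_radius(records, colleges, 100)
```

The two result lists can then be written to the two output files in one shot after the loop finishes.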
