Iterating tab delimited file fails (loop runs only once)

Question

I'm very new to python and having trouble with this bit of code that I wrote:

#! /usr/bin/env python


OutFileName = "k_values.txt"

OutFile = open(OutFileName, 'w')
with open("structure_working.txt") as infile:
    for Line in open("structure_working.txt"):
        Line = Line.strip('\n')
        ElementList = Line.split('\t')
        k1 = ElementList[2]
        k2 = ElementList[3]
        k3 = ElementList[4]
        k4 = ElementList[5]
        k = 1
         if k2 > k:
            k = 2
         if k3 > k2:
            k = 3
         if k4 > k3:
            k = 4
        name = ElementList[1]
        OutputString = "%s\t%s" % (name, k)
        print OutputString
        OutFile.write(OutputString + "\n")

OutFile.close()

My input file is a tab delimited file. The problem is that my for loop only runs once over the header line and doesn't continue into the body of the file. Here is an example of my text file:

     num    indiv   1   2   3   4   k   
     1  JB1972  0.642   0.141   0.091   0.127       
     2  JB1973  0.754   0.113   0.079   0.055       
     3  JB1974  0.069   0.422   0.418   0.091       
     4  JB1976  0.175   0.339   0.249   0.237       
     5  JB1977  0.149   0.365   0.383   0.103       
     6  JB1978  0.421   0.184   0.146   0.249       
     7  JB1979  0.264   0.246   0.395   0.095       
     8  JB1980  0.074   0.511   0.287   0.128       
     9  JB1981  0.083   0.162   0.275   0.48        
    10  JB1982  0.059   0.145   0.73    0.067

None of the answers I've found to the problem "for loop only runs once" were helpful to my specific problem. The fact that the code works on the header line makes me think that the problem is with the for loop. Any ideas?

If you have a comma/space/tab delimited file, you should use csv. Also, your code doesn't make sense as written; what are you trying to do? — user1767754, Nov 03 '15 at 07:38
A few problems here: You're opening your file `structure_working.txt` twice: once as `infile` in the context manager, and then again in your for loop. Also, you shouldn't use uppercase letters in variable names. The comparisons against `k` will not work as expected, because you're comparing strings and ints. — karlson, Nov 03 '15 at 07:40
You have an indentation problem: `if k2 > k:` and all the other `if`s are over-indented by one space. — Pynchia, Nov 03 '15 at 07:41
I'm trying to assign a k value from a STRUCTURE run to each individual. The highest number in each line gets assigned as the k-value. — Lauren Konrade, Nov 03 '15 at 07:45
You cannot reasonably compare `k` (an int) to `k1` (a string). Anyway, that's not the cause of your original problem. But maybe you could come up with a cleaner example of what's not working. — karlson, Nov 03 '15 at 07:47
The indentation problems make me think this isn't the actual code. — Peter Wood, Nov 03 '15 at 08:18
Someone should have asked if there's an error message popping up? — moooeeeep, Nov 03 '15 at 08:51
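
To illustrate the comparison issue raised in the comments above: fields read from a text file are strings, so they compare character by character rather than numerically. The following is a minimal sketch, assuming Python 3 (where ordering a str against an int raises TypeError; Python 2 instead silently orders mismatched types). The helper name `to_floats` is made up for illustration and is not part of the original code.

```python
def to_floats(fields):
    """Convert a list of numeric strings to floats (hypothetical helper)."""
    return [float(field) for field in fields]


# Strings compare lexicographically, so "10" sorts before "9".
print("10" < "9")                # True
print(float("10") < float("9"))  # False

# In Python 3, ordering a str against an int is an error rather than a guess.
try:
    "0.141" > 1
except TypeError as exc:
    print("TypeError:", exc)

# Converting first gives the numeric maximum of one row from the question.
fields = ["0.642", "0.141", "0.091", "0.127"]
print(max(to_floats(fields)))    # 0.642
```

Converting each field once, right after splitting the line, keeps every later comparison numeric.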

score 1 · Answer 1 · edited May 23 '17 at 12:23

You should change your code to something like the following (untested) example:

import csv

# file names taken from the question
infname = "structure_working.txt"
outfname = "k_values.txt"

# use with statement for file handling
# note that nesting is also possible and kind of convenient
with open(outfname, 'w+') as outf, open(infname) as inf:
    # use csv.reader and csv.writer to specify csv file format
    # also have a look at csv.DictReader (I like it much better...)
    reader = csv.reader(inf, delimiter='\t')
    writer = csv.writer(outf, delimiter='\t')
    # skip the header line (not necessary when csv.DictReader is used)
    next(reader)
    # iterate input lines
    for line in reader:
        # split line without having to deal with the proper formatting
        name = line[1]
        # somehow compute k value (probably I haven't got it right):
        # 1-based position of the largest of the four value columns
        k = max(enumerate(map(float, line[2:6]), 1), key=lambda x: x[1])[0]
        # write row without having to deal with proper formatting
        writer.writerow([name, k])

Use the with-statement for handling files whenever possible. It provides an easy-to-use wrapper for fail-safe automatic closing and a clean syntax (in my opinion). Note that it's also possible and convenient to open multiple files within one with-block.

You are reading a csv file (character separated values). Python provides a module for that in its standard library (you should use the standard library whenever possible). Also consider libraries like pandas, which provide convenient functions for handling csv (or Excel) files.
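
As a rough illustration of the `csv.DictReader` variant mentioned in the code comments, here is a small sketch, assuming Python 3. It reads from an in-memory string instead of `structure_working.txt`, and the sample omits the empty trailing `k` column for brevity. `SAMPLE` and `read_rows` are names invented for this example.

```python
import csv
import io

# In-memory stand-in for the tab-delimited input file (header line first).
SAMPLE = (
    "num\tindiv\t1\t2\t3\t4\n"
    "1\tJB1972\t0.642\t0.141\t0.091\t0.127\n"
    "3\tJB1974\t0.069\t0.422\t0.418\t0.091\n"
)


def read_rows(handle):
    """Return (name, [four floats]) per data row.

    DictReader consumes the header itself and uses it for the keys,
    so no explicit header skipping is needed.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    return [
        (row["indiv"], [float(row[col]) for col in ("1", "2", "3", "4")])
        for row in reader
    ]


rows = read_rows(io.StringIO(SAMPLE))
for name, values in rows:
    print(name, values)
```

For a real file, the handle would come from something like `open("structure_working.txt", newline="")`, which is how the csv documentation recommends opening files in Python 3.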

When you have properly dealt with the problem of reading and writing csv files you can finally go and revisit your logic of computing the k value. As you can see in my example, I have converted the strings from the input file to numbers before comparing them. Further I have assumed that you want to get the column number of the maximum value, so I have tried to implement that. When you try to implement something like that, again, please have a look at what functionality the standard library offers (beyond if-statements). This will greatly improve your life (and code quality). I have put some links to the documentation below for a start.
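
To make the "column number of the maximum value" idea concrete, the sketch below contrasts it with a chain of neighbour-by-neighbour `if` tests. The chain is a simplified numeric version of the logic in the question, not a copy of it. Both function names are made up for illustration.

```python
def best_column(values):
    """1-based position of the largest value in the row."""
    return max(enumerate(values, start=1), key=lambda pair: pair[1])[0]


def pairwise_chain(values):
    """Simplified if-chain: each value is only compared with its left neighbour."""
    k = 1
    if values[1] > values[0]:
        k = 2
    if values[2] > values[1]:
        k = 3
    if values[3] > values[2]:
        k = 4
    return k


row = [0.642, 0.141, 0.091, 0.127]  # first data row from the question
print(best_column(row))     # 1
print(pairwise_chain(row))  # 4, because 0.127 > 0.091 even though 0.642 is the maximum
```

The chain goes wrong whenever a later value beats its neighbour without beating the overall maximum, which is why taking `max` over all four columns is the safer choice here.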

For reference:

Pynchia · Accepted Answer · 2015-11-03T08:17:31.353

The code opens the input file twice, unnecessarily:

with open("structure_working.txt") as infile:
    for Line in open("structure_working.txt"):

Replace such lines with

with open("structure_working.txt") as infile:
    for Line in infile:

Problem #1: all the ifs are over-indented by one space.

Problem #2: you aren't skipping the header

Here is the code, a bit tidier

outfileName = "k_values.txt"
with open("structure_working.txt") as infile, open(outfileName, 'w') as outfile:
    next(infile) # skip the header
    for line in infile:
        line = line.strip('\n')
        elementlist = line.split()
        # compare the values as numbers, not as strings
        k = max(enumerate(elementlist[2:], start=1), key=lambda t: float(t[1]))[0]
        name = elementlist[1]
        outputstring = "{}\t{}".format(name, k)
        print(outputstring)
        outfile.write(outputstring + "\n")

With an input file of

num    indiv   1   2   3   4   k
1  JB1972  0.642   0.141   0.091   0.127
2  JB1973  0.754   0.113   0.079   0.055
3  JB1974  0.069   0.422   0.418   0.091
4  JB1976  0.175   0.339   0.249   0.237
5  JB1977  0.149   0.365   0.383   0.103
6  JB1978  0.421   0.184   0.146   0.249
7  JB1979  0.264   0.246   0.395   0.095
8  JB1980  0.074   0.511   0.287   0.128
9  JB1981  0.083   0.162   0.275   0.48
10  JB1982  0.059   0.145   0.73    0.067

it produces

JB1972  1
JB1973  1
JB1974  2
JB1976  2
JB1977  3
JB1978  1
JB1979  3
JB1980  2
JB1981  4
JB1982  3

i.e. for each line it indicates the column with the highest value

By the way, please respect the PEP-8 guidelines on variable naming.
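
The header-skipping step in the answer above can be tried without touching the disk. The following is a sketch under stated assumptions: Python 3, an in-memory excerpt of the sample data, and a hypothetical function name `assign_k`. It relies on the fact that a file-like object is its own iterator, so one `next()` call consumes the header and the `for` loop continues from the first data row.

```python
import io

# Excerpt of the sample input; columns are separated by runs of whitespace.
SAMPLE = (
    "num    indiv   1   2   3   4   k\n"
    "1  JB1972  0.642   0.141   0.091   0.127\n"
    "9  JB1981  0.083   0.162   0.275   0.48\n"
    "10  JB1982  0.059   0.145   0.73    0.067\n"
)


def assign_k(handle):
    """Return (name, k) pairs, where k is the 1-based column of the row maximum."""
    next(handle, None)  # consume the header line; None avoids an error on empty input
    results = []
    for line in handle:
        fields = line.split()
        if not fields:  # tolerate blank lines
            continue
        values = [float(field) for field in fields[2:6]]
        k = max(enumerate(values, start=1), key=lambda pair: pair[1])[0]
        results.append((fields[1], k))
    return results


print(assign_k(io.StringIO(SAMPLE)))
# [('JB1972', 1), ('JB1981', 4), ('JB1982', 3)]
```

Passing an open file object instead of the `io.StringIO` wrapper should behave the same way, since both yield one line per iteration.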

I just corrected it per your suggestion, and still the same result. Thanks, though. — Lauren Konrade, Nov 03 '15 at 07:35

score 0 · Answer 3 · answered Nov 03 '15 at 07:39

Change your code to this

#! /usr/bin/env python


OutFileName = "k_values.txt"

OutFile = open(OutFileName, 'w')
with open("structure_working.txt") as infile:
    for Line in infile:
# the rest seems ok

Serjik

true, but that doesn't solve the problem at all. There are a few more things to fix – Pynchia Nov 03 '15 at 08:08
