0

I'm trying to format a tab-delimited text file that has rows and columns. I simply want to skip any row that contains an empty value when I write to the output file. My approach is to remove the empty strings from each row's list and then check len(list): if the length of the list equals the number of columns, the line gets written to the output file. But when I check the lengths of the lines, they are all the same, even though I removed the empty strings! Very frustrating...

Here's my code:

    import sys, os

    inputFileName = sys.argv[1]
    outputFileName = os.path.splitext(inputFileName)[0]+"_edited.txt"

    try:
        infile = open(inputFileName,'r')
        outfile = open(outputFileName, 'w')
        line = infile.readline()
        outfile.write(line)
        for line in infile:
        lineList = line.split('\t')
        #print lineList
        if '' in lineList:
              lineList.remove('')
        #if len(lineList) < 9:
              #print len(lineList)

              #outfile.write(line)
        infile.close()
        #outfile.close()
    except IOError:
        print inputFileName, "does not exist."

Thanks for any help. When I create an experimental list in the interactive window and use if '' in list: with remove, the empty string is removed. But when I run the code, the ' ' is still there!

Brock Adams
Lin
  • Do not make Whitespace edits, for the OP, for whitespace-critical languages like Python! These change the question and can mask the problem. – Brock Adams Nov 12 '11 at 06:33

3 Answers

1

I think that one of your problems is that list.remove only removes the first occurrence of the element. There could still be more empty strings in your list. From the documentation:

Remove the first item from the list whose value is x. It is an error if there is no such item.
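
A minimal illustration of that behaviour, using a made-up row rather than data from the question's file:

```python
# Hypothetical row with two empty fields (not taken from the question's file).
fields = ['a', '', 'b', '', 'c']

if '' in fields:
    fields.remove('')   # removes only the first '' it finds

# One empty string is still in the list, so the length dropped by just one.
remaining_empties = fields.count('')
```

With more than one empty field per row, a single remove call leaves the rest behind, which is consistent with the leftover empty strings described in the question.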

To remove all the empty strings from your list you could use a list comprehension instead.

lineList = [x for x in lineList if x]

or filter with the identity function (by passing None as the first argument):

lineList = filter(None, lineList)
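
As a rough sketch of how either form could feed the length check from the question: the row below is made up, and drop_empty is a hypothetical helper name, not something from the original code. Note that on Python 3 filter returns a lazy iterator, so it needs wrapping in list() before len() can be used.

```python
def drop_empty(fields):
    """Return a new list without the empty strings (hypothetical helper)."""
    return [x for x in fields if x]

fields = ['1.5', '', '2.0', '', '3.1']   # made-up row with two empty fields

cleaned = drop_empty(fields)

# filter(None, ...) keeps only the truthy items. list() is needed on Python 3,
# where filter returns an iterator; on Python 2 it is harmless.
cleaned_with_filter = list(filter(None, fields))

# The row is complete only if nothing had to be dropped.
row_is_complete = len(cleaned) == len(fields)
```

If the only goal is to skip incomplete rows, testing `'' in fields` directly may be enough, without building a cleaned list at all.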
Mark Byers
  • Or just `if x` because the only `False` string is an empty string. – agf Oct 13 '11 at 19:45
  • Awesome! That did it. One more question if you don't mind. If I also want to ignore negative values, is there any way to do this with a wildcard character that looks for "-" in the strings? I would rather not convert the list to floats if I can get away with it. – Lin Oct 13 '11 at 20:01
  • @Lin: Sure... I don't mind more questions! Just press the "Ask Question" button again and type your question in there. Then I (or someone else) will be able to answer it. – Mark Byers Oct 13 '11 at 20:08
1

I don't know any Python, but you don't seem to be checking for other whitespace characters. What about \r and \n, in addition to the \t's? Why don't you try trimming the line and checking whether it equals ''?
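
To make the point about line endings concrete in Python: when a line read from a file is split on tabs, the last field still carries the newline, so an empty final column shows up as '\n' rather than ''. A small sketch with a made-up line; split_row is a hypothetical helper name.

```python
def split_row(line):
    """Split a tab-delimited line after dropping its line ending.

    split_row is a hypothetical helper, not part of the question's code.
    """
    return line.rstrip('\r\n').split('\t')

line = 'x\ty\t\n'                # made-up row whose last column is empty

raw_fields = line.split('\t')    # last field is '\n', so '' is not found
clean_fields = split_row(line)   # last field is now ''

row_has_empty = '' in clean_fields
```

rstrip('\r\n') is used here instead of a bare strip() because strip() would also remove trailing tabs, and with them the empty trailing columns the check is meant to detect.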

0

The following does what you're asking with fewer lines of code. Thanks to the strip() call, it skips lines that are empty or that contain only whitespace.

#!/usr/bin/env python

import sys, os

inputFileName = sys.argv[1]
outputFileName = os.path.splitext(inputFileName)[0] + "_edited.txt"

try:
    # The with statement closes both files, even if an error occurs.
    with open(inputFileName, 'r') as infile, open(outputFileName, 'w') as outfile:
        for line in infile:
            # strip() returns '' for empty or whitespace-only lines.
            if line.strip():
                outfile.write(line)
except IOError:
    print("%s does not exist." % inputFileName)

EDIT: For clarity, this reads each line of the input file then strips the line of leading and trailing whitespace (tabs, spaces, etc.) and writes the non-empty lines to the output file.
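
If the goal is instead the one stated in the question (skip any row in which a single field is empty), the same loop could test the fields rather than the whole line. The following is only a sketch under assumptions: write_complete_rows is a hypothetical helper, the data is made up, fields containing only whitespace are treated as empty, and io.StringIO objects stand in for the real files.

```python
import io

def write_complete_rows(infile, outfile):
    """Copy the header, then only the rows whose tab-separated fields are all non-empty.

    Hypothetical helper; fields containing only whitespace count as empty.
    """
    header = infile.readline()
    outfile.write(header)
    for line in infile:
        # Drop the line ending first so an empty last column becomes ''.
        fields = line.rstrip('\r\n').split('\t')
        if all(field.strip() for field in fields):
            outfile.write(line)

# In-memory stand-ins for the real input and output files, with made-up data.
source = io.StringIO(u'id\tvalue\n1\t2.5\n2\t\n\n3\t4.0\n')
target = io.StringIO()
write_complete_rows(source, target)
result = target.getvalue()
```

Here the row with a missing value and the blank line are both dropped, while the header and the two complete rows are copied unchanged.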

kiswa