How to remove empty separators from read files in Python?

Question

Here is my input file sample (z.txt)

>qrst
ABCDE--  6  6 35 25 10
>qqqq
ABBDE--  7  7 28 29  2

I store the alphabetic and numeric parts in separate lists. Here is the output of the numerics list:

['', '6', '', '6', '35', '25', '10']
['', '7', '', '7', '28', '29', '', '2']

The output contains empty strings wherever a single-digit number appears, because the columns in the file are padded with extra spaces. Is there any way to get rid of the '' entries?

You could just use `sq.split()` which treats consecutive whitespace as a single delimiter so you won't end up with blank strings to get rid of... — Jon Clements, Sep 13 '16 at 03:43

score 1 · Answer 1 · answered Sep 13 '16 at 03:20

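You can use filter with None as the function for that; it drops every falsy item, including empty strings:

numbers = ['', '7', '', '7', '28', '29', '', '2']
numbers = list(filter(None, numbers))  # list() is needed on Python 3, where filter returns an iterator
print(numbers)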

See it in action here: https://eval.in/640707
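
On Python 3, a list comprehension is a common alternative to filter for this kind of cleanup. A minimal sketch using the sample list from the question (the name `cleaned` is only for illustration):

```python
numbers = ['', '7', '', '7', '28', '29', '', '2']

# Keep only the non-empty strings; '' is falsy, so `if n` drops it.
cleaned = [n for n in numbers if n]
print(cleaned)  # ['7', '7', '28', '29', '2']
```

Note that this removes the empty strings after the fact; it does not change how the line was split in the first place.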

sal

dawg · Answer 2 · 2016-09-13T04:16:11.853

If your input looks like this:

>>> li=[' 6  6  35  25  10', ' 7 7 28  29 2']

Just use .split() with no argument, which treats any run of whitespace as a single delimiter and ignores leading and trailing whitespace:

>>> [e.split() for e in li]
[['6', '6', '35', '25', '10'], ['7', '7', '28', '29', '2']]

Compare with .split(" "), which splits on every individual space and so yields empty strings:

>>> [e.split(" ") for e in li]
[['', '6', '', '6', '', '35', '', '25', '', '10'], ['', '7', '7', '28', '', '29', '2']]
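
Applied to the lines shown in the question, the same idea might look like the sketch below. It assumes the first whitespace-separated token of each data line is the alphabetic part and everything after it is numeric; the names `lines`, `alphas` and `numerics` are made up for the example, and the lines are hard-coded rather than read from z.txt.

```python
# Sample lines copied from the question's z.txt; a real script would
# iterate over the open file instead of this hard-coded list.
lines = [
    ">qrst",
    "ABCDE--  6  6 35 25 10",
    ">qqqq",
    "ABBDE--  7  7 28 29  2",
]

alphas = []    # e.g. 'ABCDE--'
numerics = []  # e.g. ['6', '6', '35', '25', '10']

for line in lines:
    if line.startswith(">"):
        continue  # header line, not needed for this example
    # split() with no argument collapses runs of whitespace,
    # so single-digit columns do not produce '' entries.
    alpha, *nums = line.split()
    alphas.append(alpha)
    numerics.append(nums)

print(alphas)    # ['ABCDE--', 'ABBDE--']
print(numerics)  # [['6', '6', '35', '25', '10'], ['7', '7', '28', '29', '2']]
```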

edited Sep 13 '16 at 04:16

answered Sep 13 '16 at 04:08

score 0 · Answer 3 · answered Sep 13 '16 at 03:58

I guess there are many ways to do this. I prefer using regular expressions, although this might be slower if you have a large input file with tens of thousands of lines. For smaller files, it's okay.

A few points:

Use a context manager (the with statement) to open files. When the with block ends, the file is closed automatically.
An alternative to re.findall() is re.match() or re.search(). The subsequent code will be slightly different.

If org, sequence and numbers are related element-wise, I suggest you maintain a list of 3-element tuples instead. Of course, you have to buffer the org field and add it to the list of tuples when the next line is read.

import re

org = []
sequence = []
numbers = []

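with open('ddd', 'r') as f:
    for line in f:  # iterate over the file directly instead of readlines()
        line = line.strip()
        if re.search(r'^>', line):
            org.append(line)
        else:
            # After strip() the line has no trailing whitespace, so anchor on the
            # end of the line; a trailing \s+ would cut off the last number.
            m = re.findall(r'^([A-Z]+--)\s+(.*)$', line)
            if m:
                sequence.append(m[0][0])
                # convert from str to int; a list keeps the output readable on Python 3
                numbers.append([int(n) for n in m[0][1].split()])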

print(org, sequence, numbers)
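
The tuple-based variant suggested above could be sketched roughly as follows. It uses re.match() instead of re.findall() and assumes every header line is followed by exactly one data line; `records` and `pending_org` are illustrative names, and the input lines are hard-coded from the question rather than read from a file.

```python
import re

# Sample lines from the question; a real script would read them from a file.
lines = [
    ">qrst",
    "ABCDE--  6  6 35 25 10",
    ">qqqq",
    "ABBDE--  7  7 28 29  2",
]

records = []        # list of (org, sequence, numbers) tuples
pending_org = None  # header buffered until its data line arrives

for line in lines:
    line = line.strip()
    if line.startswith(">"):
        pending_org = line
        continue
    # re.match anchors at the start of the string, so no '^' is needed.
    m = re.match(r"([A-Z]+--)\s+(.*)", line)
    if m and pending_org is not None:
        numbers = [int(n) for n in m.group(2).split()]
        records.append((pending_org, m.group(1), numbers))
        pending_org = None

print(records)
# [('>qrst', 'ABCDE--', [6, 6, 35, 25, 10]), ('>qqqq', 'ABBDE--', [7, 7, 28, 29, 2])]
```

Keeping the three fields in one tuple means they cannot drift out of step, which can happen with three parallel lists if a line fails to match.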
