
Here is a sample of my input file (z.txt):

>qrst
ABCDE--  6  6 35 25 10
>qqqq
ABBDE--  7  7 28 29  2

I store the alphabetic and numeric fields in separate lists. Here is the output of the numeric list:

['', '6', '', '6', '35', '25', '10'] ['', '7', '', '7', '28', '29', '', '2']

The file pads single-digit numbers with an extra space, so the lists end up containing empty strings (''). Is there any way to get rid of them?

Rspacer
    You could just use `sq.split()`, which treats consecutive whitespace as a single delimiter, so you won't end up with blank strings to get rid of. – Jon Clements Sep 13 '16 at 03:43

3 Answers


You can use filter with None as the function. It keeps only truthy items, and an empty string is falsy, so every '' is dropped:

numbers = ['', '7', '', '7', '28', '29', '', '2']
numbers = list(filter(None, numbers))  # list() needed in Python 3, where filter() returns an iterator
print(numbers)

See it in action here: https://eval.in/640707
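An equivalent that avoids filter() is a list comprehension that keeps each string only if it is non-empty; it returns a list in both Python 2 and Python 3. This is a minimal sketch using the sample list from the question:

```python
numbers = ['', '7', '', '7', '28', '29', '', '2']

# filter(None, ...) keeps truthy items; '' is falsy, so it is dropped.
via_filter = list(filter(None, numbers))

# Same result with a list comprehension: keep n only if it is non-empty.
via_comprehension = [n for n in numbers if n]

print(via_filter)         # ['7', '7', '28', '29', '2']
print(via_comprehension)  # ['7', '7', '28', '29', '2']
```

Both rely on truthiness, so this is safe here because the items are still strings. If the values were already converted to int, the same filter would also drop every 0.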

sal

If your input looks like this:

>>> li=[' 6  6  35  25  10', ' 7 7 28  29 2']

Just use .split() with no argument. It treats runs of whitespace as a single delimiter and ignores leading and trailing whitespace:

>>> [e.split() for e in li]
[['6', '6', '35', '25', '10'], ['7', '7', '28', '29', '2']]

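Compare that with .split(" "), which splits on every single space and so yields '' wherever two spaces are adjacent: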

>>> [e.split(" ") for e in li]
[['', '6', '', '6', '', '35', '', '25', '', '10'], ['', '7', '7', '28', '', '29', '2']]
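Applied to the records in z.txt, .split() separates the letter field from the numbers in one step. This is a minimal sketch, assuming every data line starts with the letters-plus-"--" field and every header line starts with ">"; the helper name parse_records and the inline sample text are hypothetical stand-ins for reading the real file:

```python
def parse_records(lines):
    """Hypothetical helper: split data lines into alpha and numeric lists."""
    alpha = []    # e.g. 'ABCDE--'
    numeric = []  # e.g. [6, 6, 35, 25, 10]
    for line in lines:
        if not line.strip() or line.startswith('>'):
            continue  # skip blank lines and '>' header lines
        fields = line.split()  # runs of spaces count as one delimiter
        alpha.append(fields[0])
        numeric.append([int(x) for x in fields[1:]])  # str -> int
    return alpha, numeric


sample = """>qrst
ABCDE--  6  6 35 25 10
>qqqq
ABBDE--  7  7 28 29  2
"""

# With the real file:  with open('z.txt') as f: alpha, numeric = parse_records(f)
alpha, numeric = parse_records(sample.splitlines())
print(alpha)    # ['ABCDE--', 'ABBDE--']
print(numeric)  # [[6, 6, 35, 25, 10], [7, 7, 28, 29, 2]]
```

Because the empty strings never appear, no filtering step is needed afterwards.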
dawg

There are many ways to do this. I prefer regular expressions, although they may be slower than plain string methods on a large input file with tens of thousands of lines. For smaller files, the difference doesn't matter.

A few points:

  1. Use a context manager (the with statement) to open files. When the with block ends, the file is closed automatically.

  2. Alternatives to re.findall() are re.match() and re.search(). The code that follows would then be slightly different.

  3. If org, sequence and numbers are related element-wise, I suggest keeping a single list of 3-element tuples instead. You would then have to buffer the org field and add the tuple to the list once the following line has been read.

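    import re
    
    org = []
    sequence = []
    numbers = []
    
    with open('z.txt', 'r') as f:  # the sample file from the question
        for line in f:  # iterate directly instead of loading f.readlines()
            line = line.strip()
            if re.search(r'^>', line):
                org.append(line)  # header line, e.g. '>qrst'
            else:
                # group 1: letters plus '--'; group 2: the rest of the line
                # ('$' instead of a trailing '\s+', which dropped the last number)
                m = re.findall(r'^([A-Z]+--)\s+(.*)$', line)
                if m:
                    sequence.append(m[0][0])
                    numbers.append(list(map(int, m[0][1].split())))  # convert from str to int
    
    print(org, sequence, numbers)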
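Points 2 and 3 can be combined. The sketch below uses re.match() instead of re.findall() and keeps one (org, sequence, numbers) tuple per record by buffering the most recent ">" header until its data line arrives. It assumes each header is followed by exactly one data line; the helper name parse_tuples and the inline sample text are hypothetical:

```python
import re

# Letters followed by '--', then whitespace, then the rest of the line.
RECORD = re.compile(r'([A-Z]+--)\s+(.*)')


def parse_tuples(lines):
    """Hypothetical helper: return a list of (org, sequence, numbers) tuples."""
    records = []
    org = None  # buffered header, waiting for its data line
    for line in lines:
        line = line.strip()
        if line.startswith('>'):
            org = line
            continue
        m = RECORD.match(line)  # match() only matches at the start of the string
        if m and org is not None:
            nums = [int(x) for x in m.group(2).split()]
            records.append((org, m.group(1), nums))
            org = None  # header consumed by this record
    return records


sample = """>qrst
ABCDE--  6  6 35 25 10
>qqqq
ABBDE--  7  7 28 29  2
"""

for record in parse_tuples(sample.splitlines()):
    print(record)
# ('>qrst', 'ABCDE--', [6, 6, 35, 25, 10])
# ('>qqqq', 'ABBDE--', [7, 7, 28, 29, 2])
```

Keeping the three fields together in one tuple means they cannot drift out of step if a record is malformed and skipped.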
    
coder.in.me