Read specific sequence of lines in Python

Question

I have a sample file that looks like this:

    @XXXXXXXXX
    VXVXVXVXVX
    +
    ZZZZZZZZZZZ
    @AAAAAA
    YBYBYBYBYBYBYB
    ZZZZZZZZZZZZ
    ...

I want to read only the lines at positions 4i+2, where i starts at 0. So I should read the VXVXV... line (4*0+2 = 2) and the YBYB... line (4*1+2 = 6) in the snippet above. I need to count the number of 'V's, 'X's, 'Y's and 'B's and store the counts in a pre-existing dict.

fp = open(fileName, "r")
lines = fp.readlines()

for i in xrange(1, len(lines), 4):
    for c in lines[i]:  # index the list with [], not ()
        if c == 'V':
            some_dict['V'] += 1

Can someone explain how I can avoid running past the end of the list and read only the lines at the 4*i+2 positions of the lines list?

That's not correct, unfortunately. – robinhood91 Mar 07 '16 at 00:38

aghast · Accepted Answer · 2016-03-07T01:59:19.813

2

Can't you just slice the list of lines?

lines = fp.readlines()
interesting_lines = lines[2::4]

Edit for others questioning how it works:

The "full" slice syntax is three parts: start:end:step

The start is the starting index, or 0 by default. Thus, for 4 * i + 2 with i == 0, the start is index #2.

The end is the ending index, or len(sequence) by default. Slices go up to, but do not include, the end index.

The step is the increment between chosen items, 1 by default. Normally, a slice like 3:7 would return elements 3,4,5,6 (and not 7). But when you add a step parameter, you can do things like "step by 4".

Doing "step by 4" means start+0, start+4, start+8, start+12, ... which is what the OP wants, so long as the start parameter is chosen correctly.
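The three slice parts can be checked on a small list. This is a minimal sketch, assuming Python 3; `items` is a hypothetical stand-in for the list of lines, chosen so that each element equals its own index.

```python
# Hypothetical stand-in for the list of lines: each item equals its own index.
items = list(range(12))

plain = items[3:7]      # elements 3, 4, 5, 6 (index 7 is excluded)
stepped = items[2::4]   # start at index 2, then every 4th item to the end

# The same selection written out with the 4 * i + 2 formula.
explicit = [items[4 * i + 2] for i in range(len(stepped))]

print(plain)                # [3, 4, 5, 6]
print(stepped)              # [2, 6, 10]
print(stepped == explicit)  # True
```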

edited Mar 07 '16 at 01:59

answered Mar 07 '16 at 00:45

aghast


In this case it is `lines[1::4]` but yeah, this works to grab the lines desired without using an index. – Ittociwam Mar 07 '16 at 00:57
heh, that's much more sensible than my double-listcomp. +1 – Paul Gowder Mar 07 '16 at 00:59
works like a charm! @Ittociwam could you please explain how this slicing translates to the logic? – robinhood91 Mar 07 '16 at 01:22
I am confused as to how this slicing looks at each 4*i+2 index. @Ittociwam – robinhood91 Mar 07 '16 at 01:25
Sorry, I meant to say it avoids having to "go off of an index" of a for loop. Assuming that is what op means. – Ittociwam Mar 07 '16 at 01:40
@AustinHastings lines[1::4] worked and I didn't try [2::4]. As per your reasoning, I am not sure why [1::4] works when it should be [2::4]. – robinhood91 Mar 08 '16 at 03:40
Remember that list/slice/array indexes start at 0, so line 0, line 1, line 2 would be the third line... – aghast Mar 08 '16 at 07:26
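The difference between `[1::4]` and `[2::4]` discussed in these comments can be seen on the question's sample. This is a sketch under the assumption that the question's "4i+2" counts lines starting from 1 (as its worked example suggests); the list below is the sample with the `...` line left out.

```python
# The question's sample as a list of lines (the trailing "..." is omitted).
lines = [
    "@XXXXXXXXX\n",
    "VXVXVXVXVX\n",
    "+\n",
    "ZZZZZZZZZZZ\n",
    "@AAAAAA\n",
    "YBYBYBYBYBYBYB\n",
    "ZZZZZZZZZZZZ\n",
]

# Line numbers counted from 1: keep numbers 2, 6, 10, ...
by_line_number = [text for number, text in enumerate(lines, start=1)
                  if number % 4 == 2]

from_one = lines[1::4]  # 0-based indices 1, 5, ... (the 2nd and 6th lines)
from_two = lines[2::4]  # 0-based indices 2, 6, ... (the 3rd and 7th lines)

print(from_one == by_line_number)  # True
print(from_two)                    # ['+\n', 'ZZZZZZZZZZZZ\n']
```

So 1-based line 2 and 0-based index 1 name the same line, which would explain why `lines[1::4]` returned the lines the question wanted.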

asdf · Answer 2 · 2016-03-07T00:42:10.077

0

You can do one of the following:

Start xrange at 0 then add 2 onto i in secondary loop

for i in xrange(0, len(lines), 4):
    for c in lines[i + 2]:  # raises IndexError if i + 2 is past the last line
        if c == 'V':
            some_dict['V'] += 1

Start xrange at 2, then access i the way specified in your original program

for i in xrange(2, len(lines), 4):
    for c in lines[i]:
        if c == 'V':
            some_dict['V'] += 1
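Either loop extends to all four letters from the question. This is a minimal sketch, assuming Python 3 (`range` in place of `xrange`), a `some_dict` pre-filled with zero counts, and the question's sample as `lines`. It starts at index 1 on the assumption that the wanted `VXVX...` line is the file's second line, which is 0-based index 1; change the start if your indices are already 0-based.

```python
# The question's sample as a list of lines (the trailing "..." is omitted).
lines = [
    "@XXXXXXXXX\n",
    "VXVXVXVXVX\n",
    "+\n",
    "ZZZZZZZZZZZ\n",
    "@AAAAAA\n",
    "YBYBYBYBYBYBYB\n",
    "ZZZZZZZZZZZZ\n",
]

# Stands in for the pre-existing dict mentioned in the question.
some_dict = {"V": 0, "X": 0, "Y": 0, "B": 0}

# range() stops before len(lines), so lines[i] can never go out of bounds.
for i in range(1, len(lines), 4):
    for c in lines[i]:
        if c in some_dict:  # ignores the newline and any other character
            some_dict[c] += 1

print(some_dict)  # {'V': 5, 'X': 5, 'Y': 7, 'B': 7}
```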

edited Mar 07 '16 at 00:42

answered Mar 07 '16 at 00:37

asdf


Paul Gowder · Answer 3 · 2016-03-07T00:48:08.493

I'm not quite clear on what you're trying to do here--- are you actually just trying to only read the lines you want from disk? (In which case you've gone wrong from the start, because readlines() reads the whole file.) Or are you just trying to filter the list of lines to pick out the ones you want?

I'll assume the latter. In which case, the easiest thing to do would be to use a listcomp to filter the lines by index, e.g. something simple like:

indices = [x[0] * 4 + 2 for x in enumerate(lines)]
filtered_lines = [lines[i] for i in indices if len(lines) > i]

and there you go, you've got just the lines you want, no index errors or anything silly like that. Then you can separate out and simplify the rest of your code to do the counting, just operating on the filtered list.
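For the other reading of the question (not holding the whole file in memory), one option is `itertools.islice` applied to the file object itself. This is a sketch, assuming Python 3; `io.StringIO` stands in for the opened file so the example is self-contained, and the start of 1 assumes the wanted line is the second line of each group of four. The file is still read sequentially, but only the selected lines are kept.

```python
import io
from itertools import islice

# Stand-in for open(fileName): the question's sample without the "..." line.
fp = io.StringIO(
    "@XXXXXXXXX\n"
    "VXVXVXVXVX\n"
    "+\n"
    "ZZZZZZZZZZZ\n"
    "@AAAAAA\n"
    "YBYBYBYBYBYBYB\n"
    "ZZZZZZZZZZZZ\n"
)

# islice(iterable, start, stop, step): skip one line, then take every 4th.
wanted = [line.rstrip("\n") for line in islice(fp, 1, None, 4)]

print(wanted)  # ['VXVXVXVXVX', 'YBYBYBYBYBYBYB']
```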

(just slightly edited the first list comp to be a little more idiomatic)

Technically, I only need to read the lines at 4*i+2 but I just avoided going for that optimization for the sake of simplicity. I shall try this listcomp. Thanks! — robinhood91, Mar 07 '16 at 00:42
note a couple of edits---the first listcomp is now more idiomatic, and the second has a bug removed. — Paul Gowder, Mar 07 '16 at 00:58

score -2 · Answer 4 · edited May 23 '17 at 12:07

-2

I already gave a similar answer to another question: How would I do this in a file?

A better solution (avoiding unnecessary for loops) would be

fp = open(fileName, "r")
def addToDict(letter):
    someDict[letter] += 1

# Call addToDict once for every 'V' on each selected line.
[addToDict(c) for a in fp.readlines()[2::4] for c in a if c == 'V']

I tried to make this an anonymous function without success, if someone can do that it would be excellent.
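A plain loop with `collections.Counter` is one alternative that needs neither the helper function nor a throwaway list. This is a sketch, assuming Python 3 and the question's sample as `lines`; it slices from index 1 on the assumption that the wanted line is the second of each group of four, and it counts every character on the selected lines, not only 'V'.

```python
from collections import Counter

# The question's sample as a list of lines (the trailing "..." is omitted).
lines = [
    "@XXXXXXXXX\n",
    "VXVXVXVXVX\n",
    "+\n",
    "ZZZZZZZZZZZ\n",
    "@AAAAAA\n",
    "YBYBYBYBYBYBYB\n",
    "ZZZZZZZZZZZZ\n",
]

counts = Counter()
for line in lines[1::4]:
    counts.update(line.strip())  # adds one per character on the line

print(counts["V"], counts["X"], counts["Y"], counts["B"])  # 5 5 7 7
```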

edited May 23 '17 at 12:07

Community


answered Mar 07 '16 at 00:43

Wer900


Can you explain this : [2::4]? – robinhood91 Mar 07 '16 at 00:45
Why use a list comprehension when you're not interested in the resulting list? Also, the function is completely unnecessary. `[somedict['V']+=1 ... ]` is much more pythonic. This doesn't avoid any for loops, you just can't see the for loop behind the list comprehension. Also, it seems like the OP is trying to increment the count of `'V'` every time it is matched, not once per line – asdf Mar 07 '16 at 00:45
Python will start reading the list from index 2, then every fourth element thereafter. It's known as slicing. The example that follows pertains to Numpy arrays, but it pertains just as well to regular lists: http://structure.usc.edu/numarray/node26.html – Wer900 Mar 07 '16 at 00:48
asdf: your suggested syntax throws an error for me in Python 2.7 (IDK if this has changed in Python 3). And yes, the for loop is hidden; there are no _explicit_ for loops in my suggestion; the list comprehension method is faster. EDIT: You're right, I didn't read the original code properly, the OP wants to add 1 every time a 'V' is read. Updating my answer. – Wer900 Mar 07 '16 at 00:50
@Wer900 List comprehensions are faster in the case where the for loop is actually building a list ([See here](http://stackoverflow.com/a/22108640/2766650)). However, in this case a list is not being built, we're merely appending into a dictionary. Accumulating useless values slows the runtime significantly. A list comprehension is actually *slower* than the equivalent for loop in this case. – asdf Mar 07 '16 at 00:53
