Reading from a file in segments

Question

I have written a script that reads data from two different files and proceeds accordingly. However, when I wrote the script I was under the impression that the first file I read from only has two lines; sadly, this has since changed.

My code extracts the first two lines and passes the data to another function, which then proceeds to do the calculation by passing through multiple other functions.

Right now I am doing something like this:

with open(myfile, 'r') as file:
    for line in file:
        if not line.startswith('|'):
            name = line.strip('\n')   # a name line
        else:
            data = line.strip('|\n')  # a |data| line

The file, in general, looks like this:

Samantha
|j&8ju820kahu9|

Now, sadly, the file can contain multiple name/data pairs, as follows:

Andy
|o81kujd0-la88js|
Mathew
|a992kma82nf-x01j4|
Andrew
|01ks83nnz;a82jlad|

Is there a way to extract two lines at a time from a file, process them, and then extract the next two? That is, grab the first two lines, assign them to name and data, pass them to my function (which eventually prints what is required), and then get the next two lines, and so forth.

Please advise.

score 6 · Accepted Answer · answered Apr 17 '18 at 19:56


Yes. A file object is an iterator over its lines, so you can pass the same one to zip twice:

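with open(filename, 'r') as f:
    # zip takes one line from f, then the next line from the same f,
    # so each iteration yields a (name line, data line) pair
    for l1, l2 in zip(f, f):
        name = l1.strip('\n')
        data = l2.strip('|\n')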

As far as I know, this is the shortest and most Pythonic way.
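To see the pairing (and one caveat) without a real file, here is a small sketch that uses io.StringIO as a stand-in for the open file. The sample text is the three-pair example from the question plus a hypothetical unpaired last line. Because zip stops as soon as either argument is exhausted, a trailing line with no partner is silently dropped.

```python
import io

# io.StringIO stands in for an open file; like a file object, it is its own iterator
sample = io.StringIO(
    "Andy\n|o81kujd0-la88js|\n"
    "Mathew\n|a992kma82nf-x01j4|\n"
    "Andrew\n|01ks83nnz;a82jlad|\n"
    "Orphan\n"  # hypothetical unpaired line, to show what zip does with it
)

pairs = []
for l1, l2 in zip(sample, sample):
    name = l1.strip('\n')
    data = l2.strip('|\n')
    pairs.append((name, data))

print(pairs)
# "Orphan" is not in the result: zip stops when the second argument is exhausted
```

If an odd line count should be treated as an error instead of ignored, one of the explicit approaches further down is a better fit.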

OneRaynyDay


This also relies on side effects and on the order in which `zip` evaluates its iterators. It would make a good puzzle to ask why it does not return pairs of identical lines. – 9000 Apr 17 '18 at 20:03
@9000 well put. – OneRaynyDay Apr 17 '18 at 20:05
@9000 and OneRaynyDay, I was pleasantly surprised to find out it did not. Could you please elaborate on why? :) – Srini Apr 17 '18 at 20:09

Note that this is well documented and guaranteed. From the [documentation](https://docs.python.org/2/library/functions.html#zip): "The left-to-right evaluation order of the iterables is guaranteed. This makes possible an idiom for clustering a data series into n-length groups using zip(*[iter(s)]*n)." – TwistedSim Apr 17 '18 at 20:10

@Srini it is because `f` is an iterator: when you use `for x in f`, it implicitly calls `next(f)` and assigns the result to `x`. `zip` calls `next()` on its first argument and then on its second; since both arguments are the same iterator `f`, `next(f)` is called twice per pair, giving you two different lines. – OneRaynyDay Apr 17 '18 at 20:10

Wow, that just blew my mind. That's such a useful thing to know. Thanks for elucidating OneRaynyDay :) ! That doc was useful too @TwistedSim – Srini Apr 17 '18 at 20:13

This is used in example code in the stdlib (the `grouper` recipe in `itertools`), so it shouldn't exactly be considered "deep magic"; you should understand why it works if you're going to use it. There was a question about `grouper` about 5 years ago, but I can't find it. I can find [a blog post I wrote explaining it in more detail](http://stupidpythonideas.blogspot.com/2013/08/how-grouper-works.html), but that really shouldn't be necessary just to understand the pairs version. – abarnert Apr 17 '18 at 20:16
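The mechanism described in the comments above is easy to check directly by comparing two independent iterators with one shared iterator. A minimal sketch on a plain list (the letters are arbitrary sample data):

```python
s = ['a', 'b', 'c', 'd', 'e', 'f']

# Two independent iterators: each starts at the beginning, so items pair with themselves
same = list(zip(iter(s), iter(s)))

# One shared iterator passed twice: zip calls next() on it for the first slot,
# then again for the second slot, so consecutive items end up paired
it = iter(s)
pairs = list(zip(it, it))

# The general idiom from the zip docs: n references to one iterator give n-length groups
triples = list(zip(*[iter(s)] * 3))

print(same)     # [('a', 'a'), ('b', 'b'), ..., ('f', 'f')]
print(pairs)    # [('a', 'b'), ('c', 'd'), ('e', 'f')]
print(triples)  # [('a', 'b', 'c'), ('d', 'e', 'f')]
```

A file object behaves like `it` here rather than like `iter(s)`, because calling iter() on a file returns the file itself; that is why zip(f, f) pairs consecutive lines.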

TwistedSim · Answer 2 · 2018-04-17T20:23:50.053

A solution for you might be:

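data = {}
with open(filename) as f:
    for name, value in zip(f, f):
        # strip the newline from the name, and the pipes and newline from the value
        data[name.strip('\n')] = value.strip('|\n')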

For an explanation of how the zip function works with iterators, see its documentation.

Also, here is the grouper recipe from the itertools documentation:

from itertools import zip_longest  # izip_longest in Python 2

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)
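Applied to the question's file format, the recipe can pair name and data lines while keeping an unpaired last line, which zip(f, f) would drop. A sketch, again with io.StringIO standing in for the file; the recipe is repeated so the block runs on its own, and the trailing "Orphan" line is hypothetical sample data.

```python
import io
from itertools import zip_longest

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

f = io.StringIO("Andy\n|o81kujd0-la88js|\nMathew\n|a992kma82nf-x01j4|\nOrphan\n")

records = []
for name_line, data_line in grouper(f, 2):
    if data_line is None:
        # odd number of lines: the last name has no data line, so it was padded with fillvalue
        records.append((name_line.strip('\n'), None))
    else:
        records.append((name_line.strip('\n'), data_line.strip('|\n')))

print(records)
```

Using None as the fill value (rather than '') makes a missing data line easy to distinguish from an empty one.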

score 0 · Answer 3 · answered Apr 17 '18 at 20:07

Of course you can.

with open(...) as f:
    while True:
        try:
            line_1 = next(f)
        except StopIteration:
            break  # Clean end of file: every line had a partner.
        try:
            line_2 = next(f)
        except StopIteration:
            # Stand-in for the original complain(): an odd line count is an error.
            raise ValueError("The file did not contain an even number of lines")
        # ... do something with the pair of lines
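The same explicit next() idea can be packaged as a small generator, so the pairing logic lives in one place and the loop body stays clean. A sketch; the name read_pairs is hypothetical, and raising ValueError on an odd line count is an assumed policy.

```python
import io

def read_pairs(lines):
    """Yield (first, second) pairs from any iterable of lines (hypothetical helper).

    Raises ValueError if the input has an odd number of lines.
    """
    it = iter(lines)
    for first in it:  # the for loop handles the clean end of input
        try:
            second = next(it)  # advance the same iterator for the partner line
        except StopIteration:
            raise ValueError("The file did not contain an even number of lines") from None
        yield first, second

# Usage with an in-memory stand-in for a file
f = io.StringIO("Andy\n|o81kujd0-la88js|\nMathew\n|a992kma82nf-x01j4|\n")
result = [(a.strip('\n'), b.strip('|\n')) for a, b in read_pairs(f)]
print(result)
```

Catching StopIteration inside the generator matters: since Python 3.7, a StopIteration that escapes a generator body is turned into a RuntimeError.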

Srini · Answer 4 · 2018-04-17T20:05:46.750

You can use the list slice notation list[<begin>:<end>:<step>] to skip list elements when iterating. If your file is small, you can just read it into memory in one swoop with readlines().

Consider something like the following. Also, don't use file as the name of the file handle: it shadows the builtin file in Python 2.

In [9]: a = my_file.readlines()
In [10]: for i, line in enumerate(a[::2]):
   ...:     data_line = a[2 * i + 1]  # i indexes the slice a[::2], so the data line is at 2*i + 1 in a
   ...:     name = line.strip('\n')
   ...:     data = data_line.strip("|\n")
   ...:     print(name)
   ...:     print(data)
   ...:
Andy
o81kujd0-la88js
Mathew
a992kma82nf-x01j4
Andrew
01ks83nnz;a82jlad

(I would personally do something like a regex match though).
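The regex idea might look like the following sketch. The pattern is an assumption based on the sample file: a name line that does not start with `|`, followed by a line of the form `|data|`. It works on the whole text at once, so like readlines() it suits small files.

```python
import re

text = (
    "Andy\n|o81kujd0-la88js|\n"
    "Mathew\n|a992kma82nf-x01j4|\n"
    "Andrew\n|01ks83nnz;a82jlad|\n"
)

# ^([^|\n].*)  : a name line (first character is not '|'), captured as group 1
# \n\|(.*)\|$  : the following line wrapped in pipes; group 2 captures what is inside
pattern = re.compile(r'^([^|\n].*)\n\|(.*)\|$', re.MULTILINE)

pairs = pattern.findall(text)  # list of (name, data) tuples
print(pairs)
```

With re.MULTILINE, ^ and $ match at line boundaries, which is what lets one pattern span exactly two lines.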

AbtPst · Answer 5 · 2018-04-17T20:00:28.443

Try this:

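from itertools import islice

N = 2  # number of lines to take
with open(filename, 'r') as infile:
    current_slice = islice(infile, N)  # lazily yields the first N lines
    for line in current_slice:  # must run while the file is still open
        print(line)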

Here N is the number of lines you want to process, and current_slice is an iterator (an islice object) that yields those lines of the file one at a time, so it can be used in a loop. With N = 2 this gives you the first two lines. Instead of printing, you can perform your operations and then call islice again on the same file object to get the next two lines.

Another option is the grouper recipe from the itertools documentation:

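from itertools import zip_longest  # izip_longest in Python 2

def grouper(iterable, n, fillvalue=None):
    args = [iter(iterable)] * n  # n references to one shared iterator
    return zip_longest(*args, fillvalue=fillvalue)

N = 2
with open(filename) as f:
    for lines in grouper(f, N, ''):
        # lines is a tuple of N lines, padded with '' at the end of the file
        for line in lines:
            print(line)  # process N lines here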

The first one doesn't seem to work. islice doesn't return multiple lines. – TwistedSim, Apr 17 '18 at 20:04
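As the comment says, a single islice call only yields the first N lines. Calling islice again on the same file object continues where the previous call stopped, so a loop can take N lines at a time until a call returns nothing. A sketch, with io.StringIO standing in for the file and N = 2 as an assumed chunk size:

```python
import io
from itertools import islice

N = 2  # lines per chunk (assumed: one name line plus one data line)
f = io.StringIO("Andy\n|o81kujd0-la88js|\nMathew\n|a992kma82nf-x01j4|\nAndrew\n|01ks83nnz;a82jlad|\n")

chunks = []
while True:
    chunk = list(islice(f, N))  # next N lines; fewer at end of file, [] when exhausted
    if not chunk:
        break
    chunks.append([line.strip('\n') for line in chunk])

print(chunks)
```

Materializing each chunk with list() is what makes the empty-chunk test work; an islice object itself is always truthy.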
