Most pythonic way to process this text file using python

Question

I have a text file containing test data that looks like:

hdr 1

hdr2

hdr3

data1

data2

data3

data1

data2

....

There is a blank line between every line.

I need to create a list of lists containing

[[hdr1,hdr2,hdr3],[data1,data2,data3],[data1,data2,...]]

What would be a concise, pythonic way of doing this?

What is the rule that determines where one group ends and the next begins? — Karl Knechtel, Feb 02 '13 at 04:17

score 7 · Answer 1 · edited May 23 '17 at 11:48

Assuming your data always comes in blocks of 3, as in your example, you can use itertools. If you pass the same iterator 6 times, each iteration yields 6 fields: 3 of them are the empty lines, and the other 3 are the group you're interested in:

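import itertools

arr = []
with open('input.txt') as f:
    # The same file iterator repeated 6 times yields 6 consecutive lines per tuple.
    # izip_longest is the Python 2 name; Python 3 calls it itertools.zip_longest.
    for field1,blank1,field2,blank2,field3,blank3 in itertools.izip_longest(*[f]*6):
        arr.append([field1,field2,field3])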

(inspired from this post)

EDIT: You may need to call strip() on each field to drop the trailing \n: arr.append([field1.strip(),field2.strip(),field3.strip()])
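
For Python 3, where izip_longest was renamed zip_longest, the same idea might look like the sketch below. The function name read_blocks, the in-memory sample standing in for input.txt, and the fillvalue="" padding (so strip() still works when the last block is short) are assumptions for illustration, not part of the answer.

```python
import io
from itertools import zip_longest  # named izip_longest in Python 2


def read_blocks(lines):
    """Group an iterable of lines into lists of three, skipping every other line."""
    it = iter(lines)
    blocks = []
    # Passing the same iterator six times makes zip_longest pull six
    # consecutive lines per tuple; fillvalue="" pads a short final block.
    for f1, _b1, f2, _b2, f3, _b3 in zip_longest(*[it] * 6, fillvalue=""):
        blocks.append([f1.strip(), f2.strip(), f3.strip()])
    return blocks


# Hypothetical sample standing in for input.txt.
sample = io.StringIO("hdr1\n\nhdr2\n\nhdr3\n\ndata1\n\ndata2\n\ndata3\n")
result = read_blocks(sample)
print(result)
```

With the default fillvalue of None, a file whose line count is not a multiple of 6 would make strip() fail on the padding, which is why the sketch pads with empty strings instead.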

answered Feb 02 '13 at 01:16

Charles Menguy

+1. I'd write it in terms of `grouper` (from the itertools recipes in the docs) rather than explicitly zipping `*[f]*6`, because that's pretty hard for most people to understand in the middle of a complex line. Also, you don't need to explicitly throw out the blank lines; just use `[::2]`. So, you can replace everything but the `with` line with `arr = [group[::2] for group in grouper(6, f)]`. – abarnert Feb 02 '13 at 01:50
For the sake of pythonicity, I'd do: `arr.append(map(lambda x:x.strip(), [field1, field2, field3]))`, but that surely is a bit overkill... :P – heltonbiker Feb 02 '13 at 01:53
@heltonbiker: I don't think using `map` with a `lambda` makes things more Pythonic than either using a list comprehension or `map` with `str.strip`. But agreed that it's probably more Pythonic than doing it explicitly on three values—especially if you're using a slice or a `filter` to get those values. – abarnert Feb 02 '13 at 01:54

abarnert · Answer 2 · 2013-02-02T03:27:50.923

This is just a simplified version of Charles Menguy's solution, and I'm only adding it as an answer because it was hard to read as a comment. But here's the key:

First, use grouper from the itertools recipes to group the file into groups of 6 lines:

groups = grouper(6, f)

Next, you can throw out every other line just by slicing:

nonblank = [group[::2] for group in groups]

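Or, alternatively, by filtering out the blank lines explicitly. A blank line read from a file is '\n', which is truthy, so test the stripped line rather than the line itself:

nonblank = [[line for line in group if line and line.strip()] for group in groups]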

If you need to strip each line, you can either use a list comprehension, or map. Generally, I prefer map if I don't need to lambda/partial up a new function, and here we don't; it's just map(str.strip, group).

Putting it together, here's the whole thing, as a one-liner (which I think is still pretty readable):

with open('input.txt') as f:
    arr = [map(str.strip, group[::2]) for group in grouper(6, f)]
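
Note that grouper is not importable from itertools; it is a recipe you copy from the docs. A Python 3 sketch of the whole thing might look like this. It keeps the grouper(n, iterable) argument order used above (newer versions of the docs list the iterable first), and the default fillvalue="", the helper name parse, and the in-memory sample are assumptions for illustration.

```python
import io
from itertools import zip_longest


def grouper(n, iterable, fillvalue=""):
    """Collect items into fixed-length tuples, padding the last one."""
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)


def parse(lines):
    # group[::2] keeps items 0, 2 and 4 of each six-line group.
    # A list comprehension replaces map(), which is lazy in Python 3
    # and would otherwise leave map objects inside arr.
    return [[line.strip() for line in group[::2]] for group in grouper(6, lines)]


# Hypothetical sample standing in for input.txt.
sample = io.StringIO("hdr1\n\nhdr2\n\nhdr3\n\ndata1\n\ndata2\n\ndata3\n\n")
arr = parse(sample)
print(arr)
```

If you prefer map, wrapping it as list(map(str.strip, group[::2])) gives the same lists in Python 3.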

Nice abstraction on top of `izip_longest` ! You got my vote for finding a way to make it more readable. — Charles Menguy, Feb 02 '13 at 02:02

score 0 · Answer 3 · answered Feb 02 '13 at 01:45

I don't know if it's the best solution or how pythonic it is, but you can simply use regular expressions to parse the lines of your file:

import re

regex = re.compile(r'^(\w+)\s*(\d+)')
last_groups = None
group = []
data = []

with open('data.txt', 'r') as f:
    for line in f:
        match = regex.search(line)
        if match:
            if last_groups is None:
                last_groups = match.groups()

            if last_groups[0] == match.groups()[0] and \
                    int(last_groups[1]) <= int(match.groups()[1]):
                last_groups = match.groups()
                group.append(''.join(last_groups))
            else:
                data.append(group)
                last_groups = match.groups()
                group = [''.join(last_groups)]

if group:
    data.append(group)
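
One caveat about the pattern `^(\w+)\s*(\d+)`: `\w` also matches digits, so the greedy `(\w+)` absorbs all but the last digit of a multi-digit number, and the name comparison then sees "data1" instead of "data". The sketch below contrasts that pattern with a stricter variant. The names loose and strict and the `[^\W\d]+` pattern are assumptions for illustration, not part of the answer.

```python
import re

# Same pattern as in the code above: \w+ is greedy and also matches digits.
loose = re.compile(r'^(\w+)\s*(\d+)')

# Hypothetical stricter variant: [^\W\d] is "a word character that is
# not a digit", so the name stops where the number starts.
strict = re.compile(r'^([^\W\d]+)\s*(\d+)')

for line in ("hdr 1", "data3", "data10"):
    print(line, loose.search(line).groups(), strict.search(line).groups())
```

For single-digit labels such as those in the question, both patterns split the line the same way.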
