
I have a file like the one shown below. I plan to use itertools.groupby to build a list of lists, one inner list per block of data, but I'm having difficulty figuring out the key function that splits the lines into blocks.

Any ideas?

import itertools

with open(infile) as f:
    blocks = []
    # Which key function splits the lines into blocks? ("..." is a placeholder)
    for key, val in itertools.groupby(f, lambda x: ...):
        if key:
            blocks.append(list(val))

Input:

Timestamp         : 2017-02-17 06:41:33.163000 EST
Event             : fabric
DataFields        : Zen
Timestamp         : 2017-02-17 06:41:33.163000 EST
Event             : application
DataFields        : Flood1
Timestamp         : 2017-02-17 06:41:33.163000 EST
Event             : fabric
DataFields        : Flood2
Timestamp         : 2017-02-17 06:41:33.163000 EST
Event             : application
DataFields        : Flood3 

Desired output, a list of lists:

[list1, list2, list3, list4]

list1 = ['Timestamp         : 2017-02-17 06:41:33.163000 EST', 'Event             : fabric', 'DataFields        : Zen']
list2 = ['Timestamp         : 2017-02-17 06:41:33.163000 EST', 'Event             : application', 'DataFields        : Flood1']
list3 = ['Timestamp         : 2017-02-17 06:41:33.163000 EST', 'Event             : fabric', 'DataFields        : Flood2']
list4 = ['Timestamp         : 2017-02-17 06:41:33.163000 EST', 'Event             : application', 'DataFields        : Flood3']
OneCricketeer
lax123

1 Answer


If you give groupby a key that tests whether a line starts with Timestamp, it alternates between groups of timestamp lines (key True) and groups of non-timestamp lines (key False). Use each timestamp group to start a new sublist, then extend that sublist with the non-timestamp group that follows it.
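To see that alternation directly, here is a small sketch that runs the same key over a few hard-coded lines and prints each group's key and contents. The list `sample` is a hypothetical in-memory stand-in for the file.

```python
import itertools

# Hypothetical in-memory stand-in for the lines of the input file.
sample = [
    "Timestamp         : 2017-02-17 06:41:33.163000 EST",
    "Event             : fabric",
    "DataFields        : Zen",
    "Timestamp         : 2017-02-17 06:41:33.163000 EST",
    "Event             : application",
    "DataFields        : Flood1",
]

# Same key as below: True for timestamp lines, False for everything else.
groups = [
    (is_timestamp, list(lines))
    for is_timestamp, lines in itertools.groupby(
        sample, lambda line: line.startswith("Timestamp"))
]

for is_timestamp, lines in groups:
    print(is_timestamp, lines)
```

The keys come out as True, False, True, False: one single-line timestamp group, followed by the Event/DataFields lines that belong with it.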

import itertools

with open('test.txt') as f:
    blocks = []
    for is_timestamp, lines in itertools.groupby(
            (line.strip() for line in f),
            lambda line: line.startswith('Timestamp')):
        if is_timestamp:
            # saw a timestamp - start a new inner list
            blocks.append(list(lines))
        else:
            # non-timestamp lines belong to the block started by the last timestamp
            blocks[-1].extend(lines)

for block in blocks:
    print(block)

Running it on the sample input (saved as test.txt), I get:

td@mintyfresh ~/tmp $ python3 test.py
['Timestamp         : 2017-02-17 06:41:33.163000 EST', 'Event             : fabric', 'DataFields        : Zen']
['Timestamp         : 2017-02-17 06:41:33.163000 EST', 'Event             : application', 'DataFields        : Flood1']
['Timestamp         : 2017-02-17 06:41:33.163000 EST', 'Event             : fabric', 'DataFields        : Flood2']
['Timestamp         : 2017-02-17 06:41:33.163000 EST', 'Event             : application', 'DataFields        : Flood3', '']
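The trailing '' in the last block comes from a blank line at the end of the file: stripping it leaves an empty string, which is not a timestamp, so it gets added to the last block. The variant below is a sketch, not the code above. It drops blank lines and uses a different key that is closer to what the question asked for: a stateful key that numbers the blocks by counting Timestamp lines, so every groupby group is already one complete block. The function name `split_blocks`, its `marker` parameter and the counting approach are my own assumptions. Lines before the first Timestamp (if any) would get key 0 and form a block of their own.

```python
import io
import itertools


def split_blocks(lines, marker="Timestamp"):
    """Group lines into blocks, each starting at a line that begins with marker.

    Blank lines are dropped. The key is stateful: it returns how many marker
    lines have been seen so far, so all lines of one block share the same key.
    """
    count = 0

    def block_id(line):
        nonlocal count
        if line.startswith(marker):
            count += 1
        return count

    stripped = (line.strip() for line in lines)
    non_blank = (line for line in stripped if line)
    return [list(group) for _, group in itertools.groupby(non_blank, block_id)]


# Hypothetical sample mirroring the question's input, with a trailing blank line.
sample = io.StringIO(
    "Timestamp         : 2017-02-17 06:41:33.163000 EST\n"
    "Event             : fabric\n"
    "DataFields        : Zen\n"
    "Timestamp         : 2017-02-17 06:41:33.163000 EST\n"
    "Event             : application\n"
    "DataFields        : Flood1\n"
    "\n"
)

blocks = split_blocks(sample)
for block in blocks:
    print(block)
```

With a real file, pass the open file object instead: `with open('test.txt') as f: blocks = split_blocks(f)`. The counting key works because groupby calls the key function once per line, in order.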
tdelaney