After posting my original question, further web searches turned up this: How do I use Python's itertools.groupby()?
Here is my current approach. Please advise if I can make it more Pythonic.
loadfile1.txt (no grouping variable - same output as loadfile4.txt):
pgm1
pgm2
pgm3
pgm4
pgm5
pgm6
pgm7
pgm8
/a/path/with spaces/pgm9
loadfile2.txt (random grouping variable):
10, pgm1
10, pgm2
10, pgm3
ZZ, pgm4
ZZ, pgm5
-5, pgm6
-5, pgm7
-5, pgm8
-5, /a/path/with spaces/pgm9
loadfile3.txt (same grouping variable - no dependencies - multi-threaded):
,pgm1
,pgm2
,pgm3
,pgm4
,pgm5
,pgm6
,pgm7
,pgm8
,/a/path/with spaces/pgm9
loadfile4.txt (different grouping variable - dependencies - single threaded):
1, pgm1
2, pgm2
3, pgm3
4, pgm4
5, pgm5
6, pgm6
7, pgm7
8, pgm8
9, /a/path/with spaces/pgm9
My Python script:
#!/usr/bin/python
from itertools import groupby

# See https://stackoverflow.com/questions/4842057/python-easiest-way-to-ignore-blank-lines-when-reading-a-file
# Convert the file to a list of lines, ignoring any blank lines.
# A list (not a lazy filter) is built so the file is fully read before it is closed.
filename = 'loadfile2.txt'
with open(filename) as f_in:
    lines = [line.rstrip() for line in f_in if line.strip()]
print(lines)
# Convert the list to a list of lists, split on comma
lines = [i.split(',') for i in lines]
print(lines)
# Create a list of lists based on the key value (first item in each sub-list).
# groupby only merges consecutive rows that share the same key.
listofpgms = []
for key, group in groupby(lines, lambda x: x[0]):
    pgms = []
    for pgm in group:
        try:
            # "key, pgm" line: the program is the second field
            pgms.append(pgm[1].strip())
        except IndexError:
            # no comma on the line: the only field is the program itself
            pgms.append(pgm[0].strip())
    listofpgms.append(pgms)
print(listofpgms)
Output when using loadfile2.txt:
['10, pgm1', '10, pgm2', '10, pgm3', 'ZZ, pgm4', 'ZZ, pgm5', '-5, pgm6', '-5, pgm7', '-5, pgm8', '-5, /a/path/with spaces/pgm9']
[['10', ' pgm1'], ['10', ' pgm2'], ['10', ' pgm3'], ['ZZ', ' pgm4'], ['ZZ', ' pgm5'], ['-5', ' pgm6'], ['-5', ' pgm7'], ['-5', ' pgm8'], ['-5', ' /a/path/with spaces/pgm9']]
[['pgm1', 'pgm2', 'pgm3'], ['pgm4', 'pgm5'], ['pgm6', 'pgm7', 'pgm8', '/a/path/with spaces/pgm9']]
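One possible tightening of the script above, offered as a sketch rather than a definitive answer. The helper name `group_programs` is hypothetical (it is not part of the original script), and it takes any iterable of lines so it can be tried without a file. It differs from the script in two ways, both assumptions on my part: it splits on the first comma only (so a program path containing a comma is kept whole), and it strips whitespace from the key before grouping.

```python
from itertools import groupby


def group_programs(lines):
    """Group program names by the optional key before the first comma.

    Hypothetical helper: a line without a comma is used as its own key,
    so it normally ends up in a group by itself.
    """
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # partition splits on the first comma only
        key, sep, pgm = line.partition(',')
        if not sep:
            # no comma: the whole line is the program and also its key
            key, pgm = line, line
        rows.append((key.strip(), pgm.strip()))
    # groupby only merges consecutive rows that share a key
    return [[pgm for _, pgm in group]
            for _, group in groupby(rows, key=lambda row: row[0])]


sample = ["10, pgm1", "10, pgm2", "ZZ, pgm3", "", "-5, /a/path/with spaces/pgm9"]
print(group_programs(sample))
# [['pgm1', 'pgm2'], ['pgm3'], ['/a/path/with spaces/pgm9']]
```

To use it with one of the load files, pass the open file object directly, e.g. `with open('loadfile2.txt') as f_in: listofpgms = group_programs(f_in)`.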