I have a text file with multiple sections, each demarcated by a particular string. In the code so far, the relevant part of the file has been extracted as a list of separate lines.

The original file looks something like this:

>>

1. Title
Some data
Some data    
Some data

>>

2. Title
Some data
Some data    
Some data

>>

3. Title
Some data
Some data    
Some data

As I mentioned, this is represented as a list of strings, so:

['>>', '1. Title', 'Some data', 'Some data', 'Some data', '>>', '2. Title', ... ]

What is the easiest way to split this list into separate entries, as demarcated by the >>? There can be an arbitrary number of entries and they can differ in length, so simple slicing isn't an option as far as I can work out; the split has to depend on the demarcation within the list.

I'd like to end up with:

Entry 1:

['>>', '1. Title', 'Some data', 'Some data', 'Some data']

Entry 2:

['>>', '2. Title', 'Some data', 'Some data', 'Some data']

Entry 3:

['>>', '3. Title', 'Some data', 'Some data', 'Some data']

(I'm not actually concerned about collecting the >> once the lists are separated if that makes any difference.)

Joe Healey
  • Can the `>>` appear on actual text-data? – coder Jan 29 '18 at 10:03
  • No it appears line separated from the actual data and *shouldn't* ever appear within the body of the text (if that happens there are bigger problems!) – Joe Healey Jan 29 '18 at 10:04
  • Possible duplicate of [Python spliting a list based on a delimiter word](https://stackoverflow.com/questions/15357830/python-spliting-a-list-based-on-a-delimiter-word) – quamrana Jan 29 '18 at 10:15

1 Answer

Just keep appending a sublist to a holding list:

full_list = ['>>', '1. Title', 'Some data', ...]  # the list of lines from the question
final = []
sublist = [] # This list will initially absorb lines before the first >>
for line in full_list:
    if line == '>>':
        sublist = []
        final.append(sublist)
    else:
        sublist.append(line)

print(final)

Note: if your input has a trailing >>, final will end with an empty list. Any lines before the first >> are collected in the initial sublist and never added to final.
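If you would rather lean on the standard library, here is a minimal sketch using itertools.groupby. The function name split_on_delimiter and the shortened sample list are just for illustration, not something from the question. It behaves slightly differently from the loop above: lines before the first >> are kept as their own entry, and a trailing or repeated >> does not produce an empty list.

```python
from itertools import groupby


def split_on_delimiter(lines, delimiter='>>'):
    """Return each run of consecutive non-delimiter lines as its own list."""
    return [
        list(group)
        # groupby yields (key, run) pairs; the key is True for delimiter lines
        for is_delimiter, group in groupby(lines, key=lambda line: line == delimiter)
        if not is_delimiter  # drop the runs made up of delimiters
    ]


# Shortened, hypothetical sample in the same shape as the question's list
full_list = ['>>', '1. Title', 'Some data', '>>', '2. Title', 'Some data', 'Some data']
entries = split_on_delimiter(full_list)
print(entries)
```

The >> lines themselves are dropped, which matches the asker's remark that they don't need to be kept.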

quamrana
  • Excellent, this seems to work nicely! I'm surprised there isn't a module for something like this - doesn't feel as 'pythonic' as it could! There should be no issue with trailing `>>` (in fact at one point I was concerned I'd have to also test for the end of list/end of file, so this gets around that nicely). – Joe Healey Jan 29 '18 at 10:11
  • Ok, so now you have forced me to do some actual research and google stuff on Stackoverflow. And surprise surprise, there is an answer. See how I mark this question as duplicate..... – quamrana Jan 29 '18 at 10:15
  • Ah your google-fu exceeded mine in that case, I obviously couldn't concoct the right mix of 'string', 'list', 'split', 'delimiter'...! – Joe Healey Jan 29 '18 at 10:17