How can I repeatedly parse text in a text file between two strings?

Question

I have a text file that contains a table like the following:

---
Title of my file
Subtitle of my file
---

+------+-------------------+------+
|  a   |        aa         | aaa  |
|  b   |        bb         | bbb  |
|  c   |        cc         | ccc  |
|  d   |        dd         | ddd  |      # Section 1
|  e   |        ee         | eee  |
|  f   |        ff         | fff  |
+======+===================+======+
|  g   |        gg         | ggg  |
|  h   |        hh         | hhh  |
|  i   |        ii         | iii  |      # Section 2
|  j   |        jj         | jjj  |
|  k   |        kk         | kkk  |
|  l   |        ll         | lll  |
+------+-------------------+------+

I'm trying to parse it with Python to capture each section into a separate list, section_1_list and section_2_list, with each list containing the lines in that section. For example, section_1_list would be:

section_1_list = [
    "|  a   |        aa         | aaa  |",
    "|  b   |        bb         | bbb  |",
    "|  c   |        cc         | ccc  |",
    "|  d   |        dd         | ddd  |",
    "|  e   |        ee         | eee  |",
    "|  f   |        ff         | fff  |"
]

Notice that this is without the dividing lines.

So my question is: how can I write my loop so that I can ignore the dividing lines and gather the other lines into their own lists?

What I have tried:

Extract Values between two strings in a text file using python

Python read specific lines of text between two strings

What I currently have:

with open(txt_file_path) as f:
    lines = f.readlines()

row_start = False

for line in lines:
    if "-----" in line or "=====" in line:
        block_text = []
        row_start = not row_start

    while row_start == True:
        block_text.append(line)

Edit: I say "repeatedly" in the title because I have around 16 of these blocks in the text file.

Eshwar S R · Answer 1 · 2021-09-29T08:32:54.583

Try the following approach.

Read the contents of the file.
Remove the border lines at the top and bottom of the table (using re).
Split the data on the separator lines inside the table (using re).
Split each block on newlines to get the intended lists.

See the following code:

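import re

with open(txt_file_path, "r") as f:
    data = f.read()
# Remove every run of "-" and "+" (table borders; note this also hits hyphens elsewhere)
data = re.sub(r"[-+]+", "", data)
# Split on the "====" separator line between sections
block_text = re.split(r"[+=]+", data)
# Keep only the table rows of each block, dropping blank and title lines
block_text = [[ln for ln in text.split("\n") if ln.startswith("|")] for text in block_text]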

score 0 · Accepted Answer · answered Sep 28 '21 at 22:39

Here's how I would do it:

from pprint import pprint

file_contents = """\
---
Title of my file
Subtitle of my file
---

+------+-------------------+------+
|  a   |        aa         | aaa  |
|  b   |        bb         | bbb  |
|  c   |        cc         | ccc  |
|  d   |        dd         | ddd  |      # Section 1
|  e   |        ee         | eee  |
|  f   |        ff         | fff  |
+======+===================+======+
|  g   |        gg         | ggg  |
|  h   |        hh         | hhh  |
|  i   |        ii         | iii  |      # Section 2
|  j   |        jj         | jjj  |
|  k   |        kk         | kkk  |
|  l   |        ll         | lll  |
+------+-------------------+------+\
"""
lines = file_contents.split('\n')

# TODO update as needed
start_end_line_prefixes = ('+---', '+===')

sections = []
curr_section = None

for line in lines:
    if any(line.startswith(prefix) for prefix in start_end_line_prefixes):
        curr_section = []
        sections.append(curr_section)
    elif curr_section is not None:
        curr_section.append(line)

# Remove the empty list left by the closing divider (if any)
if sections and not sections[-1]:
    sections.pop()

pprint(sections)

Output:

[['|  a   |        aa         | aaa  |',
  '|  b   |        bb         | bbb  |',
  '|  c   |        cc         | ccc  |',
  '|  d   |        dd         | ddd  |      # Section 1',
  '|  e   |        ee         | eee  |',
  '|  f   |        ff         | fff  |'],
 ['|  g   |        gg         | ggg  |',
  '|  h   |        hh         | hhh  |',
  '|  i   |        ii         | iii  |      # Section 2',
  '|  j   |        jj         | jjj  |',
  '|  k   |        kk         | kkk  |',
  '|  l   |        ll         | lll  |']]
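
For a file with many such blocks, the same loop could be wrapped in a function. The following is only a minimal sketch, not part of the original answer: the name `split_sections` and the `strip_annotations` flag are hypothetical, and it assumes every data row starts with `|` and that trailing notes such as `# Section 1` appear only after the last `|` on the line.

```python
def split_sections(lines, divider_prefixes=("+---", "+==="), strip_annotations=True):
    """Group table rows into sections delimited by divider lines."""
    sections = []
    current = None  # stays None until the first divider line is seen
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith(divider_prefixes):
            # A divider closes the current section and opens a new one
            current = []
            sections.append(current)
        elif current is not None and line.startswith("|"):
            if strip_annotations:
                # Drop anything after the last "|" (e.g. "      # Section 1")
                line = line[: line.rfind("|") + 1]
            current.append(line)
    # A closing divider leaves an empty section behind, so filter those out
    return [section for section in sections if section]


sample = [
    "Title\n",
    "+---+----+\n",
    "| a | aa |\n",
    "| b | bb |   # Section 1\n",
    "+===+====+\n",
    "| c | cc |\n",
    "+---+----+\n",
]
print(split_sections(sample))
```

Because the function accepts any iterable of lines, an open file object can be passed directly, for example `with open(txt_file_path) as f: sections = split_sections(f)`.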

Wow thank you! I was trying to make that a lot more complex than it needed to be — , Sep 28 '21 at 22:48
No problem! I agree it could be optimized a bit, but just using the fact that lists are mutable types seems the simplest way to go here. — rv.kvetch, Sep 29 '21 at 13:24
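
The mutability point in the comment above can be illustrated with a small sketch (the row string here is made up for the example): `sections.append(curr_section)` stores a reference to the list rather than a copy, so rows appended to `curr_section` afterwards also show up inside `sections`.

```python
sections = []
curr_section = []

# Append the (still empty) list first; sections holds a reference, not a copy
sections.append(curr_section)

# Rows added later are visible through sections as well
curr_section.append("| a |")

print(sections)                     # [['| a |']]
print(sections[0] is curr_section)  # True
```

This is why the answer's loop can register each new section as soon as a divider line is seen and fill it in on later iterations.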
