
We want to split a multi-line string such as the following:

|---------------------------------------------Title1(a)---------------------------------------------

Content goes here, the quick brown fox jumps over the lazy dog

|---------------------------------------------Title1(b)----------------------------------------------

Content goes here, the quick brown fox jumps over the lazy dog

Here's our Python code that tries to split it with a regex:

import re

str1 = "|---------------------------------------------Title1(a)---------------------------------------------" \
    "" \
    "Content goes here, the quick brown fox jumps over the lazy dog" \
    "" \
    "|---------------------------------------------Title1(b)----------------------------------------------" \
    "" \
    "Content goes here, the quick brown fox jumps over the lazy dog" \
    "|"

print(str1)

# raw string, so "\|" reaches the regex engine without an invalid-escape warning
str2 = re.split(r"\|---------------------------------------------", str1)


print(str2)

We want the output to contain only the content lines:

str2[0]:

Content goes here, the quick brown fox jumps over the lazy dog

str2[1]:

Content goes here, the quick brown fox jumps over the lazy dog

What's the proper regex to use, or is there another way to split text in the format above?

mkrieger1
bherto39
  • Maybe `re.split(r'\s*^\|---.*\s*', text)`? You will still need to get rid of the first empty item though. Also, the `str1` does not contain line breaks in your code. – Wiktor Stribiżew Nov 23 '20 at 13:30
  • Maybe all you want is all non-blank lines not starting with `|---`? `str2 = [line for line in str1.splitlines() if not line.startswith('|---') and line.strip()]` – Wiktor Stribiżew Nov 23 '20 at 13:36
  • You could use `\|-+Title\d+\([a-z]\)-+(.+?)(?=\||$)` https://regex101.com/r/R6kwim/1 Then use re.findall and get the values by index if you want. See https://ideone.com/vHbRSa – The fourth bird Nov 23 '20 at 13:39
  • Or, if there has to be a `|` at the end, and the title and content use any char other than `-`: `\|-{2,}[^-]+-{2,}([^-].*?)(?=\|)` https://regex101.com/r/J501Ea/1 – The fourth bird Nov 23 '20 at 13:56
  • Does this answer your question? [Split string based on a regular expression](https://stackoverflow.com/questions/10974932/split-string-based-on-a-regular-expression) – Heo Nov 23 '20 at 13:59
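A sketch of Wiktor's first suggestion, under two assumptions: the input has real line breaks (the question's `str1` has none, as he points out), and `re.MULTILINE` was intended, since without it `^` only matches at the very start of the string. The sample text and the variable names are illustrative.

```python
import re

# Sample text with real line breaks, built from pieces so the dash
# counts do not have to be typed by hand (they do not affect the pattern).
header_a = "|" + "-" * 45 + "Title1(a)" + "-" * 45
header_b = "|" + "-" * 45 + "Title1(b)" + "-" * 46
content = "Content goes here, the quick brown fox jumps over the lazy dog"
text = "\n\n".join([header_a, content, header_b, content])

# \s*       swallow blank lines before a header
# ^\|---.*  a whole header line (MULTILINE makes ^ match at every line start)
# \s*       swallow blank lines after the header
parts = re.split(r"\s*^\|---.*\s*", text, flags=re.MULTILINE)

# The first header sits at position 0, so split() returns a leading "".
sections = [p for p in parts if p]
print(sections)
```

Filtering out empty strings is one way to "get rid of the first empty item"; `parts[1:]` would work just as well for this input.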

1 Answer


Instead of using `re.split`, you can match each header line with `re.findall` and capture the content that follows it in a group.

\|-{2,}[^-]+-{2,}([^-].*?)(?=\|)

Explanation

  • \| Match |
  • -{2,} Match 2 or more -
  • [^-]+ Match 1+ times any char except -
  • -{2,} Match 2 or more -
  • ( Capture group 1
    • [^-].*? Match any char except -, then any char as few times as possible (non-greedy)
  • ) Close group 1
  • (?=\|) Positive lookahead, assert a | to the right
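The explanation above maps one-to-one onto a verbose regex. As a sketch (the name `HEADER_CONTENT` and the short sample line are illustrative), `re.VERBOSE` lets each piece of the same pattern carry its own comment; whitespace and `#` comments outside character classes are ignored by the engine.

```python
import re

# Same pattern as the answer, annotated piece by piece.
HEADER_CONTENT = re.compile(
    r"""
    \|          # match |
    -{2,}       # 2 or more -
    [^-]+       # the title: 1+ chars that are not -
    -{2,}       # 2 or more -
    ( [^-].*? ) # group 1: a non-dash char, then as few chars as possible
    (?=\|)      # lookahead: a | must follow, but it is not consumed
    """,
    re.VERBOSE,
)

line = "|" + "-" * 10 + "Title1(a)" + "-" * 10 + "Content here|"
m = HEADER_CONTENT.search(line)
print(m.group(1))  # Content here
```

Because the lookahead does not consume the `|`, that same `|` is still available to start the next header's match when `findall` scans a longer string.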

Regex demo | Python demo

Example

import re
 
regex = r"\|-{2,}[^-]+-{2,}([^-].*?)(?=\|)"
 
str1 = "|---------------------------------------------Title1(a)---------------------------------------------" \
    "" \
    "Content goes here, the quick brown fox jumps over the lazy dog" \
    "" \
    "|---------------------------------------------Title1(b)----------------------------------------------" \
    "" \
    "Content goes here, the quick brown fox jumps over the lazy dog" \
    "|"
 
str2 = re.findall(regex, str1)  # one entry per group-1 capture
print(str2[0])
print(str2[1])

Output

Content goes here, the quick brown fox jumps over the lazy dog
Content goes here, the quick brown fox jumps over the lazy dog
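This works because the `str1` in the code is one long line. If the input really contains line breaks, `.` stops at `\n` and `re.findall` finds nothing. One possible adaptation, which is an assumption and not part of the original answer: compile with `re.DOTALL`, also accept the end of the string via `\Z` (so a trailing `|` is not required), and strip the blank lines from each capture.

```python
import re

# Hypothetical multi-line input: real line breaks, no trailing "|".
header_a = "|" + "-" * 45 + "Title1(a)" + "-" * 45
header_b = "|" + "-" * 45 + "Title1(b)" + "-" * 46
content = "Content goes here, the quick brown fox jumps over the lazy dog"
text = "\n\n".join([header_a, content, header_b, content])

# Same idea as the answer's regex, but:
#  - re.DOTALL lets .*? run across line breaks,
#  - (?=\||\Z) also accepts the end of the string after the last section.
regex = re.compile(r"\|-{2,}[^-]+-{2,}([^-].*?)(?=\||\Z)", re.DOTALL)

# Each capture still includes the surrounding newlines, so strip them.
sections = [s.strip() for s in regex.findall(text)]
print(sections)

# Without DOTALL, the original pattern finds nothing in this text.
print(re.findall(r"\|-{2,}[^-]+-{2,}([^-].*?)(?=\|)", text))  # []
```

Note that content containing a `|` would still end a section early; that limitation is shared with the original pattern.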

If the header always contains `Title` followed by a number and a lowercase letter in parentheses, another option is to make the match more precise:

\|-+Title\d+\([a-z]\)-+(.+?)(?=\||$)

Regex demo
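A quick Python check of the more precise pattern, as a sketch. The input is rebuilt with `"-" * n` instead of pasted dashes, and the trailing `|` is deliberately left off to show that the `$` alternative in the lookahead still lets the last section match.

```python
import re

content = "Content goes here, the quick brown fox jumps over the lazy dog"
# Single-line input like the question's str1, minus the trailing "|".
str1 = ("|" + "-" * 45 + "Title1(a)" + "-" * 45 + content
        + "|" + "-" * 45 + "Title1(b)" + "-" * 46 + content)

# \|-+               one | followed by 1+ dashes
# Title\d+\([a-z]\)  literal "Title", digits, one lowercase letter in ( )
# -+                 1+ dashes closing the header
# (.+?)              group 1: the content, as short as possible
# (?=\||$)           stop before the next | or at the end of the string
regex = r"\|-+Title\d+\([a-z]\)-+(.+?)(?=\||$)"

print(re.findall(regex, str1))
```

Appending the trailing `|` back (`str1 + "|"`) gives the same result, since the lookahead then stops at the `|` instead of at the end of the string.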

The fourth bird
  • I still think regex is definitely overkill here. `str2 = [line for line in str1.splitlines() if not line.startswith('|---') and line.strip()]` looks to be [working well](https://ideone.com/9ISorj) for this type of content. – Wiktor Stribiżew Nov 23 '20 at 17:40
  • @Wiktor Stribiżew I will check later; I am offline for the next few hours. But your solutions look good :-) If you post it, you have my vote. – The fourth bird Nov 23 '20 at 18:31
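Wiktor's non-regex suggestion, sketched with real line breaks in the input (which the question's `str1` lacks). The sample text is illustrative and uses shorter dash runs; only the `|---` prefix matters to the filter.

```python
# Sample laid out as in the question, with real line breaks.
text = """|---------Title1(a)---------

Content goes here, the quick brown fox jumps over the lazy dog

|---------Title1(b)----------

Content goes here, the quick brown fox jumps over the lazy dog
"""

# Keep every line that is neither blank nor a |--- header line.
str2 = [line for line in text.splitlines()
        if not line.startswith('|---') and line.strip()]
print(str2)
```

This assumes each section's content fits on one line; content spanning several lines would come back as separate list items rather than one string per section.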