
I am using Python and would like to match all the words after "Examination(s):" till one or more empty lines occur.

text = "Examination(s): Mathematics 2nd Paper\r\n\r\nTimeTable"
text = "Examination(s):\r\n\r\nMathematics 2nd Paper\r\nblahblah"
text = "Examination(s):\r\nMathematics 2nd Paper\r\n\r\n\r\nmarks"

In all the above examples, my output should be "Mathematics 2nd Paper". Here is what I tried:

import re
pat = re.compile(r'(?:Examination\(s\):)[^\r\n]*')
m = pat.search(text)  # "Examination(s):" plus whatever follows on that same line

The above snippet works fine for example 2 (one occurrence of \r\n), but is not working for examples 1 and 3.

@Wiktor, I am getting an error (posted as a screenshot) when I apply your pattern.


Update: I have updated the question to cover a missed scenario: after the colon there can be either a space or a newline.


dsj

1 Answer


To get the first non-blank line after Examination(s): you can use

re.search(r'Examination\(s\):\s*([^\r\n]+)', text)

See the regex demo. Details:

  • Examination\(s\): - a literal Examination(s): string
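  • \s* - zero or more whitespace chars (spaces, tabs, CR, LF, ...), so a space or any number of line breaks after the colon is skipped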
  • ([^\r\n]+) - Group 1: one or more chars other than CR and LF chars.
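The same pattern can also be written with re.VERBOSE so that each part carries its own comment. This is a minimal sketch; the name EXAM_RE is just an illustrative choice, not something from the original answer:

```python
import re

# Same pattern as above, written with re.VERBOSE: whitespace and "#" comments
# in the pattern are ignored (outside character classes), so each part can
# be annotated. EXAM_RE is an illustrative name.
EXAM_RE = re.compile(r"""
    Examination\(s\):   # the literal label "Examination(s):"
    \s*                 # zero or more whitespace chars, so a space or blank lines are skipped
    ([^\r\n]+)          # Group 1: one or more chars other than CR and LF
""", re.VERBOSE)

m = EXAM_RE.search("Examination(s):\r\n\r\nMathematics 2nd Paper\r\nblahblah")
print(m.group(1))  # Mathematics 2nd Paper
```

The compact and verbose forms match exactly the same strings; the verbose one is just easier to maintain if the pattern grows.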

See the Python demo:

import re
texts = ["Examination(s):\r\nMathematics 2nd Paper\r\n\r\nTimeTable",
    "Examination(s):\r\nMathematics 2nd Paper\r\nblahblah",
    "Examination(s):\r\nMathematics 2nd Paper\r\n\r\n\r\nmarks"]
 
for text in texts:
    m = re.search(r'Examination\(s\):\s*([^\r\n]+)', text)
    print(f'--- {repr(text)} ---')
    if m:
        print(m.group(1))

Output:

--- 'Examination(s):\r\nMathematics 2nd Paper\r\n\r\nTimeTable' ---
Mathematics 2nd Paper
--- 'Examination(s):\r\nMathematics 2nd Paper\r\nblahblah' ---
Mathematics 2nd Paper
--- 'Examination(s):\r\nMathematics 2nd Paper\r\n\r\n\r\nmarks' ---
Mathematics 2nd Paper
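The demo above uses a line break after the colon in every string. A quick check against the updated examples from the question (a space after the colon, and a blank line after it) could look like this. This is a sketch; exam_line is a hypothetical helper name that returns None when there is no match:

```python
import re

def exam_line(text):
    """Return the first non-blank line after 'Examination(s):', or None."""
    m = re.search(r'Examination\(s\):\s*([^\r\n]+)', text)
    return m.group(1) if m else None

# The examples as they appear in the updated question: a space after the
# colon, a blank line after it, and a single line break followed by blank lines.
texts = ["Examination(s): Mathematics 2nd Paper\r\n\r\nTimeTable",
         "Examination(s):\r\n\r\nMathematics 2nd Paper\r\nblahblah",
         "Examination(s):\r\nMathematics 2nd Paper\r\n\r\n\r\nmarks"]

for text in texts:
    print(repr(exam_line(text)))  # prints 'Mathematics 2nd Paper' each time
```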
Wiktor Stribiżew
  • I am getting an error; I uploaded a screenshot of it to my question. Thank you. – dsj Feb 19 '22 at 11:35
  • @dsj That is because there is no match: in that string there is no line break after `:`, while all the strings in the question have one there. So replace `\r?\n` (or `[\r\n]+`) with `(?:\r?\n)?` or the more generic `\s*`. – Wiktor Stribiżew Feb 19 '22 at 11:38
  • Apologies, yes, you are right, it works now. But some texts will have a space and others a newline; does `\s*` work for all those cases? – dsj Feb 19 '22 at 11:41
  • @dsj `\s*` matches zero or more whitespace characters, including all Unicode whitespace. – Wiktor Stribiżew Feb 19 '22 at 11:43
  • Ah yes, it fails whenever there is a newline character in there. How do I match both spaces and newlines here? Some examples will have a space, some one newline, and some two newlines. – dsj Feb 19 '22 at 11:45
  • @dsj See a [Whitespace](https://en.wikipedia.org/wiki/Whitespace_character) wiki article. – Wiktor Stribiżew Feb 19 '22 at 11:47
  • It works, thank you. I updated the examples in the question to capture the missed ones. Apologies, and thanks for pointing it out. – dsj Feb 19 '22 at 11:51
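To see why the more generic `\s*` is preferable to `(?:\r?\n)?` for these inputs, both options from the comments can be compared side by side. This is a minimal sketch; ONE_BREAK, ANY_WS, first_group and the sample variable names are hypothetical:

```python
import re

# The two options from the comments: allow at most one optional line break
# after the colon, or skip any run of whitespace there.
ONE_BREAK = r'Examination\(s\):(?:\r?\n)?([^\r\n]+)'
ANY_WS = r'Examination\(s\):\s*([^\r\n]+)'

def first_group(pattern, text):
    # Return Group 1 of the first match, or None when nothing matches.
    m = re.search(pattern, text)
    return m.group(1) if m else None

space_after = "Examination(s): Mathematics 2nd Paper\r\n\r\nTimeTable"
blank_line_after = "Examination(s):\r\n\r\nMathematics 2nd Paper\r\nblahblah"
# \s is Unicode-aware for str patterns, so it also covers e.g. a
# non-breaking space (U+00A0).
nbsp_after = "Examination(s):\u00a0Mathematics 2nd Paper"

for text in (space_after, blank_line_after, nbsp_after):
    print(repr(first_group(ONE_BREAK, text)), repr(first_group(ANY_WS, text)))
```

The `(?:\r?\n)?` variant keeps the leading space (or non-breaking space) in the captured text, and finds no match at all when a blank line follows the colon, because after consuming one line break `[^\r\n]+` cannot start on the next one. `\s*` returns "Mathematics 2nd Paper" in all three cases.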