I have a string that looks like this:

 Q 1. What is your age?

Ans. 15.

Q 2. What is your name?

Ans. My name is Bond. 


My full name is James Bond

Q 3. What is your favorite subject?

and so on. The answers can span multiple paragraphs. I am trying to write a regex that separates the text into question-answer pairs. I already have a regex that extracts the questions, but I cannot figure out how to get the pairs. Please guide me. The regex for the questions is:

import re
p = re.findall(r"Q [0-9]+[a-zA-Z]*\.(.*?)Ans\.", checkText, re.S)
Dreams

1 Answer

If you want to stay with regex, here are some solutions:

  1. Question only: Q \d\..*?\?(?=.*(?:Ans\.)?) (also finds questions that have no answer)
  2. Answer only: Ans\. .*?(?=\n\nQ \d\.)
  3. Question and answer (one pair): Q \d\. .*?(?=\n\nQ \d\.)

    Q "matches Q
    (space)
    \d "matches a digit
    \. "matches a dot
    (space)
    .*? "lazily matches everything (including new lines, which requires the Single Line option, re.S in Python)
    (?= "positive lookahead
        \n\n "matches two new lines
        Q \d\. "beginning of next question, same as before
    ) "end of lookahead
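
As a rough illustration of pattern 3 in Python, the sketch below runs it with `re.S` on a sample string modelled on the question. The sample text (including the made-up answer to question 3) and the variable names are assumptions. Note that the last pair is not returned, because no further `Q \d\.` follows it to satisfy the lookahead.

```python
import re

# Sample text modelled on the question; the answer to Q 3 is invented.
text = (
    "Q 1. What is your age?\n\n"
    "Ans. 15.\n\n"
    "Q 2. What is your name?\n\n"
    "Ans. My name is Bond.\n\n\n"
    "My full name is James Bond\n\n"
    "Q 3. What is your favorite subject?\n\n"
    "Ans. Maths."
)

# Pattern 3 from the list above; re.S lets "." match new lines as well.
pair_pattern = re.compile(r"Q \d\. .*?(?=\n\nQ \d\.)", re.S)

# Each match is one question together with its answer.
pairs = pair_pattern.findall(text)
for pair in pairs:
    print(repr(pair))
```

Since `\d` matches a single digit, question numbers above 9 would need `\d+` instead.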
    

More about lazy (un-greedy) matching here

Demo here (improved to match last question as well)
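
One possible way to also pick up the last pair, and to split each pair into question and answer, is sketched below. This is not necessarily the pattern used in the linked demo. The `split_qa` helper, the `\Z` alternative in the lookahead, and the sample text are assumptions made for illustration.

```python
import re

# Hypothetical helper: returns a list of (question, answer) tuples.
QA_PATTERN = re.compile(
    r"Q \d+[a-zA-Z]*\.\s*(.*?)\s*"          # question number, then question text
    r"Ans\.\s*(.*?)\s*"                     # answer text, possibly several paragraphs
    r"(?=\n\s*Q \d+[a-zA-Z]*\.|\Z)",        # stop before the next question or at the end
    re.S,                                   # let "." match new lines
)

def split_qa(text):
    # With two capture groups, findall returns a list of 2-tuples.
    return QA_PATTERN.findall(text)

# Sample text modelled on the question; the answer to Q 3 is invented.
sample = (
    "Q 1. What is your age?\n\n"
    "Ans. 15.\n\n"
    "Q 2. What is your name?\n\n"
    "Ans. My name is Bond.\n\n\n"
    "My full name is James Bond\n\n"
    "Q 3. What is your favorite subject?\n\n"
    "Ans. Maths."
)

qa_pairs = split_qa(sample)
for question, answer in qa_pairs:
    print(repr(question), "->", repr(answer))
```

This sketch assumes every question is followed by an `Ans.` marker; a question without one would absorb the text up to the next `Ans.`.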

Egan Wolf
  • Hey Egan, that's quite helpful. Can you please explain the third point? That would help me adapt it to the corner cases in my strings. Thanks! – Dreams Aug 09 '17 at 06:50
  • @Tarun I added some explanation. You can use a site like regex101.com to learn how regex works. – Egan Wolf Aug 09 '17 at 07:09
  • Thanks a lot for the help. :) – Dreams Aug 09 '17 at 07:13