-1

I am new to regular expression matching. I have a string like the one below:

"karthika has symptoms cold,cough her gender is female and his age is 45"

In the first step, I look for the keyword "symptoms" and select the word immediately after it, like this:

regexp = re.compile(r"symptoms\s(\w+)")
symptoms = regexp.search(textoutput).group(1)

This gives the symptoms value "cold", but the text contains multiple symptoms. In the second step, I need to check whether a comma (,) follows "cold". If it does, I want to get the value immediately after the comma, i.e. "cough", using a regular expression.

Please help me achieve this.

Wiktor Stribiżew
Shiva Raj

2 Answers

2

You can use a regex that finds the first word after 'symptoms', optionally followed by further matches that each start with a comma, maybe some spaces, and more word characters:

import re

pattern = r"symptoms\s+(\w+)(?:,\s*(\w+))*"
regex = re.compile(pattern)

t = "kathy has symptoms cold,cough her gender is female. john's symptoms  hunger, thirst."
symptoms = regex.findall(t)

print(symptoms)

Output:

[('cold', 'cough'), ('hunger', 'thirst')]

Explanation:

r"symptoms\s+(\w+)(?:,\s*(\w+))*"
# symptoms\s+                      literal "symptoms" followed by 1+ whitespace characters
#            (\w+)                 followed by 1+ word characters (first symptom) as group 1
#                 (?:,\s*     )*   non-capturing group, repeated 0+ times: comma plus optional spaces
#                        (\w+)     1+ word characters as group 2; if the group repeats, only the last symptom is kept
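
One caveat with a capturing group inside a repetition: Python's `re` keeps only the last repetition, so with three or more symptoms the middle ones are lost. The sketch below shows this on a made-up three-symptom sentence and one possible workaround (the names `repeated`, `whole_list` and `all_symptoms` are illustrative, not part of the original answer):

```python
import re

# Same pattern as above: group 2 sits inside a repeated non-capturing group.
repeated = re.compile(r"symptoms\s+(\w+)(?:,\s*(\w+))*")

text = "kathy has symptoms cold,cough,fever her gender is female"

# Only the first symptom and the LAST repetition of group 2 survive.
pairs = repeated.findall(text)
print(pairs)  # [('cold', 'fever')]

# Possible workaround: capture the whole comma-separated list, then split it.
whole_list = re.compile(r"symptoms\s+(\w+(?:,\s*\w+)*)")
found = whole_list.search(text)
all_symptoms = re.split(r",\s*", found.group(1)) if found else []
print(all_symptoms)  # ['cold', 'cough', 'fever']
```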

Alternate way:

import re

pattern = r"symptoms\s+(\w+(?:,\s*\w+)*(?:\s+and\s+\w+)?)"

regex = re.compile(pattern)

t1 = "kathy has symptoms cold,cough,fever and noseitch her gender is female. "
t2 = "john's symptoms  hunger, thirst."
symptoms = regex.findall(t1+t2)

print(symptoms)

Output:

['cold,cough,fever and noseitch', 'hunger, thirst']

This works only when there is no comma before "and". With a serial (Oxford) comma, as is common in American English,

"kathy has symptoms cold,cough,fever, and noseitch" 

the match is only cold,cough,fever, and because "and" is consumed as if it were a symptom.

You can split each match at "," and " and " to get the individual symptoms:

sym = [ inner.split(",") for inner in (x.replace(" and ",",") for x in symptoms)] 
print(sym)

Output:

[['cold', 'cough', 'fever', 'noseitch'], ['hunger', ' thirst']]
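
If the serial-comma case and the leading space in ' thirst' matter, one option is to forbid "and" as a list item with a negative lookahead and to split with `re.split` instead of `str.split`. This is only a sketch under those assumptions; `oxford_pattern`, `splitter` and `clean` are hypothetical names, and like the pattern above it will still treat any "and <word>" right after the list as a symptom:

```python
import re

# Items after a comma must not be the bare word "and"; the final
# "and <word>" part may optionally be preceded by a comma.
oxford_pattern = re.compile(
    r"symptoms\s+(\w+(?:\s*,\s*(?!and\b)\w+)*(?:\s*,?\s+and\s+\w+)?)"
)

# Split on a comma (optionally followed by "and") or on a standalone " and ".
splitter = re.compile(r"\s*,\s*(?:and\s+)?|\s+and\s+")

text = ("kathy has symptoms cold,cough,fever, and noseitch her gender is female. "
        "john's symptoms  hunger, thirst.")

matches = oxford_pattern.findall(text)
clean = [splitter.split(m) for m in matches]
print(clean)
# [['cold', 'cough', 'fever', 'noseitch'], ['hunger', 'thirst']]
```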
Patrick Artner
1

You can use a regex capturing group. For example:

# the following pattern looks for 
# symptoms<many spaces><many word chars><comma><many word chars>

s_re = re.compile(r"symptoms\s+\w+,(\w+)")

The full code is

import re
from typing import Optional

s_re = re.compile(r"symptoms\s+\w+,(\w+)")

def get_symptom(text: str) -> Optional[str]:
    found = s_re.search(text)

    if found:
        return found.group(1)
    return None
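
A short usage sketch for the function above, run on the sentence from the question and on a made-up sentence with a single symptom (the variable names `second` and `missing` are illustrative):

```python
import re
from typing import Optional

s_re = re.compile(r"symptoms\s+\w+,(\w+)")

def get_symptom(text: str) -> Optional[str]:
    # Return the word right after the comma, or None if there is no match.
    found = s_re.search(text)
    return found.group(1) if found else None

second = get_symptom("karthika has symptoms cold,cough her gender is female and his age is 45")
missing = get_symptom("karthika has symptoms cold her gender is female")
print(second)   # cough
print(missing)  # None
```

Note that this pattern expects the second symptom directly after the comma; for input such as "cold, cough" it would need `,\s*` in place of the bare comma.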
Mo...