Background:

I want to use regular expressions to search for a keyword. However, my keyword has multiple synonyms. For example, I consider all of the following equivalent to the keyword positive: "+", "pos", "POS", "Positive", "POSITIVE"

I've looked at "Create a dataframe with NLTK synonyms" and http://www.nltk.org/howto/wordnet.html, but I don't think either is quite what I am looking for.

Goals:

1) create synonyms for a given keyword (e.g. positive)

2) search for a keyword (e.g. positive) in a corpus using regular expressions

Example:

toy_corpus = 'patient is POS which makes them ideal to treatment '

I think the steps to getting this would look something like this:

1) define synonyms for positive, e.g. positive = ["pos", "POS", "Positive", "POSITIVE", "+"]

2) use a regular expression to find the keyword POS
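The two steps above can be sketched roughly as follows. This is only one possible approach, not a definitive implementation: the `SYNONYMS` table and the `build_pattern` helper are hypothetical names introduced for illustration, and the lookarounds `(?<!\w)` / `(?!\w)` are used instead of `\b` on the assumption that "+" should also match as a standalone token ("+" is not a word character, so `\b` does not behave as a boundary around it).

```python
import re

# Hypothetical synonym table: canonical keyword -> surface forms from the question.
SYNONYMS = {"positive": ["pos", "POS", "Positive", "POSITIVE", "+"]}

def build_pattern(forms):
    # Escape each form so that "+" is treated literally; try longer forms first.
    alternatives = sorted((re.escape(f) for f in forms), key=len, reverse=True)
    # (?<!\w) and (?!\w) require that no word character touches the match,
    # which also works for the non-word synonym "+".
    return re.compile(r"(?<!\w)(?:" + "|".join(alternatives) + r")(?!\w)")

toy_corpus = 'patient is POS which makes them ideal to treatment '
pattern = build_pattern(SYNONYMS["positive"])
matches = pattern.findall(toy_corpus)
print(matches)  # ['POS']
```

Because the synonyms are joined into a single alternation, adding another keyword only means adding another entry to the table and building a second pattern.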

Question:

How do I go about achieving this?

1 Answer

Try this:

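import re

question = "patient is POS which makes them ideal to treatment. And the the positive"
find = ["pos", "POS", "positive"]

# Note: "\n+" matches one or more newline characters, not word boundaries,
# and its result is never used below, so this line does not affect the output.
words = re.findall("\n+", question)
# Keep each synonym that appears as a whitespace-separated token of the text.
result = [word for word in find if word in question.split()]
print(result)
# prints ['POS', 'positive']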

Note that the word boundary is \b, not \n (the "\n+" in the code above matches one or more newlines). Wiki: word boundary. More examples: stackoverflow.com. Best Regards!
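For comparison, here is a minimal sketch of a variant that actually relies on the regular expression, assuming `\b` word boundaries and case-insensitive matching are acceptable. The names `pattern` and `found` are illustrative only, and this version does not handle the "+" synonym from the question, because "+" is not a word character.

```python
import re

question = "patient is POS which makes them ideal to treatment. And the the positive"
find = ["pos", "positive"]  # case variants are covered by re.IGNORECASE

# \b asserts a position between a word character and a non-word character,
# so "pos" will not match inside a longer word such as "position".
pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, find)) + r")\b", re.IGNORECASE)
found = pattern.findall(question)
print(found)  # ['POS', 'positive']
```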

Freddy Daniel
  • quick q: what does "\n+" do? –  May 23 '19 at 18:30
  • Welcome to SO and thank you for your answer. Just to be clear, there's no need to "install" re--it's built into Python. Also, your solution doesn't really use regex for anything. You get the same result with `[w for w in find if w in question.split()]`. – ggorlen May 23 '19 at 18:32
  • Thanks ggorlen. Anyway, EER, you can also try \b, which is a word boundary (https://www.regular-expressions.info/wordboundaries.html). Here are more examples: https://stackoverflow.com/questions/37543724/python-regex-for-finding-all-words-in-a-string Best Regards! – Freddy Daniel May 23 '19 at 18:44
  • I looked at your link and still don't understand: what is the purpose of the `+` after `\n`? –  May 25 '19 at 15:53
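As a small illustration of the quantifier discussed in the comments: `+` means "one or more of the preceding item", so `\n+` matches a run of consecutive newlines as a single match. The sample `text` below is made up for the demonstration, and `\b\w+\b` is shown only as one common way to pick out whole words.

```python
import re

text = "first line\n\n\nsecond line"

# "\n" matches a single newline; "\n+" matches a run of one or more newlines.
single = re.findall("\n", text)
runs = re.findall("\n+", text)
# \w+ is one or more word characters; \b anchors the match at word boundaries.
words = re.findall(r"\b\w+\b", text)

print(single)  # ['\n', '\n', '\n']
print(runs)    # ['\n\n\n']
print(words)   # ['first', 'line', 'second', 'line']
```

On a string with no newlines at all, such as the one in the answer, `re.findall("\n+", ...)` simply returns an empty list.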