-1

I need to filter a block of text and extract only a few fields from it.

For example, I have this sample text:

ID: a9000006        
NSF Org     : DMI
Total Amt.  : $225024

Abstract    :This SBIR proposal is aimed at (1) the synthesis of new ferroelectric liquid crystals with ultra-high polarization,                    
             chemical stability and low viscosity

My attempt so far:

import re
# 'a90' plus five digits, or a dollar sign followed by digits
token = re.compile(r'a90\d{5}|[$]\d+')
re.findall(token, filetext)

I get 'a9000006' and '$225024', but I do not know how to write a regex for the three uppercase letters right after "NSF Org:" (here "DMI"), or for all the text after "Abstract:".

Mark
Xichen Yao

2 Answers

0

If you want a single regex that matches each of those 4 fields, with an explicit check for each one, use: `:\s?(a90[\d]+|[$][\d]+|[A-Z]{3}|.*$)`

>>> token = re.compile(r':\s?(a90[\d]+|[$][\d]+|[A-Z]{3}|.*$)', re.DOTALL)  # flag needed
>>> re.findall(token, filetext)
['a9000006', 'DMI', '$225024', 'This SBIR proposal is aimed at (1) the synthesis of new ferroelectric liquid crystals with ultra-high polarization,                    \n             chemical stability and low viscosity']
>>> 

However, since you're extracting all of the fields at the same time, it would be better to use a single pattern that matches all 4 together and generically, such as the one in this answer here.
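The linked answer is not reproduced here, so the following is only a sketch of what one label-anchored pattern for all 4 fields could look like. The group names (`id`, `org`, `amount`, `abstract`) and the trimmed copy of the sample text are assumptions made for illustration, not taken from the linked answer.

```python
import re

# Sample text from the question (trailing padding trimmed for readability).
filetext = """ID: a9000006
NSF Org     : DMI
Total Amt.  : $225024

Abstract    :This SBIR proposal is aimed at (1) the synthesis of new ferroelectric liquid crystals with ultra-high polarization,
             chemical stability and low viscosity"""

# One pattern with four named groups, each anchored on its field label.
# The group names are hypothetical; pick whatever suits your data.
record = re.compile(
    r"ID\s*:\s*(?P<id>\S+)"
    r".*?NSF Org\s*:\s*(?P<org>[A-Z]{3})"
    r".*?Total Amt\.\s*:\s*(?P<amount>[$]\d+)"
    r".*?Abstract\s*:\s*(?P<abstract>.*)",
    re.DOTALL,  # lets '.' cross line breaks, needed for the multi-line abstract
)

match = record.search(filetext)
fields = match.groupdict() if match else {}

# Collapse the line-wrap whitespace inside the abstract into single spaces.
if fields:
    fields["abstract"] = " ".join(fields["abstract"].split())

print(fields)
```

Anchoring on the labels means a value can no longer be picked up by the wrong alternative, at the cost of assuming the fields always appear in this order.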

aneroid
  • Thank you for answering my question. Could you tell me what regex I should write if I just want "DMI" after "NSF Org:"? – Xichen Yao Feb 18 '19 at 18:54
  • That part in the regex is `[A-Z]{3}`. If you want the whole regex for just that part, then split the one I've given along the `|` (optional match conditions, which are rules matching each of the 4 fields). To match only 'DMI', use: `r':\s([A-Z]{3})'` – aneroid Feb 18 '19 at 19:25
  • Thanks a lot, but I used `r'NSF Org\s*:\s*(.*)'` and it works as well – Xichen Yao Feb 18 '19 at 19:42
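
The two patterns from the comments can be compared on the sample text. This is a minimal sketch, assuming the text is stored in `filetext` as below (trailing padding trimmed); the third pattern, `strict`, is a combination of the two and is an assumption, not something posted in the thread.

```python
import re

# Sample text from the question (trailing padding trimmed for readability).
filetext = """ID: a9000006
NSF Org     : DMI
Total Amt.  : $225024

Abstract    :This SBIR proposal is aimed at (1) the synthesis of new ferroelectric liquid crystals with ultra-high polarization,
             chemical stability and low viscosity"""

# Pattern from the answerer's comment: any colon, one whitespace, three capitals.
generic = re.search(r':\s([A-Z]{3})', filetext).group(1)

# Pattern from the asker's comment: anchored on the label, captures the rest
# of the line. strip() removes any padding after the value.
by_label = re.search(r'NSF Org\s*:\s*(.*)', filetext).group(1).strip()

# Hypothetical combination: anchored on the label AND limited to three capitals.
strict = re.search(r'NSF Org\s*:\s*([A-Z]{3})\b', filetext).group(1)

print(generic, by_label, strict)
```

On this sample all three give the same value. The label-anchored versions do not depend on "NSF Org" being the only field whose value starts with three capital letters.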
-1

This should do the job:

: .*

You can check this here.
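
A quick way to see what this pattern returns on the sample from the question is sketched below. It assumes the text is stored in `filetext` exactly as pasted, with no space after the colon on the "Abstract" line and trailing padding trimmed. The second pattern, with a capture group, is a hypothetical variant and not part of the answer above.

```python
import re

# Sample text from the question (trailing padding trimmed for readability).
filetext = """ID: a9000006
NSF Org     : DMI
Total Amt.  : $225024

Abstract    :This SBIR proposal is aimed at (1) the synthesis of new ferroelectric liquid crystals with ultra-high polarization,
             chemical stability and low viscosity"""

# The pattern as posted: a colon, one literal space, then the rest of the line.
whole_matches = re.findall(r': .*', filetext)

# Hypothetical variant: optional whitespace after the colon, and a capture
# group so that only the value is returned.
values = re.findall(r':\s*(.*)', filetext)

print(whole_matches)
print(values)
```

With the sample in this form, the pattern as posted keeps the leading `: ` in each match and skips the "Abstract" line, because no space follows that colon. The variant returns the bare values, but only the first line of the abstract, since `.` stops at a line break unless `re.DOTALL` is set.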

bumblebee