
I am trying to extract names from a block of text. Since only a few names can ever occur, it is easy to build the list of names in advance, and I would like to match them against the text. For example, I have the following list:

names = ["Wim Duisenberg", "Jean-Claude Trichet", "Mario Draghi", "Christine Lagarde"]

And the following block of text, scraped with Beautiful Soup:

print(textauthors)
<h2 class="ecb-pressContentSubtitle">Mario Draghi, President of the ECB, <br/>Vítor Constâncio, Vice-President of the ECB, <br/>Frankfurt am Main, 20 October 2016</h2>

I tried the following solution (based on this answer on Stack Overflow):

def exact_Match(textauthors, names):
b = r'(\s|^|$)' 
res = return re.match(b + word + b, phrase, flags=re.IGNORECASE)
print(res)

It gives me a syntax error and I am not sure how to fix it. Let me also apologize in advance if there is already an answer for this somewhere on Stack Overflow; I am a Python beginner and I am not really sure how to search for the right question. When I search for name matching, I see answers that use nltk, but that is not appropriate for my case, where I want an exact match. When I search for matching based on string text, I can't find an answer that works for me.

1muflon1
  • The syntax error should point out exactly what the issue is: `res = return ` makes no sense. Either assign or return. – Masklinn Jan 16 '20 at 10:31
  • In addition, you probably shouldn't be using regex directly against HTML. – Tim Biegeleisen Jan 16 '20 at 10:33
  • A duplicate of [Match a whole word in a string using dynamic regex](https://stackoverflow.com/questions/29996079/match-a-whole-word-in-a-string-using-dynamic-regex) – Wiktor Stribiżew Jan 16 '20 at 13:29
  • @WiktorStribiżew I actually found that question but it did not work for me. Maybe it's because it uses Python 2 and I use 3, or maybe it's because I misapplied the code from there. Before this post I actually tried a lot of other SE answers. But if you think this is still a duplicate, then vote to close it. – 1muflon1 Jan 16 '20 at 13:37
  • The solution there is for Python 3, too. See https://ideone.com/zUTj2o – Wiktor Stribiżew Jan 16 '20 at 13:52
  • @WiktorStribiżew oh okay, then I don't understand what I did wrong... then please close this. I don't want to delete it, so that people who got upvotes don't lose their points. – 1muflon1 Jan 16 '20 at 13:53

1 Answer


This will give you authors from textauthors:

import re

textauthors = '<h2 class="ecb-pressContentSubtitle">Mario Draghi, President of the ECB, <br/>Vítor Constâncio, Vice-President of the ECB, <br/>Frankfurt am Main, 20 October 2016</h2>'
regex = r">(?P<name>[^\s]+\s[^\s]+),"
matches = re.findall(regex, textauthors)
print(matches) # ['Mario Draghi', 'Vítor Constâncio']

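That is, of course, if you need to extract the authors from your textauthors. Note that the pattern assumes each name is exactly two whitespace-separated words, comes directly after a `>`, and is followed by a comma.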
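If the goal is instead to find only the names from the predefined list, as the question describes, one possible approach is to build a single pattern from that list. The following is a minimal sketch, not a definitive implementation: the function name `find_known_names` is hypothetical, and it assumes the names should match case-insensitively and only as whole words. It runs on the raw string from the question; with Beautiful Soup you would probably pass the element's text rather than its HTML.

```python
import re

names = ["Wim Duisenberg", "Jean-Claude Trichet", "Mario Draghi", "Christine Lagarde"]

def find_known_names(text, known_names):
    """Return every occurrence of a known name in text, in order of appearance.

    Hypothetical helper for illustration; not part of any library.
    """
    # re.escape makes sure characters such as "-" are matched literally.
    alternatives = "|".join(re.escape(name) for name in known_names)
    # (?<!\w) and (?!\w) reject matches that are glued to another word
    # character, so "Mario Draghix" does not count as "Mario Draghi".
    pattern = re.compile(r"(?<!\w)(?:" + alternatives + r")(?!\w)", flags=re.IGNORECASE)
    return [match.group(0) for match in pattern.finditer(text)]

textauthors = '<h2 class="ecb-pressContentSubtitle">Mario Draghi, President of the ECB, <br/>Vítor Constâncio, Vice-President of the ECB, <br/>Frankfurt am Main, 20 October 2016</h2>'
print(find_known_names(textauthors, names))  # ['Mario Draghi']
```

Only "Mario Draghi" is returned here because "Vítor Constâncio" is not in the list. Using `finditer` rather than `re.match` matters: `re.match` only looks at the start of the string, so it would not find a name in the middle of the text.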

alex2007v