Need a Python regular expression that can verify names with special characters (hyphens, apostrophes, etc.)

Question

I am trying to create a Python regular expression that can match any name. I am scraping a web page, looking for the <h1> tag, and grabbing the name inside it. The names can include James Dean, James-Dean, Brian O'Quin, Jame Joe-Harden, etc.

This is the first regular expression I have been working with, but it does not catch all the names:

<h1>[A-Z]{1}[a-z]+\s[A-Z]{1}[']?[A-Z]?[-]?[A-Z]?[a-z]+

Don’t use regexp for HTML or [He Comes](https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags). — Eb946207, Dec 21 '18 at 21:05
Possible duplicate of [RegEx match open tags except XHTML self-contained tags](https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) — Eb946207, Dec 21 '18 at 21:05
[H̸̡̪̯ͨ͊̽̅̾̎Ȩ̬̩̾͛ͪ̈́̀́͘ ̶̧̨̱̹̭̯ͧ̾ͬC̷̙̲̝͖ͭ̏ͥͮ͟Oͮ͏̮̪̝͍M̲̖͊̒ͪͩͬ̚̚͜Ȇ̴̟̟͙̞ͩ͌͝S̨̥̫͎̭ͯ̿̔̀ͅ](https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) — Eb946207, Dec 21 '18 at 21:13
why not [parse with an html library](https://stackoverflow.com/q/11709079/1358308) then use an [xpath selector](https://stackoverflow.com/a/11466033/1358308) to match the appropriate tags — Sam Mason, Dec 21 '18 at 21:17
maybe you should explain better what you are looking for. You gave some examples, but didn't say *exactly* which characters you are trying to match — Leonardo Maffei, Dec 21 '18 at 21:36
@LeonardoMaffei I am looking inside the HTML for something like `<h1>name</h1>`; example [link](http://www.espn.com/college-football/player/_/id/4360076/dylan-oquinn). I am trying to grab the player's name at the top of the page — Ethan Collins, Dec 21 '18 at 21:41

score -1 · Accepted Answer · answered Dec 21 '18 at 21:45


Maybe this:

<h1>(([-'\w]+\s?)+)<h1>

Explaining:

Inside the brackets, the - and the ' match themselves and \w matches letters, digits, and underscores; the plus after the brackets captures one or more of these characters. A space character is optional after this, to support compound names.
Finally, the last + ensures that the structure just described can repeat.
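
The explanation above can be tried out with a short script. This is only a minimal sketch: it assumes the closing tag in the pattern is meant to be `</h1>`, and the names `NAME_IN_H1` and `extract_name` are hypothetical helpers, not part of the original answer.

```python
import re

# Assumption: the closing tag is meant to be </h1>. The forward slash has no
# special meaning in Python's re module, so it needs no escaping.
NAME_IN_H1 = re.compile(r"<h1>(([-'\w]+\s?)+)</h1>")


def extract_name(html):
    """Return the text captured between <h1> and </h1>, or None if no match."""
    match = NAME_IN_H1.search(html)
    return match.group(1) if match else None


# The sample names are the ones listed in the question.
for sample in ("James Dean", "James-Dean", "Brian O'Quin", "Jame Joe-Harden"):
    print(extract_name(f"<h1>{sample}</h1>"))
```

Because the pattern nests one repetition inside another, it can backtrack heavily on long input that almost matches, so it is probably best kept to short, known markup.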
Hope this helps.

answered Dec 21 '18 at 21:45

Leonardo Maffei


This is kind of working. After some testing, the regular expression I have is `([-\'\w]+\s+\w+)`, and when I try to extract the name **Dylan O'Quin** it returns **Dylan O**. Any suggestions? – Ethan Collins Dec 22 '18 at 19:13
Just add a *+* after the last parenthesis. Compare the results on [regex101](https://regex101.com) and you will understand your mistake. Basically, this plus will *keep* looking for the pattern `([-\'\w]+\s+\w+)` over and over again. – Leonardo Maffei Dec 22 '18 at 23:15
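
A small check of where the match stops for **Dylan O'Quin** may help here. This is a sketch only; the `wider` pattern is an assumption about one possible fix (reusing the first character class for the second word), not something proposed in the thread.

```python
import re

text = "Dylan O'Quin"

# The trailing \w+ cannot cross the apostrophe, so the match stops at "O".
narrow = re.search(r"([-'\w]+\s+\w+)", text)
print(narrow.group(1))  # Dylan O

# Repeating the whole group still requires whitespace before each further
# \w+ run, so for this input the result is unchanged.
repeated = re.search(r"([-'\w]+\s+\w+)+", text)
print(repeated.group(0))  # Dylan O

# Assumption: letting the second word use the same character class as the
# first allows apostrophes and hyphens in the surname too.
wider = re.search(r"([-'\w]+\s+[-'\w]+)", text)
print(wider.group(1))  # Dylan O'Quin
```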
@LeonardoMaffei Thank you for your help! I ended up having too many troubles with regex and found a different solution with an HTML parser and BeautifulSoup – Ethan Collins Dec 23 '18 at 17:29
The last `<h1>` in the answer needs to be `</h1>` though. – Mr Lister Dec 23 '18 at 18:17

@MrLister In Python regex, the (/) in the closing header tag acts as an escape character: any unescaped delimiter must be escaped with a backslash (\), or it will break the pattern matching. As before, I found that using an HTML parser with Beautiful Soup was much, much easier lol – Ethan Collins Dec 24 '18 at 07:57
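
The parser-based route mentioned in the comments can be sketched roughly as follows. To keep the example runnable without third-party packages, it uses the standard library's `html.parser` rather than Beautiful Soup, so it is a stand-in for the approach and not the code the asker ended up with. The class `H1Collector`, the function `h1_texts`, and the sample markup are all hypothetical.

```python
from html.parser import HTMLParser


class H1Collector(HTMLParser):
    """Collect the text content of every <h1> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self._in_h1 = False
        self.names = []

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self._in_h1 = True
            self.names.append("")  # start a new entry for this heading

    def handle_endtag(self, tag):
        if tag == "h1":
            self._in_h1 = False

    def handle_data(self, data):
        # Text nodes are only kept while we are inside an <h1> element.
        if self._in_h1:
            self.names[-1] += data


def h1_texts(html):
    """Return the stripped text of each <h1> found in the given HTML string."""
    parser = H1Collector()
    parser.feed(html)
    parser.close()
    return [name.strip() for name in parser.names]


# Made-up sample markup, not taken from the linked page.
sample = "<html><body><h1>Dylan O'Quin</h1><p>Tight End</p></body></html>"
print(h1_texts(sample))
```

With this approach the name never has to fit a character class, so hyphens, apostrophes, and other characters come through as plain text.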
