Extract substring with regular expression: re.match() always returns None

Question

I would like to extract some information from a string using a regex, but the result is always None. The source code is as follows:

import re

line = '<meta content="Allrecipes" property="og:site_name"/>'
x = re.match(r'property=".+?"', line)
print(x)  # always prints None

I want to extract the content and property values as a tuple. How can I fix it?

[`re.match`](https://docs.python.org/3/library/re.html#re.match) matches only at *the beginning* of a string. You probably want to use `re.find` — Arne, Mar 26 '19 at 07:57
If it is XML, why are you going with regex? Go for something like `lxml` or `beautifulsoup`. — DirtyBit, Mar 26 '19 at 07:58
Possible duplicate of [Why does Java regex "matches" vs "find" get a different match when using non-greedy pattern?](https://stackoverflow.com/questions/24681553/why-does-java-regex-matches-vs-find-get-a-different-match-when-using-non-gre) — Pushpesh Kumar Rajwanshi, Mar 26 '19 at 07:58
@PushpeshKumarRajwanshi That's a question about Java not python. In python there is no `find` method. — Giacomo Alzetta, Mar 26 '19 at 08:00
@Arne `re.find` does not exist. You probably meant `re.search`. — Giacomo Alzetta, Mar 26 '19 at 08:00
@GiacomoAlzetta: I was trying to say not to use `match`, which only matches at the beginning of the string, but `search`, which finds the regex anywhere in the string. While looking for duplicates I tried to find a Python version of the same question but didn't find one. Anyway, I removed that duplicate. — Pushpesh Kumar Rajwanshi, Mar 26 '19 at 08:02
Try `re.search("content\=\\\"(?P<content>.*)\\\".*property\=\\\"(?P<prop>.*)\\\"\/\>",line)`. — YusufUMS, Mar 26 '19 at 08:03
You should use `search` instead of `match`, as `match` only succeeds if your regex matches at the beginning of the string. Try this Python code: `line = '<meta content="Allrecipes" property="og:site_name"/>'; x = re.search(r'property=".+?"', line); print(x.group())` — Pushpesh Kumar Rajwanshi, Mar 26 '19 at 08:04
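
To make the difference discussed in the comments concrete, here is a small sketch comparing `re.match`, `re.search` and `re.fullmatch` on the question's string. The capture group around the attribute value is my own addition, so that the value can be read without the surrounding `property="..."` text.

```python
import re

line = '<meta content="Allrecipes" property="og:site_name"/>'
pattern = r'property="(.+?)"'  # capture group added for illustration

# re.match anchors the pattern at position 0; the string starts with "<meta", so no match
print(re.match(pattern, line))       # None
# re.search scans forward until the pattern matches anywhere in the string
m = re.search(pattern, line)
print(m.group(0))                    # property="og:site_name"
print(m.group(1))                    # og:site_name
# re.fullmatch requires the pattern to cover the entire string
print(re.fullmatch(pattern, line))   # None
```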

score 0 · Answer 1 · answered Mar 26 '19 at 08:04

I would suggest something more suitable than regex for parsing markup.

Using beautifulsoup:

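from bs4 import BeautifulSoup

line = '<meta content="Allrecipes" property="og:site_name"/>'
# 'lxml' requires the lxml package; 'html.parser' is the built-in alternative
soup = BeautifulSoup(line, 'lxml')

# soup.meta is the first <meta> tag; its attributes are read like dict keys
print("Content: {}".format(soup.meta["content"]))
print("Property: {}".format(soup.meta["property"]))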

OUTPUT:

Content: Allrecipes
Property: og:site_name
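
If installing beautifulsoup4 and lxml is not an option, a standard-library sketch is possible for this specific input, assuming the fragment is well-formed XML as it is in the question (a self-closing tag with double-quoted attributes). This swaps in `xml.etree.ElementTree`, which is not what the answer above uses, and it will raise `ParseError` on typical HTML that is not well-formed.

```python
import xml.etree.ElementTree as ET

line = '<meta content="Allrecipes" property="og:site_name"/>'

# Works only because this particular fragment is well-formed XML;
# arbitrary real-world HTML usually is not.
meta = ET.fromstring(line)
print((meta.get("content"), meta.get("property")))  # ('Allrecipes', 'og:site_name')
```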

score 0 · Accepted Answer · answered Mar 26 '19 at 08:08

The answer from @DirtyBit is better than using regex. But if you still want to use regex, this may help (RegexDemo):

import re

line = '<meta content="Allrecipes" property="og:site_name"/>'
# re.search scans the whole string; the named groups capture the two attribute values
regex = re.search(r'content="(?P<content>.*)".*property="(?P<prop>.*)"/>', line)
print(regex.groups())

Output:

('Allrecipes', 'og:site_name')
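
A variation on the same idea, sketched here as an alternative rather than as the answer's method: `re.findall` with two groups returns one `(name, value)` tuple per attribute, so the result does not depend on the order in which `content` and `property` appear. The pattern is an assumption that works for this input; it only handles double-quoted values, and because `\w+` does not match hyphens, an attribute such as `http-equiv` would be reported as `equiv`.

```python
import re

line = '<meta content="Allrecipes" property="og:site_name"/>'

# Each match yields a (name, value) tuple: group 1 is the attribute name,
# group 2 is everything between the double quotes.
pairs = re.findall(r'(\w+)="([^"]*)"', line)
print(pairs)  # [('content', 'Allrecipes'), ('property', 'og:site_name')]

# Converting to a dict allows lookup by attribute name
attrs = dict(pairs)
print((attrs["content"], attrs["property"]))  # ('Allrecipes', 'og:site_name')
```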
