I would like to extract some information from a string with a regex, but the result is always None. My code is as follows:

import re

line = '<meta content=\"Allrecipes\" property=\"og:site_name\"/>'
x = re.match(r'property=".+?"',line)
print(x)

I want to extract the content and property values as a tuple. How can I fix it?

jonrsharpe
Tong

2 Answers


I would suggest a tool better suited to HTML than regex: an HTML parser.

Using BeautifulSoup:

from bs4 import BeautifulSoup

line = '<meta content=\"Allrecipes\" property=\"og:site_name\"/>'
soup = BeautifulSoup(line, 'lxml')

print("Content: {}".format(soup.meta["content"]))
print("Property: {}".format(soup.meta["property"]))

OUTPUT:

Content: Allrecipes
Property: og:site_name
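If installing bs4 and lxml is not an option, roughly the same idea can be sketched with the standard library's html.parser module. This is only an illustration, not part of the BeautifulSoup approach above; the class name MetaCollector and its pairs attribute are made up for this example.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Hypothetical helper: collects (content, property) tuples from <meta> tags."""

    def __init__(self):
        super().__init__()
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs for the tag.
        # Self-closing tags such as <meta .../> also end up here, because the
        # default handle_startendtag calls handle_starttag.
        if tag == "meta":
            attributes = dict(attrs)
            self.pairs.append((attributes.get("content"), attributes.get("property")))


line = '<meta content="Allrecipes" property="og:site_name"/>'
collector = MetaCollector()
collector.feed(line)
collector.close()
print(collector.pairs)  # [('Allrecipes', 'og:site_name')]
```

A missing attribute shows up as None in the tuple instead of raising an error, which may or may not be what you want.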
DirtyBit

The answer from @DirtyBit is better than using regex. But if you still want to use regex, this may help (RegexDemo):

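import re

line = '<meta content="Allrecipes" property="og:site_name"/>'
# Raw string avoids double escaping; [^"]* stops at the closing quote instead of matching greedily.
match = re.search(r'content="(?P<content>[^"]*)".*?property="(?P<prop>[^"]*)"\s*/>', line)
print(match.groups())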

Output:

('Allrecipes', 'og:site_name')
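As for why the code in the question prints None: re.match only tries the pattern at the very beginning of the string, and the string begins with <meta, not property=. re.search scans the whole string instead. A minimal sketch of the difference, reusing the pattern from the question with a capture group added (the variable names are just for illustration):

```python
import re

line = '<meta content="Allrecipes" property="og:site_name"/>'
pattern = r'property="(.+?)"'

# re.match only attempts a match at position 0, where the string starts with "<meta".
anchored = re.match(pattern, line)

# re.search looks for the first position anywhere in the string where the pattern matches.
found = re.search(pattern, line)

print(anchored)        # None
print(found.group(1))  # og:site_name
```

So swapping re.match for re.search is enough to make the original pattern match; the named groups above are only needed to pull out both values at once.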
YusufUMS