I am trying to extract the Open Graph properties from a website. What I want is a list of all the meta tag properties in the document that start with "og".

What I've tried is:

soup.find_all("meta", property="og:")

and

soup.find_all("meta", property="og")

But neither call finds anything unless I specify the complete property value.

A few examples are:

 <meta content="https://www.youtube.com/embed/Rv9hn4IGofM" property="og:video:url"/>,
 <meta content="https://www.youtube.com/embed/Rv9hn4IGofM" property="og:video:secure_url"/>,
 <meta content="text/html" property="og:video:type"/>,
 <meta content="1280" property="og:video:width"/>,
 <meta content="720" property="og:video:height"/>

Expected output would be:

l = ["og:video:url", "og:video:secure_url", "og:video:type", "og:video:width", "og:video:height"]

How can I do this?

Thank you

js352
This may help: https://stackoverflow.com/questions/36768068/get-meta-tag-content-property-with-beautifulsoup-and-python – Samsul Islam Feb 18 '21 at 20:26

3 Answers

Use the CSS selector meta[property], which matches every meta tag that has a property attribute:

metas = soup.select('meta[property]')
propValue = [v['property'] for v in metas]
print(propValue)
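
The selector above returns every meta tag with a property attribute, not only the Open Graph ones. A minimal sketch of narrowing it, assuming the CSS prefix attribute selector [property^="og:"] is available through select(); the sample HTML and the names html and og_props are illustrative and not part of the original answer:

```python
from bs4 import BeautifulSoup

# Illustrative sample: two Open Graph tags, one unrelated property,
# and one meta tag with no property attribute at all.
html = """
<html><head>
<meta content="text/html" property="og:video:type"/>
<meta content="1280" property="og:video:width"/>
<meta content="summary" property="twitter:card"/>
<meta charset="utf-8"/>
</head></html>
"""

soup = BeautifulSoup(html, "html.parser")

# ^= matches attribute values that start with the given prefix.
og_props = [tag["property"] for tag in soup.select('meta[property^="og:"]')]
print(og_props)
```

Including the colon in the prefix should keep out unrelated properties that merely begin with the letters "og".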
uingtea

Is this what you want?

from bs4 import BeautifulSoup

sample = """
<html>
<body>
<meta content="https://www.youtube.com/embed/Rv9hn4IGofM" property="og:video:url"/>,
<meta content="https://www.youtube.com/embed/Rv9hn4IGofM" property="og:video:secure_url"/>,
<meta content="text/html" property="og:video:type"/>,
<meta content="1280" property="og:video:width"/>,
<meta content="720" property="og:video:height"/>
</body>
</html>
"""

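# property=True keeps only meta tags that have a property attribute,
# so m["property"] cannot raise KeyError on other meta tags.
print([m["property"] for m in BeautifulSoup(sample, "html.parser").find_all("meta", property=True)])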

Output:

['og:video:url', 'og:video:secure_url', 'og:video:type', 'og:video:width', 'og:video:height']
baduker

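You can pass a function as the property filter and keep only the values that start with "og:":

...
soup = BeautifulSoup(html, "html.parser")

og_elements = [
    tag["property"]
    # The filter may be called with None for meta tags that lack a property
    # attribute, so guard against it before testing the prefix.
    for tag in soup.find_all("meta", property=lambda t: t is not None and t.startswith("og:"))
]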

print(og_elements)
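
An alternative sketch passes a compiled regular expression as the attribute filter instead of a function. It also collects each tag's content next to its property. The helper name extract_og and the sample HTML are hypothetical and only illustrate the idea:

```python
import re

from bs4 import BeautifulSoup


def extract_og(html):
    """Return (property, content) pairs for meta tags whose property starts with 'og:'."""
    soup = BeautifulSoup(html, "html.parser")
    # A compiled pattern is matched against each tag's property value;
    # tags without a property attribute do not match.
    tags = soup.find_all("meta", property=re.compile(r"^og:"))
    return [(tag["property"], tag.get("content")) for tag in tags]


sample = """
<meta content="1280" property="og:video:width"/>
<meta content="720" property="og:video:height"/>
<meta content="My blog" property="blog:title"/>
<meta name="description" content="no property here"/>
"""

pairs = extract_og(sample)
print([prop for prop, _ in pairs])
```

Anchoring the pattern with ^ avoids matching values such as "blog:title", which a plain substring test for "og" would accept.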
MendelG