Get meta tag content property with BeautifulSoup and Python

Question

I am trying to use Python and BeautifulSoup to extract the content attribute of the tags below:

<meta property="og:title" content="Super Fun Event 1" />
<meta property="og:url" content="http://superfunevents.com/events/super-fun-event-1/" />

BeautifulSoup loads the page just fine and finds other things (the code below also grabs the article id from the id attribute in the source), but I don't know the correct way to search the HTML for these values. I've tried variations of find and findAll to no avail. The code currently iterates over a list of URLs...

#!/usr/bin/env python
# -*- coding: utf-8 -*-

#importing the libraries
from urllib import urlopen
from bs4 import BeautifulSoup

def get_data(page_no):
    webpage = urlopen('http://superfunevents.com/?p=' + str(page_no)).read()
    soup = BeautifulSoup(webpage, "lxml")
    for tag in soup.find_all("article"):
        id = tag.get('id')
        print(id)
# the hard part that doesn't work - I know this example is well off the mark!
    title = soup.find("og:title", "content")
    print (title.get_text())
    url = soup.find("og:url", "content")
    print (url.get_text())
# end of problem

for i in range (1,100):
    get_data(i)

If anyone can help me sort out the bit that finds the og:title and og:url content, that'd be fantastic!

score 103 · Accepted Answer · edited May 07 '21 at 07:50

Pass the tag name, meta, as the first argument to find(). Then use keyword arguments to match specific attributes:

title = soup.find("meta", property="og:title")
url = soup.find("meta", property="og:url")

print(title["content"] if title else "No meta title given")
print(url["content"] if url else "No meta url given")

The if/else checks are optional if you know that the title and url meta properties will always be present.
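
As a minimal self-contained sketch of the approach above, the snippet below runs against the two tags from the question instead of a live page. The helper name `meta_content` and its `default` parameter are hypothetical, introduced only for illustration, and the built-in `html.parser` is assumed in place of lxml so that nothing beyond bs4 is required.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question; a real page would be downloaded first.
HTML = """
<html><head>
<meta property="og:title" content="Super Fun Event 1" />
<meta property="og:url" content="http://superfunevents.com/events/super-fun-event-1/" />
</head><body></body></html>
"""


def meta_content(soup, prop, default=None):
    """Hypothetical helper: content of <meta property=prop>, or default."""
    tag = soup.find("meta", property=prop)
    # Tag.get() returns the default when the content attribute is missing.
    return tag.get("content", default) if tag else default


soup = BeautifulSoup(HTML, "html.parser")
print(meta_content(soup, "og:title"))
print(meta_content(soup, "og:url"))
# og:description is absent from the sample, so the fallback is printed.
print(meta_content(soup, "og:description", "No meta description given"))
```

Wrapping the lookup in one function keeps the fallback logic in a single place when several properties are read from each page.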

edited May 07 '21 at 07:50

Pikamander2

answered Apr 21 '16 at 11:42

alecxe

Is there no built-in to get the content, or else fall back to a default? – Christophe Roussy Jun 12 '17 at 13:41

@ChristopheRoussy yup, this is exactly what is shown in the answer. You can also require the `content` attribute to be present by using `soup.find("meta", property="og:title", content=True)`. Thanks. – alecxe Jun 12 '17 at 13:42
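
To illustrate the `content=True` filter mentioned in the comment, here is a small sketch. The markup is hypothetical: it deliberately includes an og:title tag without a content attribute to show the difference, and `html.parser` is assumed as the parser.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the first og:title tag has no content attribute.
HTML = """
<meta property="og:title" />
<meta property="og:title" content="Super Fun Event 1" />
"""

soup = BeautifulSoup(HTML, "html.parser")

# Without the extra filter, find() returns the first match, which lacks content.
first = soup.find("meta", property="og:title")
print(first.get("content"))

# content=True matches only tags that carry a content attribute.
with_content = soup.find("meta", property="og:title", content=True)
print(with_content["content"])
```

With the filter in place, indexing `["content"]` on a non-None result cannot raise a KeyError, although the result itself can still be None when no tag matches.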

score 31 · Answer 2 · answered Apr 21 '16 at 11:37

Try this:

soup = BeautifulSoup(webpage, "lxml")  # name the parser explicitly, as in the question
for tag in soup.find_all("meta"):
    # tag.get() returns None when the attribute is missing
    if tag.get("property") in ("og:title", "og:url"):
        print(tag.get("content"))
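
A variation on this loop, sketched under assumptions: instead of printing inside the loop, it collects every property that starts with `og:` into a dictionary in a single pass. The sample markup comes from the question plus one hypothetical charset tag, the dictionary name `og` is illustrative, and `html.parser` is assumed as the parser.

```python
from bs4 import BeautifulSoup

# Tags from the question, plus a hypothetical meta tag that has no property.
HTML = """
<meta charset="utf-8" />
<meta property="og:title" content="Super Fun Event 1" />
<meta property="og:url" content="http://superfunevents.com/events/super-fun-event-1/" />
"""

soup = BeautifulSoup(HTML, "html.parser")

og = {}
for tag in soup.find_all("meta"):
    # Default to an empty string so startswith() is safe on tags without property.
    prop = tag.get("property", "")
    if prop.startswith("og:"):
        og[prop] = tag.get("content")

print(og.get("og:title"))
print(og.get("og:url"))
```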

answered Apr 21 '16 at 11:37

Hackaholic

Two years later and this did exactly what I needed in getting value from one attribute of a meta tag based on the value of another attribute of the same tag. Thank you! – John Laudun Jun 04 '18 at 01:18

score 6 · Answer 3 · answered Jun 19 '20 at 13:22

I like to solve this as follows.
(It is neater when looking up a list of properties.)

title = soup.find("meta", {"property": "og:title"})
url = soup.find("meta", {"property": "og:url"})

# Same fallback pattern as the accepted answer
title = title["content"] if title else None
url = url["content"] if url else None
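
The remark about lists of properties could look like the following sketch. The list `wanted` and the dictionary `results` are hypothetical names, og:image is included only to show the fallback for a missing tag, and `html.parser` is assumed as the parser.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question.
HTML = """
<meta property="og:title" content="Super Fun Event 1" />
<meta property="og:url" content="http://superfunevents.com/events/super-fun-event-1/" />
"""

soup = BeautifulSoup(HTML, "html.parser")

# Hypothetical list of properties to look up; og:image is not in the sample.
wanted = ["og:title", "og:url", "og:image"]

results = {}
for prop in wanted:
    # The attrs dict form makes it easy to pass the property name as a variable.
    tag = soup.find("meta", {"property": prop})
    results[prop] = tag.get("content") if tag else None

for prop in wanted:
    print(prop, "->", results[prop])
```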

score 1 · Answer 4 · answered Oct 09 '20 at 22:25

You could grab the content inside the meta tag with gazpacho:

from gazpacho import Soup

html = """\
<meta property="og:title" content="Super Fun Event 1" />
<meta property="og:url" content="http://superfunevents.com/events/super-fun-event-1/" />
"""

soup = Soup(html)
soup.find("meta", {"property": "og:title"}).attrs['content']

Which would output:

'Super Fun Event 1'

score 1 · Answer 5 · answered Feb 21 '21 at 23:10

This code from Jinesh Narayanan (https://gist.github.com/jineshpaloor/6478011) is relevant to this discussion. It reads the name-based description and keywords meta tags rather than the og: properties.

from bs4 import BeautifulSoup
import requests

def main():
    r = requests.get('http://www.sourcebits.com/')
    soup = BeautifulSoup(r.content, features="lxml")

    # soup.title is None when the page has no <title> element
    title = soup.title.string if soup.title else None
    print('TITLE IS :', title)

    for tag in soup.find_all('meta'):
        # Only <meta name="description"> and <meta name="keywords"> are of interest
        name = tag.attrs.get('name', '').strip().lower()
        if name in ['description', 'keywords']:
            # .get() avoids a KeyError when the tag has no content attribute
            print('CONTENT :', tag.attrs.get('content'))

if __name__ == '__main__':
    main()
