How to extract the meta description from URLs using Python?

Question

I want to extract the title and description from the following website:

view-source:http://www.virginaustralia.com/au/en/bookings/flights/make-a-booking/

with the following snippet of source code:

<title>Book a Virgin Australia Flight | Virgin Australia
</title>
    <meta name="keywords" content="" />
        <meta name="description" content="Search for and book Virgin Australia and partner flights to Australian and international destinations." />

I want the title and meta content.

I tried goose, but it does not extract these fields properly. Here is my code:

website_title = [g.extract(url).title for url in clean_url_data]

and

website_meta_description=[g.extract(urlw).meta_description for urlw in clean_url_data]

Both results are empty.

What about BeautifulSoup? - https://www.crummy.com/software/BeautifulSoup/ — Bubble Hacker, Jun 24 '16 at 09:28

score 22 · Accepted Answer · answered Jun 24 '16 at 10:17

BeautifulSoup is a good fit for this.

For the question above, you can use the following code to extract the "description" meta content:

import requests
from bs4 import BeautifulSoup

url = 'http://www.virginaustralia.com/au/en/bookings/flights/make-a-booking/'
response = requests.get(url)
# Name the parser explicitly so bs4 does not have to guess one
soup = BeautifulSoup(response.text, 'html.parser')

metas = soup.find_all('meta')

# Collect the content of every <meta name="description"> tag
print([meta.attrs['content'] for meta in metas if 'name' in meta.attrs and meta.attrs['name'] == 'description'])

output:

['Search for and book Virgin Australia and partner flights to Australian and international destinations.']

linpingta

You may want to add the check for content to exist in meta.attrs, as malformed html can cause exceptions to be thrown otherwise: `[ meta.attrs['content'] for meta in metas if 'name' in meta.attrs and 'content' in meta.attrs and meta.attrs['name'] == 'description' ]` – Marius Tibeica Oct 08 '19 at 14:39
You may want to add () to the print statement. – Elvin Aghammadzada Feb 02 '21 at 20:12
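
Picking up on the comments above, here is a minimal sketch that returns both the title and the description and tolerates pages where either one is missing. It assumes Python 3 and the `bs4` package; the helper name `extract_title_and_description` is made up for illustration, and the HTML string is the snippet from the question rather than a live download.

```python
from bs4 import BeautifulSoup

# Snippet from the question; in practice this would be response.text.
html = """<html><head>
<title>Book a Virgin Australia Flight | Virgin Australia
</title>
<meta name="keywords" content="" />
<meta name="description" content="Search for and book Virgin Australia and partner flights to Australian and international destinations." />
</head><body></body></html>"""


def extract_title_and_description(html_text):
    """Return (title, description); either value is None when absent."""
    soup = BeautifulSoup(html_text, "html.parser")
    # soup.title is None when the page has no <title> element.
    title = soup.title.get_text(strip=True) if soup.title else None
    # attrs= is required here because find()'s own `name` argument
    # refers to the tag name, not the HTML attribute called "name".
    tag = soup.find("meta", attrs={"name": "description"})
    # .get() returns None instead of raising if "content" is missing.
    description = tag.get("content") if tag else None
    return title, description


title, description = extract_title_and_description(html)
print(title)
print(description)
```

For a list of URLs, the same helper could be called once per downloaded page, which keeps the network code and the parsing code separate.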

score 1 · Answer 2 · answered Jun 24 '16 at 10:29

If you know XPath, using the lxml library with XPath expressions is a fast way to extract HTML elements.

import lxml.html

# html_content is the page source as a string (for example, response.text)
doc = lxml.html.document_fromstring(html_content)
title_element = doc.xpath("//title")
website_title = title_element[0].text_content().strip()
# The description is stored in the content attribute of <meta name="description">
meta_description_element = doc.xpath("//meta[@name='description']")
website_meta_description = meta_description_element[0].get("content", "").strip()

score 0 · Answer 3 · answered Jan 06 '21 at 09:23

import metadata_parser

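# The URL needs a scheme (http:// or https://) to be fetched
page = metadata_parser.MetadataParser(url='http://www.xyz.com')
# 'og' holds Open Graph tags, so this reads og:description
metaDesc = page.metadata['og']['description']
print(metaDesc)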

ZealousWeb

While this code may solve the question, [including an explanation](//meta.stackexchange.com/q/114762) of how and why this solves the problem would really help to improve the quality of your post, and probably result in more up-votes. Remember that you are answering the question for readers in the future, not just the person asking now. Please [edit] your answer to add explanations and give an indication of what limitations and assumptions apply. – Yunnosch Jan 07 '21 at 07:26

score 0 · Answer 4 · answered Feb 18 '21 at 10:45

You can use BeautifulSoup to achieve this.

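The following should help:

# soup is a BeautifulSoup object built from the page HTML
metas = soup.find_all('meta')
# Print the content of each <meta name="description"> tag
for m in metas:
    if m.get('name') == 'description':
        desc = m.get('content')
        print(desc)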

gm-123
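
If installing a third-party package is not an option, the same lookup can be sketched with only the standard library's `html.parser`. This swaps BeautifulSoup for a different technique, so treat it as an illustration under that assumption: the class name `HeadMetaParser` is hypothetical, and the HTML string is the snippet from the question.

```python
from html.parser import HTMLParser


class HeadMetaParser(HTMLParser):
    """Collects the first <title> text and the first meta description."""

    def __init__(self):
        super().__init__()
        self.title = None
        self.description = None
        self._in_title = False
        self._title_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title" and self.title is None:
            self._in_title = True
        elif tag == "meta" and self.description is None:
            attr = dict(attrs)
            # Attribute values can be None, hence the "or ''" fallback.
            if (attr.get("name") or "").lower() == "description":
                self.description = attr.get("content")

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self.title = "".join(self._title_parts).strip()

    def handle_data(self, data):
        # Title text may arrive in several chunks, so accumulate it.
        if self._in_title:
            self._title_parts.append(data)


html = """<title>Book a Virgin Australia Flight | Virgin Australia
</title>
    <meta name="keywords" content="" />
        <meta name="description" content="Search for and book Virgin Australia and partner flights to Australian and international destinations." />"""

parser = HeadMetaParser()
parser.feed(html)
parser.close()
print(parser.title)
print(parser.description)
```

Self-closing tags such as `<meta ... />` are routed through `handle_starttag` by default, so no separate handler is needed for them.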
