How do I parse only quotes using BeautifulSoup?

Question

I'm trying to parse quotes from a website, but each element with the `result` class contains multiple paragraphs. Is there a way to ignore the date and author and select only the quoted material, so that I'm left with just a list of quotes? I'm using BeautifulSoup. Thanks.

<div class="result">
  <p><strong>Date:</strong> February 2, 2019</p>
  <p>"My mind had no choice but to drift into an elaborate fantasy realm."</p>

  <blockquote>
    <p class="attribution">&mdash; Pamela, Paul</p>
  </blockquote>
  <a href="/metaphors/25249" class="load_details">preview</a> |
  <a href="/metaphors/25249" title="Let Children Get Bored Again [from The New York Times]">full record</a>
  <div class="details_container"></div>
</div>
<div class="result">
  <p><strong>Date:</strong> February 2, 2019</p>
  <p>"You let your mind wander and follow it where it goes."</p>
  <blockquote>
    <p class="attribution">&mdash; Pamela, Paul</p>
  </blockquote>
  <a href="/metaphors/25250" class="load_details">preview</a> |
  <a href="/metaphors/25250" title="Let Children Get Bored Again [from The New York Times]">full record</a>

  <div class="details_container"></div>
</div>

My current code is here:

import bs4 as bs
import urllib.request

sauce = urllib.request.urlopen('URLHERE').read()
soup = bs.BeautifulSoup(sauce,'lxml')

body = soup.body
for paragraph in body.find_all('p'):
    print(paragraph.text)

What do you mean by quotes? The text within the paragraph tag? — Kel Varnsen, Jul 10 '20 at 18:35
Material like: "You let your mind wander and follow it where it goes." — 257 Aria, Jul 10 '20 at 19:35
How disappointing that the markup puts the blocked quote in a `<p>` but the attribution in a `<blockquote>`... — msanford, Jul 11 '20 at 02:26

DD_N0p · Answer 1 · 2020-07-10T19:12:42.463

1

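You can use XPath for your query (note that this uses requests and lxml rather than BeautifulSoup), for example:

import requests
from lxml import html

page = requests.get('enter_your_url')
tree = html.fromstring(page.content)
# p[2] picks the <p> that is the second <p> child of its parent inside each
# <div class="result">, which is the quote; the attribution <p> is the only
# <p> inside its <blockquote>, so it is not matched.
data = tree.xpath('//div[@class="result"]//p[2]/text()')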

print(data)
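
Since the question asks about BeautifulSoup specifically, here is a rough equivalent using a CSS selector instead of XPath. It is only a sketch: it assumes every result block looks like the two shown in the question (date first, quote second), and it parses a trimmed copy of that markup rather than fetching a live page.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question, trimmed to the relevant tags.
html_doc = """
<div class="result">
  <p><strong>Date:</strong> February 2, 2019</p>
  <p>"My mind had no choice but to drift into an elaborate fantasy realm."</p>
  <blockquote>
    <p class="attribution">&mdash; Pamela, Paul</p>
  </blockquote>
</div>
<div class="result">
  <p><strong>Date:</strong> February 2, 2019</p>
  <p>"You let your mind wander and follow it where it goes."</p>
  <blockquote>
    <p class="attribution">&mdash; Pamela, Paul</p>
  </blockquote>
</div>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# "div.result > p" matches only <p> tags that are direct children of a result
# div, which skips the attribution nested inside <blockquote>.
# ":nth-of-type(2)" keeps the second such <p>, which is the quote here.
quotes = [
    p.get_text(strip=True)
    for p in soup.select("div.result > p:nth-of-type(2)")
]

for quote in quotes:
    print(quote)
```

Like the XPath version, this depends on the position of the quote paragraph, so it would need adjusting if the page layout changes.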

edited Jul 10 '20 at 19:12

answered Jul 10 '20 at 18:36

DD_N0p


Ok great. I'm new to python how would I exactly do this? I have added my current code. – 257 Aria Jul 10 '20 at 19:04
Ok amazing thank you so much. Also is there any way to get each quote to be on a new line? It's spitting it out as a bulk text piece. – 257 Aria Jul 10 '20 at 19:19
For strings, you can use \n – DD_N0p Jul 10 '20 at 19:41
How would you do that? Haha sorry very new to this. – 257 Aria Jul 10 '20 at 19:43
Would it be under the data section or a new line? – 257 Aria Jul 10 '20 at 19:47
You can read this [link](https://stackoverflow.com/questions/11497376/how-do-i-specify-new-lines-on-python-when-writing-on-files) – DD_N0p Jul 10 '20 at 19:50
I've tried a bunch of stuff but it hasn't worked. Could I get some help – 257 Aria Jul 10 '20 at 20:41
The data variable holds a list. If you want to print every element of the list on a new line in the console, you can do it like this: `for i in data: print(i)` – DD_N0p Jul 10 '20 at 21:20
Dude I love you. Thank you so much. – 257 Aria Jul 10 '20 at 21:35
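
For anyone following the comment thread above, here is a minimal sketch of putting each quote on its own line. It assumes `data` is a list of strings like the one the XPath query returns; a stand-in list built from the question's two quotes is used so the snippet runs without a network request.

```python
# Stand-in for the list returned by tree.xpath(...) in the answer.
data = [
    '"My mind had no choice but to drift into an elaborate fantasy realm."',
    '"You let your mind wander and follow it where it goes."',
]

# Join the items with a newline so each quote lands on its own line.
output = "\n".join(data)
print(output)
```

The joined string can also be written to a file as-is, which is the case the linked question covers.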

Kel Varnsen · Answer 2 · 2020-07-10T22:57:09.500

0

If I understand your question properly, you're looking to print just the quotes, which appear in every 3rd paragraph element, starting with the 2nd one.

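# All <p> elements on the page, in order: date, quote, attribution, date, ...
quotes = soup.find_all('p')

# Start at index 1 (the first quote) and step by 3 to skip dates and attributions
for i in range(1, len(quotes), 3):
    print(quotes[i].text)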

There may be a cleaner way of doing this, but that should work.
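
One possibly cleaner variant, offered as a sketch rather than a definitive fix, is to work one result block at a time and keep the paragraph that actually looks like a quote, instead of counting every third `<p>` on the page. The helper name `extract_quotes` is made up for illustration, and the check assumes each quote starts with a straight double quotation mark, as in the sample markup from the question.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question, trimmed to the relevant tags.
SAMPLE_HTML = """
<div class="result">
  <p><strong>Date:</strong> February 2, 2019</p>
  <p>"My mind had no choice but to drift into an elaborate fantasy realm."</p>
  <blockquote>
    <p class="attribution">&mdash; Pamela, Paul</p>
  </blockquote>
</div>
<div class="result">
  <p><strong>Date:</strong> February 2, 2019</p>
  <p>"You let your mind wander and follow it where it goes."</p>
  <blockquote>
    <p class="attribution">&mdash; Pamela, Paul</p>
  </blockquote>
</div>
"""


def extract_quotes(html_doc):
    """Return the quoted text from each result block (hypothetical helper)."""
    soup = BeautifulSoup(html_doc, "html.parser")
    quotes = []
    for result in soup.find_all("div", class_="result"):
        # recursive=False looks only at direct children, so the attribution
        # <p> nested inside <blockquote> is never considered.
        for p in result.find_all("p", recursive=False):
            text = p.get_text(strip=True)
            # Assumption: a quote paragraph starts with a double quotation mark.
            if text.startswith('"'):
                quotes.append(text)
    return quotes


for quote in extract_quotes(SAMPLE_HTML):
    print(quote)
```

Because this does not rely on a fixed count of paragraphs per entry, an extra or missing `<p>` in one result should not shift every later quote, though it would still break if the site stops wrapping quotes in quotation marks.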

edited Jul 10 '20 at 22:57

answered Jul 10 '20 at 18:43

Kel Varnsen


Ok how would you get that to print? I'm new to BS and PY. – 257 Aria Jul 10 '20 at 18:57
I've got it to print out the first line, but how would I do this for a whole series of quotes? – 257 Aria Jul 10 '20 at 19:08
To print just do `print(quote)` and `print(attribution)`. I'd have to see how the HTML looks for the series of quotes if you edit your original question to add that. – Kel Varnsen Jul 10 '20 at 19:15
Ok perfect. I've updated the code to include 2 entries. There are about 100 per page. – 257 Aria Jul 10 '20 at 19:23
I updated the answer if you want to give it a try. Note that if that webpage changes (specifically with the `<p>` elements), the code likely won't work anymore. – Kel Varnsen Jul 10 '20 at 22:58
