Scraping a table with beautiful soup

Question

I'm trying to scrape the price table (buy yes, prices and contracts available) from this site: https://www.predictit.org/Contract/7069/Will-the-Senate-pass-the-Better-Care-Reconciliation-Act-by-July-31#prices.

This is my (obviously very preliminary) code, structured now just to find the table:

from bs4 import BeautifulSoup
import requests

url = "https://www.predictit.org/Contract/7069/Will-the-Senate-pass-the-Better-Care-Reconciliation-Act-by-July-31#prices"

# Fetch the page's raw HTML (requests does not run any of the page's JavaScript)
ret = requests.get(url).text

soup = BeautifulSoup(ret, "lxml")

# find() returns None when nothing matches; it does not raise AttributeError
table = soup.find('table')
if table is None:
    print('No tables found, exiting')
else:
    print(table)

The code finds and parses a table; however, it's the wrong one (the data table on a different tab https://www.predictit.org/Contract/7069/Will-the-Senate-pass-the-Better-Care-Reconciliation-Act-by-July-31#data).

How do I resolve this error to ensure the code identifies the correct table?
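A detail that helps explain why `#prices` and `#data` return the same table: everything after `#` is a URL fragment. The browser handles it client-side and never sends it to the server, so `requests.get` downloads the same document for both tabs. A quick check with the standard library:

```python
from urllib.parse import urldefrag

prices_url = "https://www.predictit.org/Contract/7069/Will-the-Senate-pass-the-Better-Care-Reconciliation-Act-by-July-31#prices"
data_url = "https://www.predictit.org/Contract/7069/Will-the-Senate-pass-the-Better-Care-Reconciliation-Act-by-July-31#data"

# Split each URL into (url without fragment, fragment)
prices_base, prices_frag = urldefrag(prices_url)
data_base, data_frag = urldefrag(data_url)

print(prices_frag, data_frag)    # prices data
print(prices_base == data_base)  # True: the server sees one and the same resource
```

Only the fragment differs, so the tab switching has to happen in the browser (via JavaScript) after the page loads.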

Which table do you want? Your best bet is to use `soup.find_all('table')` and then iterate through the list it returns. When iterating through it, search for specific elements only the table you want has — TerryA, Jul 18 '17 at 22:16
@TerryA Ran that code and it didn't identify the desired table, just the table on the first tab. — libertyspursuit, Jul 18 '17 at 22:21
Odd, I seem to get the error `requests.exceptions.ConnectionError: ('Connection aborted.', error(54, 'Connection reset by peer'))` when I try `requests.get(url)` — TerryA, Jul 18 '17 at 22:30
I had the same problem, but followed the steps here https://stackoverflow.com/questions/38853972/python-client-error-connection-reset-by-peer and used a virtualenv to make it work — libertyspursuit, Jul 18 '17 at 22:34
Unfortunately, those tables are generated by JavaScript, which BeautifulSoup cannot execute. See this answer: https://stackoverflow.com/a/44813994/1248974, namely the recommendation to use Selenium or another scraper — chickity china chinese chicken, Jul 18 '17 at 23:00
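TerryA's suggestion (call `soup.find_all('table')` and keep only the table containing something unique to it) can be sketched as below. It runs on a small inline HTML string rather than the live page, since, as the last comment notes, the price table is not in the server's HTML at all. The table ids and the "Buy Yes" header text are hypothetical stand-ins for whatever actually distinguishes the table you want.

```python
from bs4 import BeautifulSoup

# Hypothetical markup with two tables; on a real page the ids/headers may differ
html_doc = """
<table id="data-table"><tr><th>Date</th><th>Close</th></tr></table>
<table id="price-table"><tr><th>Buy Yes</th><th>Contracts</th></tr>
<tr><td>0.12</td><td>500</td></tr></table>
"""

soup = BeautifulSoup(html_doc, "html.parser")

def find_table_with_header(soup, header_text):
    """Return the first <table> with a <th> whose text equals header_text, else None."""
    for table in soup.find_all("table"):
        headers = [th.get_text(strip=True) for th in table.find_all("th")]
        if header_text in headers:
            return table
    return None

price_table = find_table_with_header(soup, "Buy Yes")
print(price_table["id"])  # price-table
```

This only helps when the target table is present in the downloaded HTML; for JavaScript-built tables there is nothing for `find_all` to find.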

score 1 · Answer 1 · answered Jul 18 '17 at 23:07


As @downshift mentioned in the comments, the table is generated by JavaScript using an XHR request.
So you can either use Selenium or make a direct request to the site's API.

Using the second option:

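from bs4 import BeautifulSoup
import requests

# The endpoint the page calls via XHR to load the price table
url = "https://www.predictit.org/PrivateData/GetPriceListAjax?contractId=7069"
ret = requests.get(url).text
soup = BeautifulSoup(ret, "lxml")
table = soup.find('table')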
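Once the XHR response is parsed, the rows can be pulled out of the table into plain lists. This is a minimal sketch run on a hard-coded HTML fragment, since the exact markup returned by the GetPriceListAjax endpoint isn't shown in this thread; the column names and values are hypothetical. It uses the built-in "html.parser" so it needs no lxml install; "lxml" works the same way here.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the HTML fragment returned by the price-list endpoint
response_html = """
<table>
  <thead><tr><th>Buy Yes</th><th>Contracts Available</th></tr></thead>
  <tbody>
    <tr><td>0.12</td><td>500</td></tr>
    <tr><td>0.13</td><td>1,250</td></tr>
  </tbody>
</table>
"""

def table_to_rows(table):
    """Return (headers, rows); each row is the stripped text of the <td> cells in one <tr>."""
    headers = [th.get_text(strip=True) for th in table.find_all("th")]
    rows = []
    for tr in table.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if cells:  # skip the header row, which has <th> cells but no <td>
            rows.append(cells)
    return headers, rows

soup = BeautifulSoup(response_html, "html.parser")
headers, rows = table_to_rows(soup.find("table"))
print(headers)  # ['Buy Yes', 'Contracts Available']
print(rows)     # [['0.12', '500'], ['0.13', '1,250']]
```

If you prefer records, `[dict(zip(headers, row)) for row in rows]` pairs each cell with its column name.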

t.m.adam

Thank you for your help! – libertyspursuit Jul 19 '17 at 02:02
