
I'm trying to scrape the price table (buy yes, prices and contracts available) from this site: https://www.predictit.org/Contract/7069/Will-the-Senate-pass-the-Better-Care-Reconciliation-Act-by-July-31#prices.

This is my (obviously very preliminary) code, structured now just to find the table:

```python
from bs4 import BeautifulSoup
import requests
from lxml import html
import json, re

url = "https://www.predictit.org/Contract/7069/Will-the-Senate-pass-the-Better-Care-Reconciliation-Act-by-July-31#prices"

ret = requests.get(url).text

soup = BeautifulSoup(ret, "lxml")

# soup.find returns None (it does not raise) when no <table> exists
table = soup.find('table')
if table is None:
    print('No tables found, exiting')
else:
    print(table)
```

The code finds and parses a table; however, it's the wrong one (the data table on a different tab https://www.predictit.org/Contract/7069/Will-the-Senate-pass-the-Better-Care-Reconciliation-Act-by-July-31#data).

How do I fix this so the code identifies the correct table?
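For context on why changing the tab in the URL has no effect: everything after `#` is a URL fragment, which HTTP clients (browsers and `requests` alike) do not send to the server. The `#prices` and `#data` URLs therefore fetch exactly the same HTML, and the tab is selected client-side. A quick Python 3 check with the standard library's `urllib.parse.urldefrag`:

```python
from urllib.parse import urldefrag

prices_url = "https://www.predictit.org/Contract/7069/Will-the-Senate-pass-the-Better-Care-Reconciliation-Act-by-July-31#prices"
data_url = "https://www.predictit.org/Contract/7069/Will-the-Senate-pass-the-Better-Care-Reconciliation-Act-by-July-31#data"

# urldefrag splits off the "#..." part; only the remainder is requested
prices_base, prices_frag = urldefrag(prices_url)
data_base, data_frag = urldefrag(data_url)

print(prices_frag, data_frag)    # prices data
print(prices_base == data_base)  # True: both tabs are the same HTTP request
```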

  • Which table do you want? Your best bet is to use `soup.find_all('table')` and then iterate through the list it returns. When iterating through it, search for specific elements only the table you want has – TerryA Jul 18 '17 at 22:16
  • @TerryA Ran that code and it didn't identify the desired table, just the table on the first tab. – libertyspursuit Jul 18 '17 at 22:21
  • What table do you want from the first link you gave? – TerryA Jul 18 '17 at 22:21
  • @TerryA this guy https://i.stack.imgur.com/21y42.png – libertyspursuit Jul 18 '17 at 22:25
  • Odd, I seem to get the error `requests.exceptions.ConnectionError: ('Connection aborted.', error(54, 'Connection reset by peer'))` when I try `requests.get(url)` – TerryA Jul 18 '17 at 22:30
  • I had the same problem, but followed the steps here https://stackoverflow.com/questions/38853972/python-client-error-connection-reset-by-peer and used a virtualenv to make it work – libertyspursuit Jul 18 '17 at 22:34
  • Unfortunately, those tables are rendered by JavaScript, which BeautifulSoup cannot execute. See this answer: https://stackoverflow.com/a/44813994/1248974, namely the recommendation to use Selenium or another scraper – chickity china chinese chicken Jul 18 '17 at 23:00
  • Thank you so much! – libertyspursuit Jul 19 '17 at 02:02
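TerryA's suggestion from the first comment (iterate over `soup.find_all('table')` and look for something only the wanted table has) can be sketched as below, assuming the target table has distinguishing header text. The helper name `find_table_with_header`, the sample HTML and the labels `Buy Yes`/`Contracts` are hypothetical. As the later comments point out, this only helps when the table is actually present in the HTML that `requests` receives, which is not the case on this page.

```python
from bs4 import BeautifulSoup

def find_table_with_header(soup, header_text):
    """Return the first <table> whose <th> cells include header_text, else None."""
    for table in soup.find_all("table"):
        headers = [th.get_text(strip=True) for th in table.find_all("th")]
        if header_text in headers:
            return table
    return None

# Hypothetical page with two tables; the header labels are assumptions
sample = """
<table><tr><th>Date</th><th>Close</th></tr><tr><td>7/18</td><td>12c</td></tr></table>
<table><tr><th>Buy Yes</th><th>Contracts</th></tr><tr><td>13c</td><td>500</td></tr></table>
"""
soup = BeautifulSoup(sample, "html.parser")
table = find_table_with_header(soup, "Buy Yes")
if table is None:
    print("No matching table found")
else:
    print(table.find("td").get_text())  # 13c
```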

1 Answer


As @downshift mentioned in the comments, the table is generated by JavaScript from data loaded with an XHR request.
So you can either use Selenium or make a direct request to the site's API.

Using the second option:

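```python
import requests
from bs4 import BeautifulSoup

# Request the price list directly from the endpoint the page calls via XHR
url = "https://www.predictit.org/PrivateData/GetPriceListAjax?contractId=7069"
ret = requests.get(url).text
soup = BeautifulSoup(ret, "lxml")
table = soup.find('table')
```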
t.m.adam
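Once `soup.find('table')` in the answer's code returns the price table, its rows can be turned into plain Python data. A minimal sketch, assuming the response is an ordinary HTML table whose first row holds the column headers; the helper `table_to_rows` and the sample markup (columns `Price`, `Contracts`) are hypothetical stand-ins, and the real endpoint's columns may differ:

```python
from bs4 import BeautifulSoup

def table_to_rows(table):
    """Convert a <table> element into a list of rows, each a list of cell texts."""
    rows = []
    for tr in table.find_all("tr"):
        cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        if cells:  # skip rows with no cells
            rows.append(cells)
    return rows

# Hypothetical stand-in for the HTML returned by GetPriceListAjax;
# the real columns and values may differ
sample = """
<table>
  <tr><th>Price</th><th>Contracts</th></tr>
  <tr><td>13c</td><td>500</td></tr>
  <tr><td>14c</td><td>1,200</td></tr>
</table>
"""
table = BeautifulSoup(sample, "html.parser").find("table")
rows = table_to_rows(table)

# Pair each data row with the header row to get one dict per row
header, body = rows[0], rows[1:]
records = [dict(zip(header, row)) for row in body]
print(records)
```

With the real response, pass the `table` from the answer's code to `table_to_rows` after checking that it is not `None`.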