1

I am trying to get some comments off the car blog, Jalopnik. It doesn't come with the web page initially, instead the comments get retrieved with some Javascript. You only get the featured comments. I need all the comments so I would click "All" (between "Featured" and "Start a New Discussion") and get them.

To automate this, I tried learning Selenium. I modified their script from Pypi, guessing the code for clicking a link was link.click() and link = broswer.find_element_byxpath(...). It doesn't look liek the "All" button (displaying all comments) was pressed.

Ultimately I'd like to download the HTML of that version to parse.

from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
import time

browser = webdriver.Firefox() # Get local session of firefox
browser.get("http://jalopnik.com/5912009/prius-driver-beat-up-after-taking-out-two-bikers/") # Load page
time.sleep(0.2)

link = browser.find_element_by_xpath("//a[@class='tc cn_showall']")
link.click()
browser.save_screenshot('screenie.png')
browser.close()
john mangual
  • 7,718
  • 13
  • 56
  • 95

1 Answers1

1

Using Firefox with the Firebug plugin, I browsed to http://jalopnik.com/5912009/prius-driver-beat-up-after-taking-out-two-bikers.

I then opened the Firebug console and clicked on ALL; it obligingly showed a single AJAX call to http://jalopnik.com/index.php?op=threadlist&post_id=5912009&mode=all&page=0&repliesmode=hide&nouser=true&selected_thread=null

Opening that url in a new window gets me the comment feed you are seeking.

More generally, if you substitute the appropriate article-ID into that url, you should be able to automate the process without Selenium.

Hugh Bothwell
  • 55,315
  • 8
  • 84
  • 99