require 'rubygems'
require 'nokogiri'
require 'mechanize'

agent = Mechanize.new

page = agent.get('https://www.instagram.com/accounts/login/')
form = page.forms.first
pp form

I am trying to locate the form used to log in to the Instagram website. I cannot get Mechanize to find the form, even though it should be the only one on the page. When I pretty-print the form, I get back blank output.

Ben Gitter

1 Answer


This page uses JavaScript to render the form, and Mechanize doesn't run JavaScript, so the form never appears in the HTML it receives. If you want to see what a page looks like without JavaScript, you can open it in a text browser such as lynx.
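One quick way to confirm this is to check whether the raw, un-rendered markup contains a `<form>` tag at all. The snippet below is only a rough diagnostic sketch: `static_form?` is a hypothetical helper, the two HTML strings are made-up stand-ins, and the regex is a crude check rather than real HTML parsing. With Mechanize you could pass it `page.body`.

```ruby
# Crude check: does the raw, un-rendered HTML contain a <form> tag?
# A regex is good enough for a quick diagnostic, not for real HTML parsing.
def static_form?(html)
  html.match?(/<form[\s>]/i)
end

# Made-up examples of a server-rendered page and a JavaScript-rendered page.
server_rendered = '<html><body><form action="/login"><input name="user"></form></body></html>'
js_rendered     = '<html><body><div id="root"></div><script src="app.js"></script></body></html>'

puts static_form?(server_rendered) # the form is present in the markup itself
puts static_form?(js_rendered)     # the form would only exist after JavaScript runs
```

If the check fails on the body Mechanize downloaded, the form is being built client-side and `page.forms` will be empty.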

Selenium can be used instead, because it drives a real browser that does run JavaScript. After installing a browser driver, such as the one for Chrome, the API is pretty similar:

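require 'selenium-webdriver'

driver = Selenium::WebDriver.for :chrome
driver.navigate.to "https://www.instagram.com/accounts/login/"
# find_elements returns an array, so this is nil if no form has rendered yet
first_form = driver.find_elements(css: "form")[0]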
max pleaner
  • If I use the Selenium WebDriver to authenticate, can I then use Nokogiri to send further requests to the server and parse the responses while keeping my login session? I am trying to use a multithreaded approach to scraping. – Ben Gitter May 24 '17 at 08:15
  • @BenGitter Selenium can be run without a GUI using the `headless` gem, and it can be wrapped in a Thread so that it doesn't block the main thread. However, according to [this question](https://stackoverflow.com/questions/30808606/can-selenium-use-multi-threading-in-one-browser), the driver is not thread-safe, so your best bet would be to launch _multiple drivers_ to run in parallel. I haven't used it myself, but [Selenium Grid](http://www.seleniumhq.org/docs/07_selenium_grid.jsp) seems like a promising tool. – max pleaner May 24 '17 at 16:31
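
The "multiple drivers" idea from the comment above could be sketched as a small worker pool in which each thread creates and owns its own driver, so no driver is shared across threads. This is only an illustration under assumptions: `scrape_in_parallel`, `driver_factory`, and `FakeDriver` are hypothetical names, and the stand-in driver exists only so the sketch runs without a browser.

```ruby
# Hypothetical helper: each worker thread builds its own driver through
# driver_factory, so no driver object is ever shared between threads.
def scrape_in_parallel(urls, driver_factory:, worker_count: 2, &visit)
  queue = Queue.new
  urls.each { |url| queue << url }
  queue.close # once closed and drained, pop returns nil and workers stop

  results = {}
  lock = Mutex.new

  workers = Array.new(worker_count) do
    Thread.new do
      driver = driver_factory.call
      begin
        while (url = queue.pop)
          value = visit.call(driver, url)
          lock.synchronize { results[url] = value }
        end
      ensure
        # Always release the driver, even if a visit raised
        driver.quit if driver.respond_to?(:quit)
      end
    end
  end

  workers.each(&:join)
  results
end

# Stand-in driver so the sketch runs without a browser. With Selenium, the
# factory would instead be -> { Selenium::WebDriver.for :chrome }.
FakeDriver = Struct.new(:visited) do
  def quit; end
end

pages = scrape_in_parallel(%w[a b c], driver_factory: -> { FakeDriver.new([]) }) do |driver, url|
  driver.visited << url
  url.upcase
end
p pages
```

Note that each driver starts with its own browser session, so under this approach every worker would have to log in separately.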