Ruby webscrape script for GoDaddy

Question

I'm new to Ruby and for my first scripting assignment, I've been asked to write a web scraping script to grab elements of our DNS listings from GoDaddy.

I'm having trouble scraping the links, and I then need to follow them. I need to get the link from the "GoToSecondaryDNS" JavaScript onclick handler below. I'm using Mechanize and Nokogiri:

<td class="listCellBorder" align="left" style="width:170px;">
          <div style="padding-left:4px;">
            <div id="gvZones21divDynamicDNS"></div>
            <div id="gvZones21divMasterSlave" cicode="41022" onclick="GoToSecondaryDNS('iwanttoscrapethislink.com',0)" class="listFeatureButton secondaryDNSNoPremium" onmouseover="ShowSecondaryDNSAd(this, event);" onmouseout="HideAdInList(event);"></div>
            <div id="gvZones21divDNSSec" cicode="41023" class="listFeatureButton DNSSECButtonNoPremium" onmouseover="ShowDNSSecAd(this, event);" onmouseout="HideAdInList(event);" onclick="UpgradeLinkActionByID('gvZones21divDNSSec'); return false;" useClick="true" clickObj="aDNSSecUpgradeClicker"></div>
            <div id="gvZones21divVanityNS" onclick="GoToVanityNS('iwanttoscrapethislink.com',0)" class="listFeatureButton vanityNameserversNoPremium" onmouseover="ShowVanityNSAd(this, event);" onmouseout="HideAdInList(event);"></div>
            <div style="clear:both;"></div>
          </div>
        </td>

How can I scrape the link 'iwanttoscrapethislink.com' and then interact with the onclick to follow the link and scrape content on the following page with Ruby?

So far, I have a simple start to the code:

require 'rubygems'
require 'mechanize'
require 'open-uri'




def get_godaddy_data(url)


      web_agent = Mechanize.new

      result = nil

      ### login to GoDaddy admin


      page = web_agent.get('https://dns.godaddy.com/Default.aspx?sa=')

      ## there is only one form and it is the first form on the page
      form = page.forms.first
      form.username = 'blank'
      form.password = 'blank'

      ## form.submit
      web_agent.submit(form, form.buttons.first)

     site_name = page.css('div.gvZones21divMasterSlave onclick td')  
      ### export dns zone data

      page = web_agent.get('https://dns.godaddy.com/ZoneFile.aspx?zone=' + site_name + '&zoneType=0&refer=dcc')
      form = page.forms[3]
      web_agent.submit(form, form.buttons.first).save(uri.host + 'scrape.txt')

       ## end


    ### read export file
    ##return File.open(uri.host + 'scrape.txt', 'rb') { |file| file.read }
  end


  def scrape_dns(url)

  site_name = page.css('div.gvZones21divMasterSlave onclick td') 
  list_url = 'https://dns.godaddy.com/ZoneFile.aspx?zone=' + site_name + '&zoneType=0&refer=dcc'
  page = Nokogiri::HTML(open(list_url))

#not sure how to scrape onclick urls and then how to click through to continue scraping on the second page for each individual DNS

end

score 1 · Answer 1 · answered Aug 13 '12 at 18:44

You can't interact with "onclick" because Nokogiri isn't a JavaScript engine.

You can extract the contents and then use that as the URL for a subsequent web request. Assuming doc contains the parsed HTML:

doc.at('div[onclick^="GoToSecondaryDNS"]')['onclick']

will give you the value of the onclick attribute. ^= is the CSS "attribute value starts with" operator, so the selector matches only <div> tags whose onclick begins with GoToSecondaryDNS and skips the other <div> tags that also have onclick attributes. It returns:

"GoToSecondaryDNS('iwanttoscrapethislink.com',0)"

Using a simple regex [/'(.+)'/,1] will get you the hostname:

doc.at('div[onclick^="GoToSecondaryDNS"]')['onclick'][/'(.+)'/,1]
=> "iwanttoscrapethislink.com"
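As a rough sketch of the next step, the extracted hostname can be dropped into the zone-file URL that the question's code already uses. The helper names below are hypothetical, and the URL pattern is copied from the question rather than verified against GoDaddy. The regex here uses [^']+ instead of .+ so it stops at the first closing quote, which only matters if an onclick ever carries more than one quoted argument.

```ruby
require 'uri'

# Hypothetical helper: pulls the first single-quoted argument out of an
# onclick value such as "GoToSecondaryDNS('example.com',0)".
def hostname_from_onclick(onclick)
  onclick[/'([^']+)'/, 1]
end

# Hypothetical helper: builds the zone-file URL from the question's code.
# The path and query parameters are taken from the question, not from any
# documented GoDaddy API.
def zone_file_url(hostname)
  query = URI.encode_www_form(zone: hostname, zoneType: 0, refer: 'dcc')
  "https://dns.godaddy.com/ZoneFile.aspx?#{query}"
end

onclick = "GoToSecondaryDNS('iwanttoscrapethislink.com',0)"
puts zone_file_url(hostname_from_onclick(onclick))
# prints https://dns.godaddy.com/ZoneFile.aspx?zone=iwanttoscrapethislink.com&zoneType=0&refer=dcc
```

Building the query with URI.encode_www_form rather than string concatenation keeps the URL valid if a zone name ever contains characters that need escaping.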

The rest, such as how to get access to Mechanize's internal Nokogiri document, and how to create the new URL, are left for you to figure out.

Thank you for getting me going in the right direction with this. I will see if I can at least get the link returned first and update this thread. — Lynn, Aug 14 '12 at 17:39
