I am building a script using Mechanize to scrape data from a website. The script is supposed to click on the "Read biography" link and then scrape the biography of the member on the next page.

Here is the script in the Rake file:

require 'mechanize'
require 'date'
require 'json'


task :testing2 do

    agent = Mechanize.new
    page = agent.get("https://www.congress.gov/members")

    page_links = page.links_with(href: %r{.*/member/\w+})


    member_links = page_links[0...2]

    members = member_links.map do |link|

      member = link.click

      name = member.search('title').text.split('|')[0]
      institution = member.search('td~ td+ td').text.split(':')[0]
      dob = member.search('.birthdate').text.strip[1..4]

      # Get bio
      bio_link = member.link_with(:text => 'Read biography').click
      bio = bio_link.search('p').text.strip  # bio_link holds the page returned by click

      {
        name: name.strip,
        institution: institution.strip,
        dob: dob,
        bio: bio

      }

    end

    puts JSON.pretty_generate(members)

end
the Tin Man
Ryzal Yusoff

2 Answers

There are two calls to click:

member = link.click

and

bio_link = member.link_with(:text => 'Read biography').click

The first is called on an element yielded by the iterator, which cannot be nil, so the second call is the one that fails.

Add debug output or set a breakpoint just before # Get bio and inspect member. From the information provided, it is impossible to say why member.link_with(:text => 'Read biography') returns nil.
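As a minimal sketch of that debug output, the helper below prints each link's text with inspect so that hidden whitespace becomes visible. FakeLink and describe_links are hypothetical names used only for illustration, and the sample link values are made up. In the real script the same helper could be given member.links, since Mechanize link objects also respond to text and href.

```ruby
# Hypothetical stand-in for a Mechanize link; only #text and #href are used.
FakeLink = Struct.new(:text, :href)

# Returns one line per link. inspect exposes leading/trailing whitespace
# and newlines that are invisible in plain output.
def describe_links(links)
  links.map { |link| "#{link.text.inspect} -> #{link.href}" }
end

# Made-up sample data (assumption, not taken from the real page).
links = [
  FakeLink.new("\n  Read biography\n", "/bio"),
  FakeLink.new("Home", "/")
]

puts describe_links(links)
```

In the Rake task, a call such as puts describe_links(member.links) placed just before # Get bio would list every link the page actually contains.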

Aleksei Matiushkin

The code you are using:

member.link_with(text: 'Read biography')

does not find the link because the link text contains extra spaces and newline characters, so an exact string comparison fails. Use a regular expression instead:

member.link_with(text: /Read biography/)

The regular expression matches anywhere inside the link text, so it finds the link despite the surrounding whitespace.
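The difference can be reproduced without Mechanize. The sketch below assumes, as I understand the library, that link_with compares each criterion against the link attribute with ===. The padded link_text value is a made-up example of what the page might contain.

```ruby
# Hypothetical link text padded with newlines and indentation (assumption).
link_text = "\n        Read biography\n    "

# For a String, === is plain equality, so the padding makes it fail.
exact_match = 'Read biography' === link_text

# For a Regexp, === tests whether the pattern matches anywhere in the text.
regex_match = /Read biography/ === link_text

puts "exact string matches: #{exact_match}"
puts "regex matches: #{regex_match}"
```

If the exact string is preferred, comparing against link.text.strip from a links.find block would be another option.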

Ri1a
fanta