2

I'm able to narrow in on the relevant area of an HTML document using Nokogiri. I need to extract the href from the Nokogiri object, but I can't figure out how to do this for the life of me. Calling row.css('td > b').to_html gives me the pretty HTML representation in string form, but I need to parse this using Nokogiri.

"<b><a href=\"/ShowTopic-g293766-i9284-k10224928-Tour_companies_for_botswana-Botswana.html\" onclick=\"setPID(34603)\">\ntour companies for botswana</a></b>"

The Nokogiri equivalent that I'm unable to extract the URL from is below:

[#<Nokogiri::XML::Element:0x3fe972a9deb8 name="b" children=[#<Nokogiri::XML::Element:0x3fe972ad90a8 name="a" attributes=[#<Nokogiri::XML::Attr:0x3fe972ad8ff4 name="href" value="/ShowTopic-g317055-i11941-k10224606-United_Expeditions_tour_company_Maun-Maun_North_West_District.html">, #<Nokogiri::XML::Attr:0x3fe972ad8fe0 name="onclick" value="setPID(34603)">] children=[#<Nokogiri::XML::Text:0x3fe972ad8900 "\nUnited Expeditions tour company, Maun">]>]>]

The snippet above is a confusing dump of a Nokogiri XML object, I guess, but I just want to get the href. How the heck do I do this?

Horse Voice

1 Answer

3
row.css('td > b a').attr('href')

This should do the job. Read more in "How to access attributes using Nokogiri".

XY L
  • I tried the same idea, but holy moly is this framework annoying. With the above suggestion I get the error below: `TripAdvisorParserTest#test_getSubforumPageThreads: NoMethodError: undefined method 'attribute' for nil:NilClass /Users/imtiazahmad/.rvm/gems/ruby-2.1.2/gems/nokogiri-1.6.8.1/lib/nokogiri/xml/node_set.rb:164:in 'attr'` – Horse Voice Feb 23 '17 at 03:59
  • I realized the best way to deal with it is the to_h method, which turns the node into a hash first, so it can then be handled in plain Ruby. – Horse Voice Feb 23 '17 at 04:25