Questions tagged [hpricot]

Hpricot is a Ruby library for parsing HTML. Until the release of Nokogiri, a competing HTML and XML parser with CSS selector support, Hpricot was the de facto HTML parser for the Ruby community.

163 questions

4 answers

How do I do a regex search in Nokogiri for text that matches a certain beginning?

Given: require 'rubygems' require 'nokogiri' value = Nokogiri::HTML.parse(<<-HTML_END) "

Foo

Bar

ruby nokogiri hpricot

asked Oct 12 '09 at 18:00

bcolfer

3 answers

Nokogiri vs Hpricot?

Which one would you choose? My important attributes are (not in order): Support and future enhancements. Community and general knowledge base (on the Internet). Comprehensive (i.e., proven to parse a wide range of *.*ml pages). Performance. Memory…

ruby nokogiri html-parsing hpricot

asked May 22 '10 at 15:05

roshan

1,323
18
31

5 answers

Installing Hpricot on Ruby 1.9.1 on Windows

I am trying to install hpricot using the command: >gem install hpricot -v 0.8.2 Building native extensions. This could take a while... ERROR: Error installing hpricot: ERROR: Failed to build gem native extension. C:/Ruby19/bin/ruby.exe…

ruby hpricot

asked Nov 11 '09 at 22:24

Marcus

1 answer

open-uri is not redirecting http to https

I am using Hpricot and OpenURI to parse webpages and extract URLs from them. When I get a link like "http:rapidshare.com", it is not redirecting to https. This is the error I…

ruby rubygems hpricot open-uri

asked Apr 04 '12 at 14:32

leonidus

4 answers

Strip text from HTML document using Ruby

There are lots of examples of how to strip HTML tags from a document using Ruby; Hpricot and Nokogiri have inner_text methods that remove all HTML for you easily and quickly. What I am trying to do is the opposite: remove all the text from an HTML…

html ruby nokogiri hpricot

asked Sep 30 '09 at 11:06

davidsmalley

1,029
3
10
15

2 answers

Using Ruby with Mechanize to log into a website

I need to scrape data from a site, but it requires my login first. I've been using hpricot to successfully scrape other sites, but I'm new to using mechanize, and I'm truly baffled by how to work it. I see this example commonly quoted: require…

ruby authentication screen-scraping mechanize hpricot

asked Jul 08 '11 at 19:39

Spacew00t

3 answers

Can any of Ruby's HTML Parsers do JavaScript to see the resulting DOM?

When trying Hpricot and Nokogiri, the HTML can be fetched and parsed, but can they also execute the JavaScript so that the content shows up on the page (that is, in the DOM)? That's because some pages won't show the info unless the…

javascript ruby html-parsing nokogiri hpricot

asked Sep 05 '10 at 19:11

nonopolarity

146,324
131
460
740

2 answers

What is a "terminated object", and why can't I call methods on it?

Periodically I get this exception: NotImplementedError: method `at' called on terminated object on this line of code: next if Hpricot(html).at('a') What does this error mean? How can I avoid it?

ruby hpricot

asked Jan 13 '11 at 06:41

Tom Lehman

85,973
71
200
272

1 answer

Convert HTML to plain text and maintain structure/formatting, with ruby

I'd like to convert HTML to plain text. I don't want to just strip the tags, though; I'd like to intelligently retain as much formatting as possible: inserting line breaks for <br> tags, detecting paragraphs and formatting them as such, etc. The…

ruby html-parsing nokogiri beautifulsoup hpricot

asked May 20 '11 at 14:39

John Bachir

22,495
29
154
227

2 answers

Ruby Mechanize table scraping doesn't capture entire row

I am trying to scrape a table from a website with Mechanize. I want to scrape the second row. When I run: agent.page.search('table.ea').search('tr')[-2].search('td').map{ |n| n.text } I would expect it to scrape the whole row. But instead it only scrapes:…

ruby-on-rails ruby-on-rails-3 nokogiri hpricot mechanize-ruby

asked Feb 17 '11 at 00:11

Rails beginner

14,321
35
137
257

3 answers

How does one remove tags from around text in XML using Hpricot?

I just want the text out of there without those tags. Does Hpricot.XML have any methods for this?

ruby xml hpricot xml-parsing

asked Aug 22 '10 at 19:13

loosecannon

7,683
3
32
43

1 answer

Where can I find Hpricot documentation?

Now that http://github.com/why/hpricot/wikis/home no longer exists.

ruby hpricot

asked Sep 06 '09 at 17:45

Tom Lehman

85,973
71
200
272

2 answers

Failing to extract html table rows

I am trying to extract all five rows listed in the table above. I'm using the Ruby Hpricot library to extract the table rows using an XPath expression. In my example, the XPath expression I use is /html/body/center/table/tr. Note that I've removed the tbody tag…

html ruby xpath web-scraping hpricot

asked Nov 20 '11 at 21:11

Terry Li

16,870
30
89
134

5 answers

Looking for a recommendation of a good tutorial on best practices for a web scraping project?

I need to do a fairly extensive project involving web scraping and am considering using Hpricot or Beautiful Soup (i.e. Ruby or Python). Has anyone come across a tutorial that they thought was particularly good on this subject that would help me…

python ruby screen-scraping beautifulsoup hpricot

asked Mar 26 '09 at 05:31

in bruges

1 answer

CSS selector exclude elements, hpricot

I am trying to write a CSS selector that selects everything except the script elements with Hpricot. I can easily select all the contents of the select-me div and then remove the script elements, but I was wondering if it's possible to use a…

css ruby selector hpricot

asked May 04 '10 at 15:41

RailsSon

19,897
31
82
105
