
I want to copy some specific content from a website using Ruby/Rails. The content I need is inside a marquee HTML tag, divided into divs. How can I get access to this content using Ruby? To be more precise, I want to use some kind of Ruby GUI (preferably Shoes). How do I do it?

Ariel
  • What are you trying to accomplish? Do you want to scrape another site and insert the contents into a database? Do you just want to display some remote content in a UI? – Intelekshual Mar 09 '11 at 18:44

2 Answers


If I understand correctly, you want a GUI front end for a website scraper. If so, you will probably have to build one yourself.

The easiest way to scrape a website is with the Nokogiri or Mechanize gems. Basically, you fetch the page (Mechanize can do this from a URL by itself; Nokogiri parses HTML that you hand to it) and then use XPath or CSS selectors to pull the text out of the DOM.

https://github.com/sparklemotion/nokogiri

https://github.com/sparklemotion/mechanize (for the documentation)

Srdjan Pejic

This isn't really a Rails question. It's something you'd do using Ruby, then possibly display using Rails, or Sinatra or Padrino - pick your poison.

There are several different HTTP clients you can use:

Open-URI comes with Ruby and is the easiest. Net::HTTP comes with Ruby and is the standard toolbox, but it's lower-level so you'd have to do more work. HTTPClient and Typhoeus+Hydra are capable of threading and have both high-level and low-level interfaces.
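To give a sense of the extra work Net::HTTP involves compared with Open-URI, here is a minimal sketch using only the standard library. The helper names `build_request` and `fetch_body` and the User-Agent string are made up for illustration, and the sketch does not follow redirects.

```ruby
require 'net/http'
require 'uri'

# Build the GET request by hand; Open-URI does this step implicitly.
# Returns the parsed URI together with the request object.
def build_request(url)
  uri = URI.parse(url)
  request = Net::HTTP::Get.new(uri)
  request['User-Agent'] = 'example-scraper/0.1' # hypothetical identifier
  [uri, request]
end

# Send the request and return the response body, raising on non-2xx responses.
def fetch_body(url)
  uri, request = build_request(url)
  response = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.request(request)
  end
  raise "Request failed: #{response.code}" unless response.is_a?(Net::HTTPSuccess)

  response.body
end
```

Calling `fetch_body('http://www.example.com/')` would return the page's HTML as a string, which can then be handed to a parser.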

I recommend using Nokogiri to parse the returned HTML. It's very full-featured and robust.

require 'nokogiri'
require 'open-uri'

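# URI.open comes from open-uri; a bare open() no longer accepts URLs on Ruby 3.0+.
doc = Nokogiri::HTML(URI.open('http://www.example.com'))

# Print the parsed document back out as HTML.
puts doc.to_html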

If you need to navigate through login screens or fill in forms before you get to the page you need to parse, then I'd recommend looking at Mechanize. It relies on Nokogiri internally so you can ask it for a Nokogiri document and parse away once Mechanize retrieves the desired URL.

If you need to deal with dynamic HTML (content that JavaScript builds in the browser), then look into the various Watir tools. They drive a real web browser, then let you access the content as the browser sees it.

Once you have the content or data you want, you can "repurpose" it into text inside a Rails page.

the Tin Man