
I'm looking for a way to find elements with Nokogiri based on the styles actually applied by the stylesheet, not on the CSS selector. I found a related question that gives me an idea of how to do it:

Is it possible to parse a stylesheet with Nokogiri?

Basically, I want to do something like this:

require 'nokogiri'
require 'css_parser'

html = Nokogiri::HTML(html_string)
css = CssParser::Parser.new

# some method to grab all the code from the stylesheets (placeholder variable)
css.add_block!(all_stylesheet_code)

# method to find all images larger than 600px wide
# method to find related text of those images

I'm also wondering whether I need a library like Mechanize to crawl the rest of the website beyond the link I use as a starting point. Thanks!

Asked by Daniel
  • What are you specifically trying to do? If it needs to handle arbitrary websites, you'll run into problems because many sites these days apply styles via JavaScript. – Mark Thomas Jul 24 '14 at 17:11
  • I just want to filter out images that are logos or banners and keep the actual content images of the page, then grab their related text. I can check both the inline styles and the stylesheets if necessary; I just need all the style information so I can scrape image content intelligently. – Daniel Jul 24 '14 at 22:08
  • You don't have to use Mechanize to crawl a site; you can write the crawler yourself. Mechanize helps when dealing with forms; otherwise, I find sites easy to crawl with OpenURI and Nokogiri. – the Tin Man Jan 05 '15 at 16:12
