I am trying to scrape a table from a website with Mechanize. I want to scrape the second row.

When I run:

agent.page.search('table.ea').search('tr')[-2].search('td').map{ |n| n.text }

I would expect it to scrape the whole row. But instead it only scrapes: ["2011-02-17", "0,00"]

Why isn't it scraping all of the columns in the row, but just the first and the last column?

Xpath: /html/body/center/table/tbody/tr[2]/td[2]/table/tbody/tr[3]/td/table/tbody/tr[2]/td/table/tbody/tr[2]

CSS PATH: html body center table tbody tr td table tbody tr td table tbody tr td table.ea tbody tr td.total

The page is similar to this:

<table><table><table>
<table width="100%" border="0" cellpadding="0" cellspacing="1" class="ea">
<tr>
    <th><a href="#">Date</a></th>
    <th><a href="#">One</a></th>    
    <th><a href="#">Two</a></th>    
    <th><a href="#">Three</a></th>     
    <th><a href="#">Four</a></th>    
    <th><a href="#">Five</a></th>        
    <th><a href="#">Six</a></th>        
    <th><a href="#">Seven</a></th>      
    <th><a href="#">Eight</a></th>
</tr>
<tr>
    <td><a href="#">2011-02-17</a></td>
    <td align="right">0</td>    
    <td align="right">0</td>    
    <td align="right">0,00</td>     
    <td align="right">0</td>    
    <td align="right">0</td>        
    <td align="right">0</td>    
    <td align="right">0</td>        
    <td align="right">387</td>      
    <td align="right">0,00</td>     <!-- FOV -->
    <td align="right">0,00</td>
</tr>
<tr>
    <td class="total">Ialt</td>
    <td class="total" align="right">0</td>  
    <td class="total" align="right">40</td>     
    <td class="total" align="right">0,46</td>   
    <td class="total" align="right">2</td>      
    <td class="total" align="right">0</td>        
    <td class="total" align="right">0</td>      
    <td class="total" align="right">0</td>        
    <td class="total" align="right">3.060</td>      
    <td class="total" align="right">0,00</td>       
    <td class="total" align="right">18,58</td>
</tr>
</table>
</table></table></table>
blahdiblah
Rails beginner
  • I found out that the page did not have the columns. So instead of submitting a form, I used the URL to access the right columns. – Rails beginner Feb 25 '11 at 13:14

2 Answers

Using the following Ruby code (https://gist.github.com/835603):

require 'mechanize'
require 'pp'

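# Create an agent that identifies itself as Safari on a Mac.
a = Mechanize.new { |agent|
  agent.user_agent_alias = 'Mac Safari'
}

# Fetch the test page and print the text of every cell in the second-to-last row.
a.get('http://binarymuse.net/table.html') do |page|
  pp page.search('table.ea').search('tr')[-2].search('td').map{ |n| n.text }
end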

I get the following output:

["2011-02-17", "0", "0", "0,00", "0", "0", "0", "0", "387", "0,00", "0,00"]
Michelle Tilley
  • `require 'pp'` returns false when I run it – Rails beginner Feb 20 '11 at 11:30
  • Are you in a Rails app? Anyway, `pp` is just the [Ruby Pretty Print library](http://www.ruby-doc.org/stdlib/libdoc/pp/rdoc/index.html); it shouldn't be necessary to get the example to work. Just remove the require and replace the call to `pp page...` with `puts page...`. My point was simply that my output did not match the expected output based on your question. – Michelle Tilley Feb 20 '11 at 18:10
  • I am in a Rails app. I found out that the page did not have the columns. So instead of submitting a form, I used the URL to access the right columns. – Rails beginner Feb 20 '11 at 18:30

I would recommend saving Mechanize for harder tasks than scraping a single page. Nokogiri is much simpler to use for this (though of course you can do it with Mechanize too), since you can just query the page directly.

Try it out!

Here is a link to an answer regarding Nokogiri.

Personally, I have used Mechanize when I needed to submit forms and the like, although it has plenty of other uses!

Community
Cu7l4ss
  • Mechanize actually uses Nokogiri internally, and you can retrieve the `Nokogiri::HTML::Document` instance by calling `root()` (https://gist.github.com/844178) – Sébastien Le Callonnec Feb 25 '11 at 17:48
  • That I didn't know :) But there isn't any point in using Mechanize for scraping if you can do so with Nokogiri or whatever other gem there is. At least from what the question states, I see no point in using Mechanize. – Cu7l4ss Feb 25 '11 at 19:24