So, I have a table with multiple rows and columns.

<table>
  <tr>
    <th>Employee Name</th>
    <th>Reg Hours</th>
    <th>OT Hours</th>
  </tr>
  <tr>
    <td>Employee 1</td>
    <td>10</td>
    <td>20</td>
  </tr>
  <tr>
    <td>Employee 2</td>
    <td>5</td>
    <td>10</td>
  </tr>
</table>

There is also another table:

<table>
  <tr>
    <th>Employee Name</th>
    <th>Revenue</th>
  </tr>
  <tr>
    <td>Employee 2</td>
    <td>$10</td>
  </tr>
  <tr>
    <td>Employee 1</td>
    <td>$50</td>
  </tr>
</table>

Notice that the employee order may be random between the tables.

How can I use Nokogiri to create a JSON file that has each employee as an object, with their total hours and revenue?

Currently, I can only get individual table cells with XPath. For example:

puts page.xpath(".//*[@id='UC255_tblSummary']/tbody/tr[2]/td[1]/text()").inner_text

Edit:

Using the page-object gem and the link from @Dave_McNulla, I tried this piece of code just to see what I get:

class MyPage
  include PageObject

  table(:report, :id => 'UC255_tblSummary')

  def get_some_information
    report_element[1][2].text
  end
end

puts get_some_information

Nothing's being returned, however.

Data: https://gist.github.com/anonymous/d8cc0524160d7d03d37b

There's a duplicate of the hours table. The first one is fine. The other table needed is the accessory revenue table. (I'll also need the activations table, but I'll try to merge that using the code that merges the hours and accessory revenue tables.)

calf
  • Can you modify the HTML to include classes for the table elements? Specifically, the <tr> tags could have an "employee" class and each <td> tag could have a class for what it is (e.g. "name", "revenue", etc.). This would help you match an employee name, then find it in the other HTML document, then either build JSON from the two, or merge them together before building your object. – CDub Mar 19 '13 at 20:10
  • The HTML is not mine. Getting the XPath node is not the issue... I guess I'm more stuck on the Ruby part. I'm not sure how to iterate through the rows and merge the data between the two tables. – calf Mar 19 '13 at 20:18
  • Is there a reason you want to use Nokogiri over using Watir? – Justin Ko Mar 19 '13 at 21:38
  • Cheezy/Jeff Morgan has a way in page objects to get info out of a table: http://www.cheezyworld.com/2012/05/23/a-better-shovel/ – Dave McNulla Mar 20 '13 at 00:47
  • @JustinKo, Can Watir do what I need done? I couldn't find anything in the docs. – calf Mar 20 '13 at 02:43
  • In the edited example above, you are calling the get_some_information method directly. The method is defined on a class, so you need to call it on an instance of that class, like this: @the_page.get_some_information. – Cheezy Mar 27 '13 at 15:50

1 Answer

I think the general approach is:

  1. Create a hash for each table where the key is the employee
  2. Merge the results from both tables together
  3. Convert to JSON

Create a hash for each table where the key is the employee

You can do this part in either Watir or Nokogiri. It only makes sense to use Nokogiri if Watir performs poorly because the tables are large.

Watir:

#I assume you would have a better way to identify the tables than by index
hours_table = browser.table(:index, 0)
wage_table = browser.table(:index, 1)

#Turn the tables into a hash
employee_hours = {}
hours_table.trs.drop(1).each do |tr| 
    tds = tr.tds
    employee_hours[ tds[0].text ] = {"Reg Hours" => tds[1].text, "OT Hours" => tds[2].text}     
end
#=> {"Employee 1"=>{"Reg Hours"=>"10", "OT Hours"=>"20"}, "Employee 2"=>{"Reg Hours"=>"5", "OT Hours"=>"10"}}

employee_wage = {}
wage_table.trs.drop(1).each do |tr| 
    tds = tr.tds
    employee_wage[ tds[0].text ] = {"Revenue" => tds[1].text}   
end
#=> {"Employee 2"=>{"Revenue"=>"$10"}, "Employee 1"=>{"Revenue"=>"$50"}}

Nokogiri:

require 'nokogiri'

page = Nokogiri::HTML.parse(browser.html)

hours_table = page.search('table')[0]
wage_table = page.search('table')[1]

employee_hours = {}
hours_table.search('tr').drop(1).each do |tr| 
    tds = tr.search('td')
    employee_hours[ tds[0].text ] = {"Reg Hours" => tds[1].text, "OT Hours" => tds[2].text}     
end
#=> {"Employee 1"=>{"Reg Hours"=>"10", "OT Hours"=>"20"}, "Employee 2"=>{"Reg Hours"=>"5", "OT Hours"=>"10"}}

employee_wage = {}
wage_table.search('tr').drop(1).each do |tr| 
    tds = tr.search('td')
    employee_wage[ tds[0].text ] = {"Revenue" => tds[1].text}   
end
#=> {"Employee 2"=>{"Revenue"=>"$10"}, "Employee 1"=>{"Revenue"=>"$50"}}

Merge the results from both tables together

You want to merge the two hashes together so that for a specific employee, the hash will include their hours as well as their revenue.

employee = employee_hours.merge(employee_wage){ |key, old, new| new.merge(old) }
#=> {"Employee 1"=>{"Revenue"=>"$50", "Reg Hours"=>"10", "OT Hours"=>"20"}, "Employee 2"=>{"Revenue"=>"$10", "Reg Hours"=>"5", "OT Hours"=>"10"}}
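
One thing worth knowing about `merge` with a block: the block only runs for keys that exist in both hashes, so an employee who appears in just one table keeps only that table's values. A minimal sketch of this, plus one possible way to pad the gaps with `nil`, is below. "Employee 3" and the names `merged`, `columns` and `complete` are invented for illustration.

```ruby
employee_hours = {
  "Employee 1" => { "Reg Hours" => "10", "OT Hours" => "20" },
  "Employee 2" => { "Reg Hours" => "5", "OT Hours" => "10" }
}
employee_wage = {
  "Employee 2" => { "Revenue" => "$10" },
  "Employee 3" => { "Revenue" => "$7" } # hypothetical: has no hours row
}

# The block is called only for names present in both hashes.
merged = employee_hours.merge(employee_wage) { |_name, hours, wage| hours.merge(wage) }

# Collect every column seen anywhere, then give each employee all of them,
# using nil where a table had no row for that employee.
columns = merged.values.flat_map(&:keys).uniq
complete = merged.map do |name, fields|
  [name, columns.map { |column| [column, fields[column]] }.to_h]
end.to_h
```

With this, every employee in `complete` has the same set of keys, so the resulting JSON objects all have the same shape.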

Convert to JSON

Based on this previous question, you can then convert the hash to JSON.

require 'json'
employee.to_json
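
Since the question asks for a JSON file with total hours and revenue per employee, here is a rough sketch of that last step. The output keys (`name`, `total_hours`, `revenue`), the file name, and the use of the temp directory are all assumptions; the input hash is the merged sample result shown above.

```ruby
require 'json'
require 'tmpdir'

# Merged sample data, as produced by the merge step.
employee = {
  "Employee 1" => { "Revenue" => "$50", "Reg Hours" => "10", "OT Hours" => "20" },
  "Employee 2" => { "Revenue" => "$10", "Reg Hours" => "5", "OT Hours" => "10" }
}

# One object per employee; cell text is converted to numbers here.
records = employee.map do |name, fields|
  {
    "name" => name,
    "total_hours" => fields["Reg Hours"].to_i + fields["OT Hours"].to_i,
    "revenue" => fields["Revenue"].delete("$,").to_f # strip "$" and thousands separators
  }
end

# Hypothetical output location; pick whatever path suits your project.
path = File.join(Dir.tmpdir, 'employees.json')
File.write(path, JSON.pretty_generate(records))
```

`JSON.pretty_generate` is used instead of `to_json` only to make the file easier to read; either works.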
Justin Ko
  • Thanks for your reply. I made the following change: hours_table = browser.tr(:xpath, '//*[@id="tblReportItems"]/tbody/tr[1]') because it looks like the table ids have now disappeared. I'm getting an "undefined method 'hashes'" error now, though, from watir-webdriver. – calf Mar 20 '13 at 17:29
  • `hashes` is defined for tables, not trs. In other words, you need to ensure that `hours_table` is still a table. Based on your suggested change, you should be able to do `hours_table = browser.table(:id => 'tblReportItems')`. – Justin Ko Mar 20 '13 at 17:37
  • I see...but that table ID is for an all-encompassing table. The tables I need are within that table (with no IDs). Trying your nokogiri example, I get an undefined method error for 'text'. – calf Mar 20 '13 at 17:44
  • Did you change `hours_table` and `wage_table` to point to the correct table? Both solutions are hard-coded to the first/second table on the page. You need to update them to handle your specific table (since I do not know what your page's html looks like). – Justin Ko Mar 20 '13 at 17:54
  • I did; the tables I needed were the fifth and sixth, respectively. I'm still getting the "no method" error for 'text', though. – calf Mar 20 '13 at 20:19
  • My only guess is that you have a row that does not have the same number of columns. Unfortunately, it is difficult to debug without the exact table. – Justin Ko Mar 20 '13 at 20:36
  • I found a view that's giving me ids for the tables now. I tried the Watir example (substituting id for index, followed by the table id) and I'm getting an error ("uninitialized constant table-id-name"). How can I point to id instead of indices for nokogiri? – calf Mar 20 '13 at 20:42
  • Link to html is in the question. Sorry for its sloppiness. – calf Mar 20 '13 at 21:05
  • For your actual page, you can get the hours table by id using `hours_table = browser.table(:id, 'UC255_tblSummary')` in Watir or `hours_table = page.at('table#UC255_tblSummary')` in Nokogiri. I did not see the wage table anywhere. – Justin Ko Mar 21 '13 at 16:19
  • It's actually an accessory sales table (I believe UC252_tblSummary). Getting an undefined method error for search here: hours_table.search – calf Mar 21 '13 at 16:41
  • It's working now! Thanks! However, at the risk of wearing out my welcome, the tables are not being merged. – calf Mar 21 '13 at 16:52
  • I assume by not being merged, you mean that the values of each employee are not being merged together? The code assumes that the values in the Employee Name columns will match. This is not true for your tables - which shows first/last name as well as last/first name. You need to modify the hash keys to have the same format. – Justin Ko Mar 21 '13 at 16:57
  • I made a feeble attempt with: `mappings = { 'employee_wage[ tds[0].text ]' => 'employee_hours[ tds[0].text ]' } Hash[employee_wage.map {|k, v| [mappings[k], v] }]` and I can't seem to get it to work. – calf Mar 21 '13 at 17:29
  • I do not understand what you tried to do. You need to manipulate the string returned by `tds[0].text`, so that the two tables will return the employee name in the same format. – Justin Ko Mar 21 '13 at 17:36
  • I finally figured out how to manipulate the hash keys to match. Thank you. See here: https://gist.github.com/anonymous/bcfcb485d202a226464e. How would I not include the last row of each table? And, to merge a third table, would I simply take the result of the first merge and merge again? Finally, is there a standard way to include all key-value pairs, even if empty, for each object (employee) in the json? – calf Mar 22 '13 at 14:55
  • To ignore the first and last row, you would have to convert the collection to an array. Instead of doing `.drop(1)`, try `.to_a[1..-2]`. For the third table, yes you should be able to merge it the same way. I have not used the json gem, so you will have to check with someone else on that. Alternatively, you could check the final merged hash and add any missing pairs. – Justin Ko Mar 22 '13 at 16:10
  • Thanks Justin, the array worked beautifully to remove the first and last rows (0 and -1). I was also able to successfully merge the third hash. Now, I'm working on trying to add the empty pairs to the final hash. I really appreciate your help. – calf Mar 22 '13 at 20:44
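
For anyone following the thread, the two fixes discussed in these comments (matching name formats and skipping the first and last rows) could look roughly like the sketch below. It assumes one table writes names as "Last, First" and the other as "First Last"; the real page may differ. The names, the "Total" rows and `normalize_name` are invented for illustration, and plain arrays stand in for the parsed table rows.

```ruby
# Hypothetical helper: turn "Last, First" into "First Last";
# names without a comma are returned unchanged (apart from trimming).
def normalize_name(name)
  last, first = name.split(',', 2).map(&:strip)
  first ? "#{first} #{last}" : name.strip
end

# Stand-ins for the parsed rows: a header row first, a totals row last.
hours_rows = [
  ["Employee Name", "Reg Hours", "OT Hours"],
  ["Doe, Jane", "10", "20"],
  ["Roe, Sam", "5", "10"],
  ["Total", "15", "30"]
]
revenue_rows = [
  ["Employee Name", "Revenue"],
  ["Sam Roe", "$10"],
  ["Jane Doe", "$50"],
  ["Total", "$60"]
]

# [1..-2] drops the first (header) and last (totals) rows.
employee_hours = hours_rows[1..-2].map do |name, reg, ot|
  [normalize_name(name), { "Reg Hours" => reg, "OT Hours" => ot }]
end.to_h

employee_revenue = revenue_rows[1..-2].map do |name, revenue|
  [normalize_name(name), { "Revenue" => revenue }]
end.to_h

employee = employee_hours.merge(employee_revenue) { |_name, hours, revenue| hours.merge(revenue) }
```

Because both hashes now use the same key format, the merge block fires for every employee and the values end up combined.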