Here is the situation. I am using Node.js and cheerio. My console output is fine except for some duplicate entries caused by the site structure...

My script:

var request = require('request'),
    cheerio = require('cheerio'),
    chart = [];

request('http://www.website-X.com', function (err, resp, body) {
    if (!err && resp.statusCode == 200) {
        var $ = cheerio.load(body);
        $('tr', '#chart_body').each(function () {
            var rank = $(this).text().trim().replace(/\s\s+/g, ';');
            chart.push(rank);
        });
        console.log(chart);
    }
});

The site structure (simplified):

                    <table id="chart_body">
                        <tr><!-- 1 Info I need --></td>
                        <tr><!-- 2 Info I need --></td>
                            <table>
                                <tbody>
                                    <tr> Duplicate info as 1  </tr>
                                </tbody>
                            </table>
                        <tr><!-- 3 Info I need --></td>
                        <tr><!-- 4 Info I need --></td>
                        <tr><!-- 5 Info I need --></td>
                        <tr><!-- 6 Info I need --></td>
                    </table>

My console log return:

'1;Wolfenstein;330,703;330,703;1',
  'Wolfenstein',
'2;Wolfenstein;188,200;188,200;1',
  'Wolfenstein',
'3;Minecraft;126,041;215,109;2',
  'Minecraft','

My console output is fine except for the duplicate entries. They appear because, in the site structure, the tr matched by my selector has another tr nested inside it. I can't get rid of the 'tr tr' matches. The tr elements also don't have unique classes that I could use to narrow the selection.

Please help. Thanks!!! -Aldo

Oh, and lastly: the pesky single quotes at the beginning and end of every entry. I can't get rid of them.

Aldo
You might take a look at these answers, which tackle a similar problem: http://stackoverflow.com/questions/11362085/jquery-get-text-for-element-without-children-text http://stackoverflow.com/questions/3442394/jquery-using-text-to-retrieve-only-text-not-nested-in-child-tags – dylants Jun 03 '14 at 18:21

2 Answers


If the tr rows are consistently duplicated, you could select only the odd ones:

$('tr', '#chart_body').odd()

Using the function from this answer.
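The function referred to above is not shown here, so as a rough sketch of the same "keep every other entry" idea, the filter below is applied to the collected array with the standard Array.prototype.filter instead of to the cheerio selection. It assumes the duplicates strictly alternate with the wanted rows, as in the console output from the question; the name wanted is made up for illustration. Note that the wanted rows sit at zero-based positions 0, 2, 4, which are the 1st, 3rd and 5th entries.

```javascript
// Sample entries copied from the console output in the question.
const chart = [
  '1;Wolfenstein;330,703;330,703;1',
  'Wolfenstein',
  '2;Wolfenstein;188,200;188,200;1',
  'Wolfenstein',
  '3;Minecraft;126,041;215,109;2',
  'Minecraft'
];

// Keep entries at zero-based positions 0, 2, 4, ... (the 1st, 3rd, 5th rows).
// Assumption: every wanted row is followed by exactly one duplicate.
const wanted = chart.filter(function (entry, index) {
  return index % 2 === 0;
});

// Joining with newlines prints the bare strings. The single quotes only
// appear when console.log renders strings that sit inside an array.
console.log(wanted.join('\n'));
```

If the page ever emits a row without a duplicate, this positional filter will drop the wrong entries, so it is only as reliable as the alternation it assumes.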

dunnkers

How about the child selector? #chart_body > tr should do the job.

fb55