
I am using Windows software to organize a tourpool. This program creates (among other things) HTML pages with rankings of participants. But these HTML pages are quite hideous, so I am building a site around it.

To show the top 10 ranking, I need to select the first 10 of about 1000 participants from the generated HTML file and put them on my own site.

To do this, I used:

// get top 10 ranks of p_rnk.htm
$file_contents = file_get_contents('p_rnk.htm');
$start = strpos($file_contents, '<tr class="header">'); 

// get end  
$i = 11;
while (strpos($file_contents, '<tr><td class="position">'. $i .'</td>', $start) === false){
   $i++;
}

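// Use the same needle as the loop so $end points at the start of the row
$end = strpos($file_contents, '<tr><td class="position">'. $i .'</td>', $start);

// substr() takes a length as its third argument, not an end offset
$code = substr($file_contents, $start, $end - $start);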
echo $code;

This works, but the last 3 columns (previous position, up or down, and details) are useless information. So I want to delete these columns, or find a way to select and display only the first 4.

How do I manage this?


EDIT

I adjusted my code and at the end I only echo the adjusted table.

<?php

$DOM = new DOMDocument;
$DOM->loadHTMLFile("p_rnk.htm");

$table = $DOM->getElementsByTagName('table')->item(0);
$rows = $table->getElementsByTagName('tr');

$cut_rows_after = 10;
$cut_colomns_after = 3;

$row_index = $rows->length-1;

while($row = $rows->item($row_index)) {
    if($row_index+1 > $cut_rows_after)
        $table->removeChild($row);
    else {
        $tds = $row->getElementsByTagName('td');
        $colomn_index = $tds->length-1;
        while($td = $tds->item($colomn_index)) {
            if($colomn_index+1 > $cut_colomns_after)
                $row->removeChild($td);
            $colomn_index--;
        }
    }
    $row_index--;
}

echo $DOM->saveHTML($table);

?>
Brian Tompsett - 汤莱恩
Stephan Tips
  • Welcome to StackOverflow. Please specify which software you use, clarify what top 10 means here, and provide some code that you have tried so that we can see more clearly what the problem is. – YakovL Jun 13 '16 at 15:41
  • Thanks YakovL. I adjusted my post. Do you need more information? – Stephan Tips Jun 13 '16 at 16:40
  • Yes, great, now the question is quite clear. – YakovL Jun 13 '16 at 23:20
  • YakovL, I've updated my post. Can you point me in the right direction? – Stephan Tips Jun 15 '16 at 07:50
  • Actually, you're close. You can either first remove all unnecessary data from the table (right now, it seems, you remove only some `td`s) and then print it, or print it cell-wise (get a `tr` element, get all the `td` elements from it, print the ones that are needed; in that case remember to add the opening and closing tags, such as `<table>` and `</table>`, yourself). – YakovL Jun 15 '16 at 16:00
  • Ok, I've added a tested code to my answer, give it a try. – YakovL Jun 15 '16 at 17:18
  • Thanks @Yakovl !! Almost there.. Been at it for 3.5 hours now, but I can't seem to get it right. Unfortunately, the page doesn't contain a tbody. Therefore I selected all tables and picked table 3 to be edited. The td removal works like a charm, but it still shows me all 3 tables. I tried to fix this by getting the body tag and removing table children 1 and 2, but I can't get it right... Can you point me in the right direction one last time? I am so close! Many thanks! (I adjusted the code above) – Stephan Tips Jun 15 '16 at 23:09
  • Hi Stephan. If the page doesn't contain `tbody`, use the container which is present. For instance, if `tr` elements are inside a `table` element, use `$DOM->getElementsByTagName('table')` instead. And if you have multiple tables, then you either have to iterate over them (if you want to change each one), like I iterated `$rows`, or just use `$DOM->getElementsByTagName('table')->item($number_of_your_table-1)`. – YakovL Jun 15 '16 at 23:34
  • @YakovL I am sorry to keep bothering you with this.. :( I did use what you described (see my commented code). The problem is that if I use `$body = $DOM->getElementsByTagName('table')->item(2); $tables = $body->getElementsByTagName('table');` it still shows all tables. So I need to delete the first 2 tables. But it won't work if I use this, for example: `for ($i = 0; $i < 2; $i++){ $body->removeChild($tables->item($i)); };` (tables->item($i) doesn't seem to work then?) – Stephan Tips Jun 16 '16 at 07:34
  • Hi Stephan, so what happens if you try the commented code? What error/bad result do you get? `for ($i = 0; $i < 2; $i++){ $body->removeChild($tables->item($i)); };` won't work for sure, but your commented code is smart: you remove `$tables->item(0)` and hence you shouldn't get any mess because of index shifting. By the way, don't forget to upvote and accept the answer once we're done :) – YakovL Jun 16 '16 at 10:04
  • If this is done, I'll accept and upvote anything you want!! ;) The error I get is this: `Fatal error: Uncaught exception 'DOMException' with message 'Not Found Error' in E:\Xampp\htdocs\test\index.php:186 Stack trace: #0 E:\Xampp\htdocs\test\index.php(186): DOMNode->removeChild(Object(DOMElement)) #1 {main} thrown in E:\Xampp\htdocs\test\index.php on line 186`. I will update it in my post above. – Stephan Tips Jun 16 '16 at 22:25
  • Ah, ok.. So are `table` elements actually children of `body` or just descendants? Sounds this way. If so, try another way to remove them: for instance, here http://stackoverflow.com/questions/8227481/simple-html-dom-how-to-remove-elements they suggest just `$e->outertext = "";` which would be `$tables->item(0)->outertext = "";` in your case. – YakovL Jun 16 '16 at 23:15
  • That doesn't do anything except make the error disappear. All 3 tables are still visible. I realise that my 3rd table is inside my 2nd table. Is that a problem, or should it still manage to pull them apart? – Stephan Tips Jun 17 '16 at 18:18
  • Maybe you should show the HTML (or parts of it) in this case. Obviously, you can't just remove table 1 and table 2 then, since that would remove table 3 as well. – YakovL Jun 17 '16 at 18:48
  • Came to the same conclusion :) I posted the HTML above. I only need the 3rd table. – Stephan Tips Jun 17 '16 at 19:13
  • GOT IT! I don't have to delete the other tables. I can just echo the adjusted table with `echo $DOM->saveHTML($table);` Thanks for all your help YakovL. – Stephan Tips Jun 17 '16 at 19:33
  • Ah, great :) you can also upvote the answer, it just adds some reputation points to me – YakovL Jun 18 '16 at 05:09
  • Can't. Need to have 15 reputation myself or something before upvotes are shown??? – Stephan Tips Jun 18 '16 at 07:52
  • Ah, I see. Nevermind) – YakovL Jun 18 '16 at 20:12

1 Answer


I'd say the best way to deal with this is to parse the HTML document (see, for instance, the first answer here) and then manipulate the object that describes the DOM. This way you can easily extract the table itself using various selectors, get your first 10 records in a simpler manner, and remove the unnecessary child (td) nodes from each row (using removeChild). When you're done modifying, dump the resulting HTML using saveHTML.
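As a rough sketch of the "selectors" idea, the rows and cells can also be picked with DOMXPath and printed cell-wise instead of removing nodes. This is a different route from removeChild, shown only for comparison. The sample markup (15 rows of 7 cells) and the names $maxRows and $maxCols are assumptions made up for the illustration; the real generated page will look different.

```php
<?php
// Build a small stand-in for the generated page: 15 rows, 7 cells each.
// (Hypothetical markup; the real ranking file is not reproduced here.)
$html = '<table>';
for ($r = 1; $r <= 15; $r++) {
    $html .= '<tr>';
    for ($c = 1; $c <= 7; $c++) {
        $html .= "<td>r{$r}c{$c}</td>";
    }
    $html .= '</tr>';
}
$html .= '</table>';

$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$maxRows = 10; // how many rows to keep
$maxCols = 4;  // how many cells to keep per row

// All rows below the first table in the document.
$rows = $xpath->query('(//table)[1]//tr');

// Print cell-wise: the table and row tags are written by hand.
$out = "<table>\n";
for ($i = 0; $i < min($maxRows, $rows->length); $i++) {
    $cells = $xpath->query('td', $rows->item($i)); // cells of this row only
    $out .= '<tr>';
    for ($j = 0; $j < min($maxCols, $cells->length); $j++) {
        $out .= $dom->saveHTML($cells->item($j)); // serialize a single cell
    }
    $out .= "</tr>\n";
}
$out .= '</table>';

echo $out;
```

Nothing is removed from the parsed document here, so index shifting is not a concern; the trade-off is that any attributes on the original table and row tags are not carried over.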

Update:

OK, here's tested code. I removed the need to hardcode the numbers of columns and rows and put the desired numbers into a couple of variables (so that you can adjust them if needed). Give the code a closer look: you'll notice some details that were missing in your code (the index is 0..999, not 1..1000, which is why all those -1s and +1s appear; it's better to decrease the index than to increase it, because then you don't have to care about index shifts on removal; I've also used while instead of for so that the case $rows->item($row_index) == null needs no separate handling):

<?php
    $DOM = new DOMDocument;
    $DOM->loadHTMLFile("./table.html");

    $table = $DOM->getElementsByTagName('tbody')->item(0);
    $rows = $table->getElementsByTagName('tr');

    $cut_rows_after = 10;
    $cut_colomns_after = 4;

    $row_index = $rows->length-1;
    while($row = $rows->item($row_index)) {
        if($row_index+1 > $cut_rows_after)
            $table->removeChild($row);
        else {
            $tds = $row->getElementsByTagName('td');
            $colomn_index = $tds->length-1;
            while($td = $tds->item($colomn_index)) {
                if($colomn_index+1 > $cut_colomns_after)
                    $row->removeChild($td);
                $colomn_index--;
            }
        }
        $row_index--;
    }

    echo $DOM->saveHTML();
?>
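
The point about decreasing the index can be seen in isolation in the small sketch below. The list returned by getElementsByTagName is live, so it shrinks as nodes are removed; walking it from the end means the indexes still to be visited never change. The four-item list is made-up sample data.

```php
<?php
// Made-up sample: a list with four items.
$dom = new DOMDocument;
$dom->loadHTML('<ul><li>a</li><li>b</li><li>c</li><li>d</li></ul>');

$list  = $dom->getElementsByTagName('ul')->item(0);
$items = $list->getElementsByTagName('li'); // live list: reflects removals

$keep = 2; // keep only the first two items

// Walk from the last index down to 0. Removing item $i only affects
// positions after $i, which have already been visited.
for ($i = $items->length - 1; $i >= 0; $i--) {
    if ($i >= $keep) {
        $list->removeChild($items->item($i));
    }
}

$remaining = $items->length; // the live list now holds only the kept items
echo $dom->saveHTML($list);
```

A forward loop that removes item $i and then increments $i would skip every other node, because the next node slides into the position that was just vacated.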

Update 2:

If the page doesn't contain tbody, use the container which is present. For instance, if tr elements are inside a table element, use $DOM->getElementsByTagName('table') instead of $DOM->getElementsByTagName('tbody').
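
A sketch that avoids depending on the container altogether, under some assumptions: the markup below (a layout table wrapping the ranking table, 5 rows of 4 cells) is invented for the example, and $keepRows and $keepCols are illustrative names. Each node is removed through its own parentNode, so it does not matter whether the rows sit directly in the table or in a tbody, and only the chosen table is printed by passing it to saveHTML.

```php
<?php
// Invented sample: an outer layout table that wraps the ranking table.
// The inner table uses an explicit tbody to show that this does not matter.
$html = '<table><tr><td>layout<table><tbody>';
for ($r = 1; $r <= 5; $r++) {
    $html .= '<tr>';
    for ($c = 1; $c <= 4; $c++) {
        $html .= "<td>r{$r}c{$c}</td>";
    }
    $html .= '</tr>';
}
$html .= '</tbody></table></td></tr></table>';

$dom = new DOMDocument;
$dom->loadHTML($html);

// Second table in document order, i.e. the nested ranking table.
$table = $dom->getElementsByTagName('table')->item(1);
$rows  = $table->getElementsByTagName('tr');

$keepRows = 3;
$keepCols = 2;

for ($r = $rows->length - 1; $r >= 0; $r--) {
    $row = $rows->item($r);
    if ($r >= $keepRows) {
        // Remove via the actual parent (table or tbody, whichever it is).
        $row->parentNode->removeChild($row);
        continue;
    }
    $cells = $row->getElementsByTagName('td');
    for ($c = $cells->length - 1; $c >= $keepCols; $c--) {
        $cell = $cells->item($c);
        $cell->parentNode->removeChild($cell);
    }
}

// Serialize only the trimmed table, not the whole document.
$result = $dom->saveHTML($table);
echo $result;
```

With this sample, calling $table->removeChild($row) instead would throw a "Not Found Error", because the rows are children of the tbody rather than of the table itself.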

YakovL