
The project consists of a side menu that lists all my blog entries. In theory, the menu works this way: using PHP, I get the <h1> title of each blog document and its corresponding href.


So each time I add a new blog entry, it appears in the menu automatically. I did this using fopen and fread, saving the contents of each PHP document of my blog in a variable called $leer.


Then I used preg_match_all to find the title of each blog entry and display it in the menu. This way I can build the menu without adding the links manually; I also use scandir to get the URLs.


The problem is that preg_match_all gives me many incorrect results in its array: I get the same <h1> title four times, without explanation. Here is the code:

$escanear = scandir ("/home/sites/thetoptenweb.com/public_html/post");
$tamanoArray = count($escanear);

So first, as you can see, I'm using scandir and count to get the number of pages.

for($i=2;$i<=$tamanoArray-3;$i++){
    $abrirFichero = fopen($escanear[$i],"r");
    $leer=fread($abrirFichero, filesize($escanear[$i]));        
    fclose($abrirFichero);
}

Then I use a for loop and fread to read all my documents. The loop only scans the files from index 2 through index $tamanoArray-3, because those are my blog entries.

preg_match_all('%<h1>(.*)</h1>%', $leer, $arrayMaches);

So, with preg_match_all I get the titles of my documents in a multi-dimensional array. I think that array is what is giving me problems: I tried to read each result with foreach loops, but there are some blank results and I don't know why.

foreach ($arrayMaches as $result) {
    foreach ($result as $key) {
    }
}

$cortado=preg_split('%</?h1>%', $key);

echo "<a href=".'"'.$escanear[$i].'"'."><li>".$cortado[0]."</li></a><hr>";

Finally, I used foreach loops to access the multi-dimensional array from preg_match_all. Then I split the results with preg_split to get the text without HTML tags, and displayed them with echo.

I hope someone can help me find the problem and, if possible, explain the preg_match_all array, because I don't understand how it is built. Suggestions are welcome; if you know a better way to do this, I'll be happy to read it. This is the entire code:

<?php
    $escanear = scandir ("/home/sites/thetoptenweb.com/public_html/post");
    $tamanoArray = count($escanear);
    for($i=2;$i<=$tamanoArray-3;$i++){
        $abrirFichero = fopen($escanear[$i],"r");
        $leer=fread($abrirFichero, filesize($escanear[$i]));
        fclose($abrirFichero);
        preg_match_all('%<h1>(.*)</h1>%', $leer, $arrayMaches);
        foreach ($arrayMaches as $result) {
            foreach ($result as $key) {
            }
        }
        $cortado=preg_split('%</?h1>%', $key);
        echo "<a href=".'"'.$escanear[$i].'"'."><li>".$cortado[0]."</li></a><hr>";                  
    }                               
?> 

Thanks.

Rishabh Shah
Meru Gr.
  • Can you output `$leer`? An HTML/XML parser might be an easier way to do this. – chris85 Jun 14 '15 at 19:42
  • Yep, the variable `$leer` gives me the entire page, so it's getting the document correctly. – Meru Gr. Jun 14 '15 at 20:00
  • Could you quickly explain how to extract the title with an HTML parser? I haven't learned to use one yet. – Meru Gr. Jun 14 '15 at 20:07
  • Here's a thread on that, http://stackoverflow.com/questions/3299033/getting-all-values-from-h1-tags-using-php. Your regex is greedy so if you have more than one `h1` on a line it's going to gobble all of that, or if you are using the `s` modifier (current code isn't using this). – chris85 Jun 14 '15 at 20:10
  • I should say that there is only one `h1` tag in each document. – Meru Gr. Jun 14 '15 at 20:24

1 Answer

Just use the DOM... you'll save yourself some trouble.

$menuData = array();
$iter = new DirectoryIterator('/home/sites/thetoptenweb.com/public_html/post');

foreach ($iter as $file) {
    // Skip ".", ".." and subdirectories.
    if ($file->isFile()) {
        $filename = $file->getFilename();
        $path = $file->getPathname();

        // Parse the file as HTML.
        $dom = new DOMDocument();
        $dom->loadHTMLFile($path);

        // Take the first <h1>, if the document has one.
        $titleNode = $dom->getElementsByTagName('h1')->item(0);
        if ($titleNode) {
            $title = $titleNode->nodeValue;
            $menuData[$filename] = $title;
        }
    }
}

Now that you have everything in $menuData, you can loop over it and output the links, assuming that the filename is the appropriate URL. Alternatively, you could output the links directly in the loop, but it's wiser to separate things: create a function to get the data you need, and then use that data to produce the output.
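
As a rough sketch of that separation, the data gathering and the output could be split into two functions. The names `getMenuData` and `renderMenu` are hypothetical (they are not part of PHP or of the code above), the DOM extension is assumed to be available, and the filename is again assumed to be a usable relative URL. Note that the markup places the `<a>` inside the `<li>`, the reverse of the question's echo:

```php
<?php
// Hypothetical helper: returns [filename => title] for every file in $dir
// whose markup contains an <h1>.
function getMenuData(string $dir): array
{
    $menuData = [];
    // Collect libxml parse errors internally instead of emitting warnings.
    $previous = libxml_use_internal_errors(true);

    foreach (new DirectoryIterator($dir) as $file) {
        if (!$file->isFile()) {
            continue; // skips ".", ".." and subdirectories
        }
        $dom = new DOMDocument();
        if (!$dom->loadHTMLFile($file->getPathname())) {
            continue; // file could not be parsed
        }
        $titleNode = $dom->getElementsByTagName('h1')->item(0);
        if ($titleNode !== null) {
            $menuData[$file->getFilename()] = trim($titleNode->textContent);
        }
    }

    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    ksort($menuData); // directory order is not guaranteed, so sort by filename
    return $menuData;
}

// Hypothetical helper: turns the collected data into an HTML list.
function renderMenu(array $menuData): string
{
    $html = "<ul>\n";
    foreach ($menuData as $filename => $title) {
        $html .= sprintf(
            "  <li><a href=\"%s\">%s</a></li>\n",
            htmlspecialchars(rawurlencode($filename), ENT_QUOTES),
            htmlspecialchars($title, ENT_QUOTES)
        );
    }
    return $html . "</ul>\n";
}
```

With these in place, the menu template would only need something like `echo renderMenu(getMenuData($postDir));`, where `$postDir` stands for the post directory path.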

But an even better solution would be to pick a blog platform and use that, then spend your time writing an importer and adjusting look and feel to suit.
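
On the question of how the preg_match_all array is built: with the default PREG_PATTERN_ORDER flag, index 0 holds every full match and index 1 holds what the first capture group caught. When nothing matches, both sub-arrays are empty, so a variable assigned inside an earlier foreach (such as `$key` in the question) simply keeps its previous value. That may be why the same title shows up several times, though this is a guess, since the actual files are not shown. The sketch below uses made-up sample markup in `$docs` and a hypothetical helper `firstH1`:

```php
<?php
// Made-up sample documents standing in for the blog files.
$docs = [
    'a.php' => '<h1>First post</h1><p>text</p>',
    'b.php' => '<h1 class="big">Styled heading</h1>', // attribute on the tag
    'c.php' => "<h1>\nSplit heading\n</h1>",          // title spans several lines
];

// Pattern from the question, applied to a document it can match.
preg_match_all('%<h1>(.*)</h1>%', $docs['a.php'], $m);
// $m[0] lists the full matches, $m[1] the text caught by the first group:
// $m === [['<h1>First post</h1>'], ['First post']]

// Same pattern on a document it cannot match (the tag has an attribute).
preg_match_all('%<h1>(.*)</h1>%', $docs['b.php'], $none);
// $none === [[], []], so looping over it assigns nothing.

// Hypothetical, more tolerant helper: allows attributes, uses a lazy
// quantifier, and lets "." match newlines through the "s" modifier.
function firstH1(string $html): ?string
{
    if (preg_match('%<h1\b[^>]*>(.*?)</h1>%is', $html, $match)) {
        return trim(strip_tags($match[1]));
    }
    return null; // no <h1> found: the caller can skip the file
}
```

Returning `null` for a missing heading lets the caller skip that file explicitly rather than reuse a value left over from the previous iteration. A regex remains more fragile than the DOM approach above.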

prodigitalson
  • So if I `print_r($menuData);`, I get an array like this ` [5cosasvivirsolo.php] =>` with just the filenames. How do I access the `h1` title? You said that I can loop over the `$menuData` variable, but I don't know where the titles are. Maybe you mean that I have to loop over the `$title` array? – Meru Gr. Jun 14 '15 at 20:44
  • If the `h1` just contained a basic title, it should be in the array... do your `h1` elements contain additional markup? Also, I just saw that there was a typo and corrected it. – prodigitalson Jun 15 '15 at 15:20