I wrote some code that finds names appearing between two specific HTML tags and copies them into a text file.

It works on small HTML files. The problem is that it doesn't work on large files (22 MB): at some point it stops running and no file is created.

How can I make my code more scalable?

<?php
$filename = "example.txt";            // file that is searched for names
$names  = fopen("names.txt", 'a+');   // file the names are appended to
$handle = fopen($filename, "r");      // open the input file for reading
while (!feof($handle)) {              // loop until the end of the file
    $line = fgets($handle);           // read one line from the file at a time
    // look for the name between the HTML tags; on a match it ends up in $matches[1]
    if (preg_match(';(?<=from"><span class="profile fn">)(.*)</span></div>;', $line, $matches)) {
        $match = $matches[1];
        if (!empty($match)) {         // skip empty matches to avoid blank lines in the file
            fwrite($names, $match . "\n");   // write the name to names.txt
        }
    }
    $data1 = file("names.txt");       // build a second file without duplicate names
    file_put_contents('unique.txt', implode(array_unique($data1)));
}
fclose($handle);                      // close the file handles, not the file names
fclose($names);
?>
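
One probable cause of the slowdown is that `names.txt` is re-read and `unique.txt` is rewritten on every loop iteration, so the work grows roughly quadratically with the number of lines. The following is only a sketch of an alternative, not a tested fix for the 22 MB file: it streams the input once and remembers the names already written in an array. The function name `extractUniqueNames` and its parameters are made up for illustration; the pattern is the one from the code above, with a non-greedy capture and `preg_match_all` so that several names on one line are picked up.

```php
<?php
// Hypothetical helper (not from the original code): streams $inputPath line by
// line and writes each distinct name to $outputPath exactly once.
// Returns the number of distinct names written.
function extractUniqueNames(string $inputPath, string $outputPath): int
{
    $in = fopen($inputPath, 'r');
    if ($in === false) {
        throw new RuntimeException("Cannot open $inputPath");
    }
    $out = fopen($outputPath, 'w');
    if ($out === false) {
        fclose($in);
        throw new RuntimeException("Cannot open $outputPath");
    }

    // Same pattern as in the question, but with a non-greedy capture group.
    $pattern = ';(?<=from"><span class="profile fn">)(.*?)</span></div>;';
    $seen = [];   // names already written, used as a set (name => true)

    while (($line = fgets($in)) !== false) {
        // preg_match_all collects every name on the line, not only the first.
        if (preg_match_all($pattern, $line, $m) > 0) {
            foreach ($m[1] as $name) {
                if ($name !== '' && !isset($seen[$name])) {
                    $seen[$name] = true;
                    fwrite($out, $name . "\n");
                }
            }
        }
    }

    fclose($in);
    fclose($out);
    return count($seen);
}
```

Calling `extractUniqueNames('example.txt', 'unique.txt')` would replace both the loop and the `array_unique` step. If the large file has no line breaks at all, `fgets` still returns the whole file as a single line, so memory use would remain an open question.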
jsj
BenB
  • [Never parse HTML with regex](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454) – Antony Feb 07 '13 at 05:12
  • "Stops running" is not very descriptive. Any errors, warnings, exceptions displayed? –  Feb 07 '13 at 05:53
  • @Antony, what do you suggest I parse it with? – BenB Feb 07 '13 at 10:50
  • @bmewsing, the page stops loading and does nothing; no errors, warnings, or exceptions are displayed – BenB Feb 07 '13 at 10:52
  • I'm not great with regexes but shouldn't you make this bit non-greedy? `(.*)` –  Feb 07 '13 at 11:09
  • @bmewsing, it works with `(.*)` as it is. How do I change it to non-greedy? – BenB Feb 07 '13 at 11:15
  • @batz, `(.*?)` I believe. –  Feb 07 '13 at 11:36
  • @batz [Have you tried using an XML parser instead?](http://www.php.net/manual/en/class.domdocument.php) – Antony Feb 07 '13 at 15:54
  • @Antony, I did try parsing the HTML with DOMDocument and still have the same problem, probably because it reads every line separately. Where can I find an explanation of how to use an XML parser for what I need? Thanks – BenB Feb 08 '13 at 04:28
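
For reference, a rough sketch of what the DOMDocument suggestion could look like when the parser is given the whole document rather than one line at a time. This is an assumption-laden example, not a confirmed fix: the function name `namesFromHtml` is invented, the XPath expression assumes the `class` attribute is exactly `profile fn` as in the regex, and loading a 22 MB file this way keeps the entire document in memory, so `memory_limit` may need raising.

```php
<?php
// Hypothetical helper (not from the thread): returns the distinct text
// contents of every <span class="profile fn"> element in an HTML string.
function namesFromHtml(string $html): array
{
    if ($html === '') {
        return [];   // loadHTML rejects an empty string
    }

    $doc = new DOMDocument();
    // Real-world HTML is rarely valid, so collect parser warnings
    // internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $names = [];   // used as a set (name => true) to drop duplicates
    foreach ($xpath->query('//span[@class="profile fn"]') as $span) {
        $name = trim($span->textContent);
        if ($name !== '') {
            $names[$name] = true;
        }
    }
    // Numeric-looking keys are stored as integers, so cast back to strings.
    return array_map('strval', array_keys($names));
}
```

A possible call would be `file_put_contents('unique.txt', implode("\n", namesFromHtml(file_get_contents('example.txt'))) . "\n");`, which reads the file in one piece instead of line by line.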

0 Answers