I have a folder structure like this:
Groups
- apple
-- ahen45.html
-- rev34.html
-- ......
- bat
-- fsf.html
-- ere.html
--....
...
Groups is the parent folder; apple, bat, etc. are subfolders.
There are more than 500 subfolders like this, containing more than 20,000 HTML files. I'm trying to read those HTML files with PHP and extract the title, meta keywords, and body from each one, using the subfolder name as the category.
<?php
$file = $_SERVER["DOCUMENT_ROOT"];
$dir = new RecursiveDirectoryIterator('groups/',
    FilesystemIterator::SKIP_DOTS);
$it = new RecursiveIteratorIterator($dir,
    RecursiveIteratorIterator::SELF_FIRST);
$it->setMaxDepth(1);
foreach ($it as $fileinfo) {
    if ($fileinfo->isDir()) {
        echo $category = $fileinfo->getFilename();
    }
    else if ($fileinfo->isFile()) {
        $fileinfo->getFilename();
        $myURL = $file.'/group/groups/'.$category.'/'.$fileinfo->getFilename();
        $doc = new DOMDocument();
        $doc->loadHTMLFile($myURL);
        $elements = $doc->getElementsByTagName('meta');
        $elements = $doc->getElementsByTagName('title');
        $elements = $doc->getElementsByTagName('body');
        foreach ($elements as $el) {
            echo $el->nodeValue, PHP_EOL;
        }
    }
}
?>
When I run this, it parses the whole page and gives warnings that some tags are unclosed (tags other than the ones I'm looking for). What can I do to make this work properly?
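
For reference, here is a minimal sketch of one way the extraction could be structured, not a definitive fix. It assumes the unclosed-tag warnings come from libxml's HTML parser and can be collected internally with libxml_use_internal_errors() rather than printed. The function names extractPageParts and scanGroups are hypothetical, introduced only for illustration. The sketch also keeps the title, meta and body lookups in separate variables, since the code above reuses $elements for all three and so only prints the body, and it takes the category from each file's own path rather than from the iteration order.

```php
<?php
// Hypothetical helper: pull title, meta keywords and body text out of one HTML file.
function extractPageParts(string $path): array
{
    // Collect libxml parse errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $doc->loadHTMLFile($path);

    // Drop the collected errors so they do not accumulate across many files,
    // then restore the earlier error-handling setting.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $titleNode = $doc->getElementsByTagName('title')->item(0);
    $bodyNode  = $doc->getElementsByTagName('body')->item(0);

    // Look for <meta name="keywords" content="..."> among all meta tags.
    $keywords = '';
    foreach ($doc->getElementsByTagName('meta') as $meta) {
        if (strtolower($meta->getAttribute('name')) === 'keywords') {
            $keywords = $meta->getAttribute('content');
            break;
        }
    }

    return [
        // The immediate parent folder name serves as the category.
        'category' => basename(dirname($path)),
        'title'    => $titleNode ? trim($titleNode->textContent) : '',
        'keywords' => $keywords,
        'body'     => $bodyNode ? trim($bodyNode->textContent) : '',
    ];
}

// Hypothetical helper: walk every .html file under $root and extract its parts.
function scanGroups(string $root): array
{
    $pages = [];
    // The default LEAVES_ONLY mode yields files only, so no isDir() check is needed.
    $it = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($it as $fileinfo) {
        if ($fileinfo->isFile() && strtolower($fileinfo->getExtension()) === 'html') {
            $pages[] = extractPageParts($fileinfo->getPathname());
        }
    }
    return $pages;
}
```

Calling scanGroups('groups/') would then return one array per file with the keys category, title, keywords and body. Note that the body value here is the plain text content of the body element, which also includes the text of any inline scripts or styles inside it. With 20,000 files it may be preferable to process each result inside the loop rather than collecting them all in memory.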