
I've searched this site and others and found no solution: I'm trying to read the text of a .doc file with PHPOffice/PHPWord, and everything I've tried has failed. I can read .docx files fine; only Word 97-2003 (.doc) documents give me trouble.

An alternative would be to convert them to .docx or .pdf and read that instead (this has to happen automatically, with no user intervention), but I haven't found an answer for that either.

    function convertDocToDocx($docPath, $docxPath)
    {
        // Load the .doc file ('MsDoc' is PHPWord's reader for the binary Word 97-2003 format)
        $docReader = \PhpOffice\PhpWord\IOFactory::createReader('MsDoc');
        $phpWord = $docReader->load($docPath);

        // Save the document as .docx
        $docxWriter = \PhpOffice\PhpWord\IOFactory::createWriter($phpWord, 'Word2007');
        $docxWriter->save($docxPath);
    }

    function extractTextFromDoc($filepath)
    {
        // 'MsDoc' is PHPWord's reader for binary Word 97-2003 files
        $objReader = \PhpOffice\PhpWord\IOFactory::createReader('MsDoc');
        $phpWord = $objReader->load($filepath);
        $text = '';

        foreach ($phpWord->getSections() as $section) {
            foreach ($section->getElements() as $element) {
                // Only top-level Text elements are collected; TextRun, tables, etc. are skipped
                if ($element instanceof \PhpOffice\PhpWord\Element\Text) {
                    $text .= $element->getText();
                }
            }
        }

        return $text;
    }
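The loop in `extractTextFromDoc` only appends top-level `Text` elements, so even when a file loads successfully, text nested inside a `TextRun`, a list item run, or a table cell is skipped. Below is a minimal sketch of a recursive walker. It assumes the loaded document exposes PHPWord's `getSections()`, `getElements()` and `getText()` methods, plus `getRows()`/`getCells()` on tables, and it dispatches on method names rather than class names so it works with whichever element classes the reader happens to produce. The function names `collectText` and `extractAllText` are hypothetical.

```php
<?php
/**
 * Recursively collect plain text from a PHPWord element tree.
 * Duck-typed on method names (getElements, getRows/getCells, getText)
 * instead of instanceof checks, so containers such as TextRun and
 * table cells are walked too.
 */
function collectText($node): string
{
    // Some getText() implementations return a string, others an object or array
    if (!is_object($node)) {
        return is_string($node) ? $node : '';
    }

    $text = '';
    if (method_exists($node, 'getElements')) {
        // Containers: Section, TextRun, Cell, ...
        foreach ($node->getElements() as $child) {
            $text .= collectText($child);
        }
    } elseif (method_exists($node, 'getRows')) {
        // Tables: tab-separate the cells, end each row with a newline
        foreach ($node->getRows() as $row) {
            $cells = [];
            foreach ($row->getCells() as $cell) {
                $cells[] = collectText($cell);
            }
            $text .= implode("\t", $cells) . "\n";
        }
    } elseif (method_exists($node, 'getText')) {
        // Leaf elements such as Text or Link; recurse in case getText() returns a container
        $text .= collectText($node->getText());
    }

    return $text;
}

/**
 * Extract text from every section, one line per top-level element.
 */
function extractAllText($phpWord): string
{
    $lines = [];
    foreach ($phpWord->getSections() as $section) {
        foreach ($section->getElements() as $element) {
            $lines[] = collectText($element);
        }
    }
    return implode("\n", $lines);
}
```

With a document loaded by any PHPWord reader, `extractAllText($phpWord)` returns one line per top-level element, including text held inside runs and tables.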
  • I don't think PHPWord supports the old binary doc format. You could try using LibreOffice to convert via command line. [Here is an old related question](https://stackoverflow.com/q/5671988/1191247). – user1191247 Jul 10 '23 at 11:13
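The LibreOffice route suggested in the comment can be driven from PHP: run LibreOffice headless with `--convert-to docx`, then read the result with PHPWord's `Word2007` reader, which already works for .docx files. This is a minimal sketch assuming LibreOffice is installed on the server and callable as `soffice` (the binary name and path vary by platform), that PHP's `exec()` is enabled, and that the PHP process can write to the output directory. The function names are hypothetical.

```php
<?php
/**
 * Build the shell command that asks LibreOffice to convert one .doc file
 * to .docx without a GUI. 'soffice' is an assumed default binary name;
 * on some systems it is 'libreoffice' or a full path.
 */
function buildDocxConversionCommand(string $docPath, string $outDir, string $binary = 'soffice'): string
{
    return sprintf(
        '%s --headless --convert-to docx --outdir %s %s',
        escapeshellarg($binary),
        escapeshellarg($outDir),
        escapeshellarg($docPath)
    );
}

/**
 * Run the conversion and return the path of the produced .docx, or null on failure.
 */
function convertDocToDocxWithLibreOffice(string $docPath, string $outDir, string $binary = 'soffice'): ?string
{
    // Capture stderr too, so error messages end up in $output instead of the page
    $command = buildDocxConversionCommand($docPath, $outDir, $binary) . ' 2>&1';
    exec($command, $output, $exitCode);

    // LibreOffice names the output file after the input file, with the new extension
    $docxPath = rtrim($outDir, '/\\') . DIRECTORY_SEPARATOR
        . pathinfo($docPath, PATHINFO_FILENAME) . '.docx';

    return ($exitCode === 0 && is_file($docxPath)) ? $docxPath : null;
}
```

If a path comes back, `\PhpOffice\PhpWord\IOFactory::load($docxPath)` uses the `Word2007` reader by default, so the converted file can go through the existing .docx code. Passing `pdf` instead of `docx` to `--convert-to` produces a PDF in the same way.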
