I want to read the text from a .docx file line by line and keep each line in an array. Since .docx is a zipped file, I want to convert it into a .doc file so that I can read it using @fopen($filename, 'r');.

Below is the code I tried, using PHPWord to convert from .docx to .doc:

<?php
require_once 'phpWord/PHPWord.php';

$PHPWord = new PHPWord();

$document = $PHPWord->loadTemplate('BasicTable.docx');

// Save File

$objWriter = PHPWord_IOFactory::createWriter($PHPWord, 'Word2007');

$objWriter->save('BasicTable4.doc');
?>

and this creates an erroneous .doc file.

Madhu
  • Reading a `.doc` file using `fopen` and then reading line by line will probably not do what you want. – Eborbob Jun 26 '15 at 09:40
  • @Eborbob If it's a .doc file then I can explode the data on every line break. – Madhu Jun 26 '15 at 09:46
  • If it's a Word document you'll also have a lot of metadata in there, but it sounds like in your use case you have this covered. – Eborbob Jun 26 '15 at 09:56
  • @Eborbob Okay, then what would be the solution for this? Can I read the .docx file directly using PHPWord? – Madhu Jun 26 '15 at 10:01
  • I've not used PHPWord but I think it can only create files, not read them. If you have MS Word installed on your PHP server you can use it via COM to read / write documents. Also see http://stackoverflow.com/questions/188452/reading-writing-a-ms-word-file-in-php for other methods. – Eborbob Jun 26 '15 at 10:07
  • Can't tell for sure, but .doc is the format for pre-2007 MS Word. From 2007 onwards the .docx extension is used. – Sp0T Jun 26 '15 at 11:32

2 Answers

You can try to use PHPWord. ( http://phpword.codeplex.com/ )

It supports docx as well as doc.

Astinox
  • I could not convert the .docx file into a .doc file with PHPWord. I have given the code above; please check. – Madhu Jun 26 '15 at 09:57
  • @Astinox Are you sure that PHPWord supports both docx and doc? As far as I know it doesn't support writing a .doc file (?). And I assume that the more up-to-date project link would be https://github.com/PHPOffice/PHPWord (the Git version is the currently active version that was started from the CodePlex version) – ejuhjav Jun 26 '15 at 12:10

If you want to get the text out of a .docx file and save it as a text file, you can use the docx2text library.

After converting it to a text file, you can read that file line by line and keep each line in an array.
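
If an external converter is not an option, the same idea can be sketched in plain PHP. This is only a minimal sketch, not how docx2text itself works: it assumes the zip and dom extensions are enabled, and the function names paragraphsFromDocumentXml and readDocxLines are made up for illustration. It treats each Word paragraph as one "line" and ignores tabs, manual line breaks, headers, and footnotes.

```php
<?php
// Hypothetical helper: returns the text of every paragraph (<w:p>)
// found in a WordprocessingML document string, one array entry each.
function paragraphsFromDocumentXml(string $xml): array
{
    $dom = new DOMDocument();
    if (!$dom->loadXML($xml)) {
        throw new RuntimeException('Could not parse document XML.');
    }

    $xpath = new DOMXPath($dom);
    $xpath->registerNamespace(
        'w',
        'http://schemas.openxmlformats.org/wordprocessingml/2006/main'
    );

    $lines = [];
    foreach ($xpath->query('//w:p') as $paragraph) {
        // A paragraph's text is split across one or more <w:t> runs.
        $text = '';
        foreach ($xpath->query('.//w:t', $paragraph) as $run) {
            $text .= $run->textContent;
        }
        $lines[] = $text;
    }

    return $lines;
}

// Hypothetical helper: opens the .docx (a zip archive) and reads the
// main document part, word/document.xml, without converting to .doc.
function readDocxLines(string $path): array
{
    $zip = new ZipArchive();
    if ($zip->open($path) !== true) {
        throw new RuntimeException("Could not open $path as a zip archive.");
    }

    $xml = $zip->getFromName('word/document.xml');
    $zip->close();

    if ($xml === false) {
        throw new RuntimeException('word/document.xml not found in archive.');
    }

    return paragraphsFromDocumentXml($xml);
}
```

With the file from the question, `$lines = readDocxLines('BasicTable.docx');` would give an array with one entry per paragraph. Text inside table cells comes out as separate paragraphs, so a table-heavy document may need extra handling.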

Jayanth Suvarna