2

One of my projects requires converting DOCX to PDF. I came across the phpdocx project, and the conversion works fine, but it serves the file to the browser, which prompts the user to download it. I need to keep the file data on the server instead, so that I can read it for MySQL storage. Any ideas?

Here is the code I'm using:

$docx = new TransformDoc();
$docx->setStrFile($tmpName);
$docx->generatePDF();

Using Tim's modifications below produces the following error:

i
Warning: session_start() [function.session-start]:
Cannot send session cache limiter - headers already sent
(output started at /home/zbtech/public_html/DocCon/classes/TransformDoc.inc:1)
in /home/zbtech/public_html/scribpub.php on line 5

Unable to generate PDF file string. exception 'DOMPDF_Exception' with message
'Unknown image type: files/files_/tmp/phpNQFatu/media/word/.'
in /home/zbtech/public_html/DocCon/pdf/include/image_cache.cls.php:175
Stack trace:
#0 /home/zbtech/public_html/DocCon/pdf/include/image_frame_decorator.cls.php(88): Image_Cache::resolve_url('files/files_/tm...', NULL, '', '')
#1 /home/zbtech/public_html/DocCon/pdf/include/frame_factory.cls.php(173): Image_Frame_Decorator->__construct(Object(Frame), Object(DOMPDF))
#2 /home/zbtech/public_html/DocCon/pdf/include/dompdf.cls.php(499): Frame_Factory::decorate_frame(Object(Frame), Object(DOMPDF))
#3 /home/zbtech/public_html/DocCon/classes/TransformDoc.inc(282): DOMPDF->render()
#4 /home/zbtech/public_html/scribpub.php(68): TransformDoc->generatePDF()
#5 {main}
Tim G
Zac Brown
  • Look at the TransformDoc class and find a way to assign the binary data instead of outputting it. – Mike B Sep 17 '12 at 14:57
  • Have not played with phpdocx.. Anyway $output=$docx->generatePDF() would work? – Moe Tsao Sep 17 '12 at 14:57
  • possible duplicate of [download the text file instead of opening in the browser](http://stackoverflow.com/questions/6921953/download-the-text-file-instead-of-opening-in-the-browser) – Toby Allen Sep 17 '12 at 14:57
  • I know the duplicate is for a text file, but it's the same issue: you need to add the correct headers in your PHP page – Toby Allen Sep 17 '12 at 14:57
  • I'm not sure it's a header issue, Toby. The phpdocx class converts the doc and returns it to the browser. @Moe, if only it was that easy! That is the first thing I tried, but it is still returned to the browser in the class. Mike, I have. Just can't seem to find it. – Zac Brown Sep 17 '12 at 15:00

3 Answers

4

Here's what I'd do.

The phpdocx library uses the following function to stream the PDF to the browser.

It is located in classes/TransformDoc.inc at or around line 275 (as of the version I downloaded on 9/17/2012):

public function generatePDF()
{
    $this->generateXHTML();
    $this->cleanXHTML();
    try {
        $domPDF = new DOMPDF();
        $domPDF->load_html($this->_xhtml);
        $domPDF->render();
        $fileName = $this->getFileName() . '.pdf';
        $domPDF->stream($fileName);
    }
    catch (Exception $err) {
        echo 'Unable to generate PDF file. ';
        echo $err;
    }
}

Looking at the source reveals that you could write your own function to do something similar. Here is an untested example function based on the one above. It calls DOMPDF's output method, which returns the PDF as a string, instead of stream.

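/**
 * Convert DOCX to PDF using dompdf (DOCX->XHTML->PDF) and return it as a string.
 *
 * @access public
 * @return string|null PDF data, or null if generation failed
 */
public function generatePDF()
{
    $this->generateXHTML();
    $this->cleanXHTML();
    // Initialise so that the return value is defined even if an exception is caught.
    $out = null;
    try {
        $domPDF = new DOMPDF();
        $domPDF->load_html($this->_xhtml);
        $domPDF->render();
        // output() returns the PDF data instead of sending it to the browser.
        $out = $domPDF->output();
    }
    catch (Exception $err) {
        echo 'Unable to generate PDF file string. ';
        echo $err;
    }

    return $out;
}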
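
Since the goal in the question is MySQL storage, the returned string could then be written to a BLOB column. The following is only a rough sketch, not part of phpdocx: the `storePdf` helper, the `documents` table and its columns are all hypothetical, it assumes PHP 7 or later with PDO, and it uses an in-memory SQLite database purely so that the example runs without a server. With MySQL you would construct the PDO object with a `mysql:` DSN instead.

```php
<?php
// Hypothetical helper: inserts PDF bytes into a BLOB column via a prepared statement.
// The table name "documents" and its columns are placeholders for illustration.
function storePdf(PDO $pdo, string $name, string $pdfData): int
{
    $stmt = $pdo->prepare('INSERT INTO documents (name, pdf) VALUES (:name, :pdf)');
    $stmt->bindValue(':name', $name, PDO::PARAM_STR);
    // PARAM_LOB marks the value as binary data rather than text.
    $stmt->bindValue(':pdf', $pdfData, PDO::PARAM_LOB);
    $stmt->execute();

    return (int) $pdo->lastInsertId();
}

// Demo setup: in-memory SQLite stands in for the MySQL connection.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE documents (id INTEGER PRIMARY KEY, name TEXT, pdf BLOB)');

// Stand-in bytes; in the real script this would be the return value of
// the modified $docx->generatePDF().
$pdfData = "%PDF-1.4 stand-in bytes";

$id = storePdf($pdo, 'example.pdf', $pdfData);

// Read the row back to confirm that the bytes survived the round trip.
$stored = $pdo->query('SELECT pdf FROM documents WHERE id = ' . $id)->fetchColumn();
```

Because the modified function returns null on failure, it is probably worth checking the return value before calling anything like `storePdf`.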
Tim G
  • Oh, DUH! I have looked at that source over and over again for a couple hours now! Thanks, @Tim!! +1 for you! – Zac Brown Sep 17 '12 at 15:14
  • I just did 2 quick edits - make sure, after reading this comment, that you grab the most up to date code. – Tim G Sep 17 '12 at 15:15
  • I had to trace the code into pdf/include/dompdf.cls.php to find the *output* method. It does not take a filename but it does take options - phpdocx does not send options, but there apparently are options that could be sent if you wanted to. – Tim G Sep 17 '12 at 15:17
  • also, whoever wrote the phpdocx code might not understand exception handling very well - it seems a bit pointless to just echo the exception here - it would be best to let it bubble up out of the library to the application to be handled. :/ oh well. – Tim G Sep 17 '12 at 15:18
  • Using those changes only produces an error for me. I went back and grabbed your updated source. Should I post the error here, or start a new question with your modifications and the error? – Zac Brown Sep 17 '12 at 15:55
  • Great! You seem like you know what you are talking about! I'll append the error to the original post. – Zac Brown Sep 17 '12 at 16:28
  • what is the stray "i" ? looks like you may have added a character on line 1 of file: /home/zbtech/public_html/DocCon/classes/TransformDoc.inc – Tim G Sep 17 '12 at 16:37
  • Yup! It was. I fixed it after I noticed it in the updated post. – Zac Brown Sep 17 '12 at 16:41
2

If you search the files for PHP Docx for "Unknown image type:" you'll find it in pdf/include/image_cache.cls.php.

// line 81
static function resolve_url($url, $proto, $host, $base_path) {
    ...
    // line 168
    $resolved_url = build_url($proto, $host, $base_path, $url);
    if ($DEBUGPNG) print 'build_url('.$proto.','.$host.','.$base_path.','.$url.')('.$resolved_url.')';

    if ( !preg_match("/.*\.(\w+)/",$url,$match) ) {
        //debugpng
        if ($DEBUGPNG) print '[resolve_url exception '.$url.']';
        throw new DOMPDF_Exception("Unknown image type: $url.");
    }
    ....

The code throws this error because it can't find an extension on the URL from which to guess the image type. I have no idea why this happens with the new code that uses the ->output method and not with the original code; you'd think that generating a PDF would work whichever way we do it.

Two choices now: comment out (or remove) the throw new DOMPDF_Exception line referenced above, or use output buffering with the original function.
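
The output-buffering option could look roughly like the sketch below. The `captureOutput` helper is hypothetical (it is not part of phpdocx or DOMPDF), it assumes PHP 7 or later, and the demo echoes stand-in bytes in place of the real call to the original `generatePDF()`.

```php
<?php
// Hypothetical helper: runs a callable and returns whatever it echoed,
// instead of letting that output reach the browser.
function captureOutput(callable $producer): string
{
    ob_start();
    try {
        $producer();
    } finally {
        // Always close the buffer, even if the callable throws.
        $captured = ob_get_clean();
    }

    return $captured;
}

// Demo with stand-in output. In the real script the closure would be:
//   function () use ($docx) { $docx->generatePDF(); }
$pdf = captureOutput(function () {
    echo "%PDF-1.4 stand-in bytes";
});
```

One caveat: output buffering only captures the body. If the streaming function also sets download headers, which seems likely given the download prompt described in the question, they would still be queued for the response, so something like `header_remove()` may be needed afterwards.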

Tim G
  • it's also possible that the docx file references an external image that's not getting uploaded. I'm a bit out of my depth here in my understanding of docx format. – Tim G Sep 17 '12 at 16:58
  • DOCX is pretty much just like a zip. You can even open DOCX documents in zip applications. I've verified that everything is being uploaded. – Zac Brown Sep 17 '12 at 17:04
  • you could look at the DOMPDF project http://code.google.com/p/dompdf/ and see if you could upgrade the code inside PHP Docx to the latest version - this might be a known bug that has been fixed. – Tim G Sep 17 '12 at 17:16
  • http://code.google.com/p/dompdf/issues/detail?id=393&can=1&q=unknown%20image&colspec=ID%20Type%20Status%20Stars%20Priority%20Milestone%20Owner%20Summary%20Modified – Tim G Sep 17 '12 at 17:17
  • Got it sorted! You are a brilliant man. Mind if I place a credit on the site home page, linked to your site, Tim? – Zac Brown Sep 19 '12 at 23:12
  • Tim, I added the credit to the home page, at the bottom of the social box on the right. (http://www.scribbler.me) We'll leave it there for a while, maybe send some traffic your way! Again, thanks for all of your help! – Zac Brown Sep 20 '12 at 19:26
0

A clear way to achieve this is to edit the provided file:

/phpdocx/examples/easy/createPDF.php

public function generatePDF($outputFileName)
{
    $this->generateXHTML();
    $this->cleanXHTML();
    try {
        $domPDF = new DOMPDF();
        $domPDF->load_html($this->_xhtml);
        $domPDF->render();
        //
        // ADD THIS (don't forget the $outputFileName method argument):
        // write the rendered PDF to a file instead of streaming it.
        //
        $handler = fopen($outputFileName, 'wb'); // 'b' keeps the PDF bytes intact on all platforms
        fwrite($handler, $domPDF->output());
        fclose($handler);
    }
    catch (Exception $err) {
        echo 'Unable to generate PDF file string. ';
        echo $err;
    }
    // Nothing is returned: the PDF is now in the file at $outputFileName.
}
christian