
I have a FlipBook jQuery page and many ebooks (in PDF format) to display on it. I need to keep the PDFs hidden, so I would like to read their content with PHP and display it on my FlipBook jQuery page (serving each PDF in parts instead of as a whole file).

Is there any way to get the whole content of a PDF file with PHP? I need to separate the content by page.

Umair Shah
Berk Kaya

1 Answer


You can use PDF Parser (a PHP PDF library) to extract the text content of PDFs.

PDF Parser Library Link: https://github.com/smalot/pdfparser

Usage Guide Link: https://github.com/smalot/pdfparser/blob/master/doc/Usage.md

Documentation Link: https://github.com/smalot/pdfparser/tree/master/doc

Sample Code:

<?php

// Include Composer's autoloader if not already done.
require_once 'vendor/autoload.php';

// Parse the PDF file and build the necessary objects.
$parser = new \Smalot\PdfParser\Parser();
$pdf    = $parser->parseFile('document.pdf');

// Extract the text of the whole document.
$text = $pdf->getText();
echo $text;

?>
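
Since the question asks for the content separated by page, here is a minimal sketch of per-page extraction with the same library. It assumes `smalot/pdfparser` is installed through Composer; the function names `extractPageTexts` and `pageText` are hypothetical helpers, not part of the library. Like the sample above, it only yields text, not images or layout.

```php
<?php
declare(strict_types=1);

// Load Composer's autoloader when present (assumes smalot/pdfparser is installed).
$autoload = __DIR__ . '/vendor/autoload.php';
if (is_file($autoload)) {
    require_once $autoload;
}

/**
 * Return the text of every page, keyed by 1-based page number.
 * (Hypothetical helper built on Parser::parseFile(), getPages() and getText().)
 */
function extractPageTexts(string $pdfPath): array
{
    $parser = new \Smalot\PdfParser\Parser();
    $pdf    = $parser->parseFile($pdfPath);

    $texts = [];
    // array_values() guarantees sequential keys regardless of how pages are indexed.
    foreach (array_values($pdf->getPages()) as $index => $page) {
        $texts[$index + 1] = $page->getText();
    }

    return $texts;
}

/**
 * Return the text of one page, or null when the page number is out of range.
 */
function pageText(array $texts, int $pageNumber): ?string
{
    return $texts[$pageNumber] ?? null;
}

// Example usage (not executed here):
// $texts = extractPageTexts('document.pdf');
// echo pageText($texts, 1) ?? 'No such page';
```

A FlipBook page could then request one page at a time from a PHP endpoint, so the PDF file itself is never exposed to the browser.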

Regarding the other part of your question:

How to convert your PDF pages into images:

You need the Imagick PHP extension (ImageMagick) and Ghostscript.

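<?php
// ImageMagick relies on Ghostscript to rasterize PDF pages.
$im = new Imagick();
// Set the resolution before reading so the page is rendered at 150 DPI.
$im->setResolution(150, 150);
// Read only the requested page.
$im->readImage('file.pdf[0]');
$im->setImageFormat('jpeg');
header('Content-Type: image/jpeg');
echo $im->getImageBlob();
?>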

The [0] selects page 1; page indexes are zero-based.
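
To serve an arbitrary page rather than a hard-coded one, the zero-based index can be derived from a 1-based page number. The following is a rough sketch under the same assumptions (Imagick extension plus Ghostscript); `pdfPageSpecifier`, `renderPdfPageAsJpeg` and the `page` query parameter are hypothetical names introduced for illustration.

```php
<?php
declare(strict_types=1);

/**
 * Build the ImageMagick page specifier for a 1-based page number,
 * e.g. page 1 of file.pdf becomes "file.pdf[0]".
 */
function pdfPageSpecifier(string $pdfPath, int $pageNumber): string
{
    if ($pageNumber < 1) {
        throw new InvalidArgumentException('Page numbers start at 1.');
    }

    return sprintf('%s[%d]', $pdfPath, $pageNumber - 1);
}

/**
 * Render one page as JPEG bytes.
 * Requires the Imagick extension and Ghostscript at call time.
 */
function renderPdfPageAsJpeg(string $pdfPath, int $pageNumber): string
{
    $im = new Imagick(pdfPageSpecifier($pdfPath, $pageNumber));
    $im->setImageFormat('jpeg');

    return $im->getImageBlob();
}

// Example usage (not executed here; "page" is a hypothetical query parameter):
// header('Content-Type: image/jpeg');
// echo renderPdfPageAsJpeg('file.pdf', (int) ($_GET['page'] ?? 1));
```

Validating the page number before building the specifier keeps user input from producing a negative index.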

Umair Shah
  • PDF Parser only extracts text. I also need to get the images in the PDF. – Berk Kaya Apr 23 '16 at 10:40
  • Thanks for the answer, but rendering the PDF to images is not a very effective solution. It has to be readable on mobile, and images look bad on mobile. – Berk Kaya Apr 23 '16 at 11:07
  • @BerkKaya : Then you may need to target specific parts of each page to extract just the images, but I don't think that is possible dynamically. If this solved your question, please mark the answer as accepted. – Umair Shah Apr 23 '16 at 11:15
  • @UmairShahYousafzai : How do I regenerate the PDF in the same format after parsing? I am able to parse it, but I am stuck on regenerating it. – Kausha Thakkar Feb 13 '17 at 06:36
  • @KaushaThakkar : Regenerating the PDF from plain text won't be possible, because parsing reduces the WYSIWYG PDF to plain text. It would only be possible if you parse the PDF into markup text and then reconstruct the PDF from that markup. – Umair Shah Feb 13 '17 at 11:58
  • @KaushaThakkar : Instead, you would need to convert your PDF into a format that can hold its true form, such as DOCX, and then reconstruct the PDF from the DOCX. Take a look at: http://www.zamzar.com/ – Umair Shah Feb 13 '17 at 12:07
  • @UmairShahYousafzai : Thanks for your input. pdftk does exactly what I want, but in my case, form field values are not what I want to replace. – Kausha Thakkar Feb 13 '17 at 12:29
  • @KaushaThakkar : Do you mean this one? https://www.pdflabs.com/tools/pdftk-the-pdf-toolkit/ – Umair Shah Feb 13 '17 at 12:31
  • @UmairShahYousafzai : Yes, but that only converts form field values, and in my case they are not form fields :( – Kausha Thakkar Feb 13 '17 at 12:33
  • @UmairShahYousafzai : I am able to parse and modify the data using `$parser = new \Smalot\PdfParser\Parser();`, but I also want to convert it back to PDF. – Kausha Thakkar Feb 13 '17 at 12:36
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/135590/discussion-between-kausha-thakkar-and-umair-shah-yousafzai). – Kausha Thakkar Feb 13 '17 at 12:48
  • FYI, this PDF parser is quite buggy and has masses of open issues, even in 2020, so always run proper manual tests to make sure you get the expected result! – Sliq Nov 03 '20 at 15:42
  • This library doesn't exist anymore – Simon30 Mar 10 '22 at 15:35
  • @Simon30 Updated the links. The library's official website is gone, but the library still exists on GitHub and Composer. – Umair Shah Mar 11 '22 at 16:22
  • I finally managed to do my task using a Java library, but thanks for the update! – Simon30 Apr 13 '22 at 18:35