Questions tagged [xpdf]

Xpdf is an open-source PDF viewer for the X Window System and Motif. Xpdf runs on practically any Unix-like operating system. Xpdf can decode LZW and read encrypted PDFs.

More details at http://en.wikipedia.org/wiki/Xpdf

71 questions
31 votes, 4 answers

Extract TOC of PDF?

I am extracting a PDF into images / swf and text with the help of SWFTools and XPDF. I am running these in a PDF script. Now I am trying to go one step further and get the TOC from the PDF. Is it possible to extract this information?
Chris
9 votes, 6 answers

How to extract images from a PDF in their original format

I'm using pdfimages -j bar.pdf /tmp/image to extract images from a PDF. My objective is to get them in their raw state as they were added. So if it was a .tif I'd like to get a .tif, if it's a .jpg I'd like to get a .jpg. I keep getting .ppm for…
Ben
8 votes, 1 answer

is MuPdf library faster than xpdf/poppler at rendering images from pdf pages?

Is the MuPDF library faster at rendering images from PDF pages than xpdf/poppler? They say it is high-performance.
P5music
8 votes, 1 answer

pdftoppm "No display font" errors

I'm using pdftoppm to extract pages from a pdf file, so I can later convert the resulting pbm files into multi-page tiffs with ImageMagick. I've got it to work using the following code: os.system('pdftoppm -f %i -l %i -aa no -mono -q "%s" %sx' %…
CCKx
5 votes, 0 answers

pdftotext get font information (font-family, style, size)

I'm using "pdftotext -bbox file.pdf" to convert a pdf file into HTML. Here's a sample line from the output: foo Is there a way to get font information for every word…
5 votes, 1 answer

How to execute xpdf (pdftotext.exe) on shared drive?

I'm trying to parse PDF to text via PHP and XPDF (pdftotext.exe). On my localhost everything works well, but when I try to move everything to the server, I run into trouble. First of all I checked some settings on the server and safe_mode is off,…
Luboš Suk
4 votes, 4 answers

Editing PDF with XPDF (or with something else)

I would like to ask if it is possible to edit PDF files using the xpdf library and, if yes, how? I guess this is possible, but I could not find any tutorial or documentation for xpdf, so I really have no idea :(. I'm also open to using another library…
Marek Szanyi
4 votes, 2 answers

PHP Explode with a Unicode character as separator

XPDF's pdftotext converts PDF to text and outputs it at the command line level. If needed, it inserts page breaks between the pages, as specified in TextOutputDev.cc: eopLen = uMap->mapUnicode(0x0c, eop, sizeof(eop)); This Unicode symbol is encoding…
sluijs
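The question above turns on the page-break character 0x0c (form feed) that pdftotext writes between pages. As a minimal sketch of splitting on that character, here is a Python version rather than the PHP the asker uses. The helper name split_pages and the sample string are hypothetical, and the trailing form feed after the last page is an assumption about the output, not something the excerpt confirms.

```python
FORM_FEED = "\x0c"  # page-break character mapped in TextOutputDev.cc (0x0c)


def split_pages(text: str) -> list[str]:
    """Split pdftotext output into one string per page on the form feed."""
    pages = text.split(FORM_FEED)
    # Assumption: the last page is also followed by a form feed, which leaves
    # an empty trailing chunk. Drop it so the list length matches the page count.
    if pages and pages[-1] == "":
        pages.pop()
    return pages


# Hypothetical two-page sample standing in for real pdftotext output.
sample = "page one\n\x0cpage two\n\x0c"
print(split_pages(sample))
```

In PHP the same idea would be to pass the single character with code 0x0c as the separator, since it is a one-byte character in ASCII-compatible encodings.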
3 votes, 2 answers

BASH script to check whether PDFs are OCR'd

I don't really know where to start on this. I have a Linux server with over 8000 PDFs and need to know which PDFs have been OCR'd and which ones haven't. I was thinking of some sort of script calling XPDF to check the PDF, but to be honest I'm not sure if…
Grimlockz
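One possible way to approach the check described above is to run pdftotext on each file and see whether any meaningful text comes back. This is a rough sketch in Python rather than Bash. The function names and the 20-character threshold are hypothetical, and the heuristic is an assumption: a PDF with a little real text plus scanned pages would be misclassified.

```python
import subprocess


def looks_like_it_has_text(extracted: str, min_chars: int = 20) -> bool:
    """Heuristic: treat the PDF as OCR'd if enough non-whitespace text came out."""
    # str.split() with no arguments also discards form feeds and newlines.
    visible = "".join(extracted.split())
    return len(visible) >= min_chars


def pdf_has_text_layer(path: str) -> bool:
    """Run pdftotext on one file, writing to stdout ("-"), and apply the heuristic."""
    result = subprocess.run(
        ["pdftotext", path, "-"],
        capture_output=True,
        encoding="utf-8",
        errors="replace",
        check=True,
    )
    return looks_like_it_has_text(result.stdout)


# The heuristic on its own, without invoking pdftotext:
print(looks_like_it_has_text("\x0c\x0c\n"))
print(looks_like_it_has_text("Invoice 2013-04 for services rendered"))
```

Looping pdf_has_text_layer over the files on the server and logging the paths that return False would give a candidate list of PDFs still needing OCR.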
3 votes, 2 answers

How to get the position of specified text through xpdf or mupdf?

I want to extract some specified text from PDF files, along with its position. I know xpdf and mupdf can parse PDF files, so I think they may help me accomplish this task. But how do I use these two libraries to get the text position?
PDF1001
3 votes, 2 answers

Tesseract "Error in pixCreateNoInit: pix_malloc fail for data"

trying to run this function within a function based loosely off of this, however, since xPDF can convert PDFs to PNGs, I skipped the ImageMagick conversion step, as well as the faulty logic with the function(i) process, since pdftopng requires a…
user5509289
3 votes, 3 answers

Fastest PDF->text library for .NET project

I'm trying to create an application which will be basically a catalogue of my PDF collection. We are talking about 15-20GBs containing tens of thousands of PDFs. I am also planning to include a full-text search mechanism. I will be using Lucene.NET…
n0e
3 votes, 2 answers

PDF: extracted images are sliced / tiled

Image extraction with pdfimages and mupdf/mutool works fine so far. Images in PDFs produced with FreePDF are always sliced, so one image results in multiple image files. Is there a trick to avoid this? How can I use the results of pdfshow? Are there…
Juergen
3 votes, 0 answers

Import XPdf into Visual Studio

I've been trying to figure out how to import the source from xpdf (http://www.foolabs.com/xpdf/download.html) into Visual Studio Express 2013 so that I can utilize the pdftotext function. Could someone run me through the steps required?
3 votes, 2 answers

Cropping PDF using BoundingBox/CropBox in Postscript

I would like to know what the actual difference between BoundingBox and CropBox in a Postscript file is. I want to crop a PDF file and display only the cropped part of it as another PDF file. I converted the PDF file to postscript using pdftops from…
user1512781