Highest Voted 'xpdf' Questions

31

votes

4 answers

Extract TOC of PDF?

I am extracting a PDF into images/SWF and text with the help of SWFTools and XPDF, and I am running these from a PHP script. Now I am trying to go one step further and get the TOC from the PDF. Is it possible to extract this information?

php pdf xpdf

asked Mar 12 '10 at 08:50

Chris


9

votes

6 answers

How to extract images from a PDF in their original format

I'm using pdfimages -j bar.pdf /tmp/image to extract images from a PDF. My objective is to get them in their raw state, as they were added. So if it was a .tif I'd like to get a .tif, and if it's a .jpg I'd like to get a .jpg. I keep getting .ppm for…

php pdf xpdf

asked Jan 25 '13 at 13:04

Ben


8

votes

1 answer

Is the MuPDF library faster than xpdf/poppler at rendering images from PDF pages?

Is the MuPDF library faster at rendering images from PDF pages than xpdf/poppler? They say it is high-performance.

performance rendering poppler xpdf mupdf

asked Sep 06 '11 at 15:58

P5music


8

votes

1 answer

pdftoppm "No display font" errors

I'm using pdftoppm to extract pages from a pdf file, so I can later convert the resulting pbm files into multi-page tiffs with ImageMagick. I've got it to work using the following code: os.system('pdftoppm -f %i -l %i -aa no -mono -q "%s" %sx' %…

python python-2.7 windows-7 xpdf pdftoppm

asked Jun 09 '14 at 19:53

CCKx


5

votes

0 answers

pdftotext get font information (font-family, style, size)

I'm using "pdftotext -bbox file.pdf" to convert a PDF file into HTML. Here's a sample line from the output: foo. Is there a way to get font information for every word…

text-extraction pdftotext poppler pdf-scraping xpdf

asked May 06 '18 at 11:23

James Kroning


5

votes

1 answer

How to execute xpdf (pdftotext.exe) on shared drive?

I'm trying to parse PDF to text via PHP and XPDF (pdftotext.exe). On my localhost everything works well, but when I try to move everything to the server, I run into trouble. First of all, I checked some settings on the server, and safe_mode is off,…

php cmd exec pdftotext xpdf

asked Jan 28 '16 at 14:04

Luboš Suk


4

votes

4 answers

Editing PDF with XPDF (or with something else)

I would like to ask whether it is possible to edit PDF files using the xpdf library and, if so, how. I guess this is possible, but I could not find any tutorial or documentation for xpdf, so I really have no idea :( . I'm also open to using another library…

c++ pdf editing xpdf

asked Jan 19 '10 at 14:47

Marek Szanyi


4

votes

2 answers

PHP Explode with a Unicode character as separator

XPDF's pdftotext converts PDF to text and outputs it at the command-line level. If needed, it inserts page breaks between the pages, as specified in TextOutputDev.cc: eopLen = uMap->mapUnicode(0x0c, eop, sizeof(eop)); This Unicode symbol is encoding…

php unicode explode pdftotext xpdf

asked Sep 02 '12 at 09:36

sluijs

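The page-break behaviour described in this question can be sketched in a few lines. This is only an illustration in Python, not the asker's PHP code: split_pages is a hypothetical helper name, and it assumes the pdftotext output has already been read and decoded into a string, so that the page break appears as the single character U+000C (form feed).

```python
FORM_FEED = "\x0c"  # U+000C, the value passed to mapUnicode in the excerpt above


def split_pages(text: str) -> list[str]:
    """Split decoded pdftotext-style output into one string per page.

    Hypothetical helper: assumes `text` is already a decoded string.
    """
    pages = text.split(FORM_FEED)
    # A form feed after the final page leaves an empty last chunk; drop it.
    if pages and pages[-1] == "":
        pages.pop()
    return pages


sample = "page one\n\x0cpage two\n\x0c"
print(split_pages(sample))
```

If the output were in an encoding where the form feed maps to more than one byte, the separator would need to be that byte sequence instead; that case is not handled here.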

3

votes

2 answers

BASH script to check whether PDFs are OCR'd

I don't really know where to start on this. I have a Linux server with over 8000 PDFs and need to know which PDFs have been OCR'd and which ones haven't. I was thinking of some sort of script calling XPDF to check the PDF, but to be honest I'm not sure if…

linux bash pdf xpdf

asked Nov 03 '11 at 15:20

Grimlockz


3

votes

2 answers

How to get the position of specified text through xpdf or mupdf?

I want to extract some specified text from PDF files, along with the text's position. I know xpdf and mupdf can parse PDF files, so I think they may help me accomplish this task. But how do I use these two libraries to get the text position?

pdf text extract mupdf xpdf

asked Sep 22 '11 at 09:38

PDF1001


3

votes

2 answers

Tesseract "Error in pixCreateNoInit: pix_malloc fail for data"

Trying to run this function within a function based loosely off of this. However, since xPDF can convert PDFs to PNGs, I skipped the ImageMagick conversion step, as well as the faulty logic with the function(i) process, since pdftopng requires a…

r imagemagick ocr tesseract xpdf

asked Nov 03 '17 at 20:53

user5509289

3

votes

3 answers

Fastest PDF->text library for .NET project

I'm trying to create an application which will basically be a catalogue of my PDF collection. We are talking about 15-20 GB containing tens of thousands of PDFs. I am also planning to include a full-text search mechanism. I will be using Lucene.NET…

c# pdf itext pdfbox xpdf

asked Jul 22 '10 at 10:29

n0e


3

votes

2 answers

PDF: extracted images are sliced / tiled

Image extraction with pdfimages and mupdf/mutool works fine so far. Images in PDFs produced with FreePDF are always sliced, so one image results in multiple image files. Is there a trick to avoid this? How can I use the results of pdfshow? Are there…

image pdf ghostscript mupdf xpdf

asked Jan 19 '15 at 11:13

Juergen


3

votes

0 answers

Import XPdf into Visual Studio

I've been trying to figure out how to import the source from xpdf (http://www.foolabs.com/xpdf/download.html) into Visual Studio Express 2013 so that I can utilize the pdftotext function. Could someone run me through the steps required?

import visual-studio-2013 project xpdf

asked Apr 21 '14 at 12:13

user3485043


3

votes

2 answers

Cropping PDF using BoundingBox/CropBox in Postscript

I would like to know what the actual difference between BoundingBox and CropBox in a Postscript file is. I want to crop a PDF file and display only the cropped part of it as another PDF file. I converted the PDF file to postscript using pdftops from…

pdf postscript xpdf

asked Jul 11 '12 at 21:50

user1512781

