Questions tagged [pdf]

Portable Document Format (PDF) is an open standard for electronic document exchange maintained by the International Organization for Standardization (ISO). Questions can be about creating, reading, or editing PDFs in any programming language.

The official ISO Specification (ISO 32000-1, a.k.a. 'PDF-1.7') is important as a reference, but it is not exactly written for PDF beginners.

Beginners may start with these two easy-to-read resources:

Questions

Related questions on Stack Overflow generally fall into the following domains:

  • How to convert, produce, or encode a PDF with a given language or library?
  • Everything else.

The first domain has been covered in depth, and any question you have is likely already answered.

Information Extraction

Extracting text from a PDF may not be possible without resorting to Optical Character Recognition (OCR). Letters can be encoded as font glyphs, line art, vector graphics, or raster images.

PDF files generally contain drawing instructions. There's no such thing as "a table" in most PDF files. There are lines, glyphs, and raster images (and clipping, and color spaces, and so forth). It is all but impossible to determine what is or isn't a table in an arbitrary PDF file.
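To make this concrete, here is a minimal Python sketch of the kind of guesswork a table extractor has to fall back on. It assumes the text has already been pulled out as positioned fragments; the `Fragment` type, the sample coordinates, and the `tolerance` parameter are hypothetical and not part of any PDF library. It groups fragments into rows by vertical position, which only works when baselines happen to line up.

```python
from typing import NamedTuple


class Fragment(NamedTuple):
    """A run of text with the position where it was drawn (hypothetical type)."""
    x: float
    y: float
    text: str


def group_into_rows(fragments, tolerance=2.0):
    """Guess table rows by clustering fragments with similar y coordinates."""
    rows = []
    # PDF user space has y growing upward, so sort from top to bottom.
    for frag in sorted(fragments, key=lambda f: -f.y):
        if rows and abs(rows[-1][0].y - frag.y) <= tolerance:
            rows[-1].append(frag)
        else:
            rows.append([frag])
    # Within each guessed row, order the cells left to right.
    return [[f.text for f in sorted(row, key=lambda f: f.x)] for row in rows]


# Made-up fragments, deliberately out of reading order.
fragments = [
    Fragment(200, 700, "Price"),
    Fragment(72, 700, "Item"),
    Fragment(72, 680.5, "Widget"),
    Fragment(200, 681, "9.99"),
]
rows = group_into_rows(fragments)
print(rows)
```

Nothing in the file says these four fragments form a table; the row structure is inferred from coordinates alone, which is why such heuristics break on rotated text, merged cells, or multi-line cells.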

Note that a glyph is not a character. A glyph has an appearance, whereas a character has meaning. Each font in a PDF may or may not map its glyphs to characters.
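The distinction can be illustrated with a small Python sketch. The glyph IDs and both font maps below are made up for illustration; in a real PDF the mapping role is played by font data such as a ToUnicode CMap, which a font may simply omit.

```python
# Hypothetical glyph IDs, as they might be referenced by a page's drawing instructions.
glyph_ids = [17, 4, 9, 9, 23]

# Hypothetical glyph-to-character map for a font that provides one.
mapped_font = {17: "H", 4: "e", 9: "l", 23: "o"}

# A font with no such map: its glyphs can still be drawn, but not decoded.
unmapped_font = {}


def decode(ids, glyph_to_char):
    """Translate glyph IDs to characters, using U+FFFD where no mapping exists."""
    return "".join(glyph_to_char.get(g, "\ufffd") for g in ids)


decoded = decode(glyph_ids, mapped_font)
undecodable = decode(glyph_ids, unmapped_font)
print(decoded)
print(undecodable)
```

Both fonts would render the same shapes on the page; only the first lets a program recover the text, which is why the second case typically ends in OCR.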

If at all possible, use the source data to extract information, rather than relying on the PDF. This file format is designed for visual consistency, and very little useful normalized data can be extracted from its contents.

Content

A PDF file is often a combination of vector graphics, text, and bitmap graphics. The basic types of content in a PDF are:

  • text stored as content streams (i.e. as drawing instructions for glyphs, not as plain text)
  • vector graphics for illustrations and designs that consist of shapes and lines
  • raster graphics for photographs and other types of images
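As a rough illustration of how these three kinds of content sit side by side, the following Python sketch classifies the operators in a tiny, hand-written content stream. The sample stream, the resource names `/F1` and `/Im1`, and the whitespace tokenizer are assumptions made for this example; a real parser must also handle strings containing spaces, arrays, inline images, and compressed streams.

```python
# A small hand-written content stream (not taken from any real file).
SAMPLE_STREAM = """
q
0.5 0 0 RG
72 600 200 100 re
S
BT
/F1 12 Tf
72 720 Td
(Hello) Tj
ET
/Im1 Do
Q
"""

# A few common operators per category; these sets are far from exhaustive.
TEXT_OPS = {"BT", "ET", "Tf", "Td", "Tj", "TJ"}
VECTOR_OPS = {"m", "l", "re", "S", "f"}
# "Do" paints an external object: often a raster image, sometimes a form.
XOBJECT_OPS = {"Do"}


def classify_operators(stream):
    """Count operators per category using a naive whitespace tokenizer."""
    counts = {"text": 0, "vector": 0, "xobject": 0}
    for token in stream.split():
        if token in TEXT_OPS:
            counts["text"] += 1
        elif token in VECTOR_OPS:
            counts["vector"] += 1
        elif token in XOBJECT_OPS:
            counts["xobject"] += 1
    return counts


counts = classify_operators(SAMPLE_STREAM)
print(counts)
```

Operands such as `72` or `(Hello)` and graphics-state operators such as `q`, `Q`, and `RG` fall into none of the three categories and are skipped here.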

Related Links

For additional information about this file format see:

50972 questions
1721 votes, 30 answers

How Can I add HTML And CSS Into PDF

I have an HTML (not XHTML) document that renders fine in Firefox 3 and IE 7. It uses fairly basic CSS to style it and renders fine in HTML. I'm now after a way of converting it to PDF. I have tried: DOMPDF: it had huge problems with tables. I…
cletus
  • 616,129
  • 168
  • 910
  • 942
1504 votes, 23 answers

Merge / convert multiple PDF files into one PDF

How could I merge / convert multiple PDF files into one large PDF file? I tried the following, but the content of the target file was not as expected: convert file1.pdf file2.pdf merged.pdf I need a very simple/basic command line (CLI) solution.…
alcohol
  • 22,596
  • 4
  • 23
  • 21
1462 votes, 3 answers

Proper MIME media type for PDF files

When working with PDFs, I've run across the MIME types application/pdf and application/x-pdf among others. Is there a difference between these two types, and if so what is it? Is one preferred over the other? I'm working on a web app which must…
friedo
  • 65,762
  • 16
  • 114
  • 184
1362 votes, 29 answers

Recommended way to embed PDF in HTML?

What is the recommended way to embed PDF in HTML? iFrame? Object? Embed? What does Adobe say itself about it? In my case, the PDF is generated on the fly, so it can't be uploaded to a third-party solution prior to flushing it.
Daniel Silveira
  • 41,125
  • 36
  • 100
  • 121
769 votes, 14 answers

ImageMagick security policy 'PDF' blocking conversion

The ImageMagick security policy does not seem to allow me to perform this conversion from PDF to PNG. Converting other extensions seems to work, just not from PDF. I haven't changed any of the ImageMagick settings since I installed it... I am…
T. Zack Crawford
  • 7,646
  • 3
  • 11
  • 18
681 votes, 6 answers

Inserting a PDF file in LaTeX

I am trying to insert a PDF or doc file as an appendix in my LaTeX file. Do you know how I can do this?
Guido
  • 6,853
  • 3
  • 16
  • 9
507 votes, 26 answers

Convert HTML to PDF in .NET

I want to generate a PDF by passing HTML contents to a function. I have made use of iTextSharp for this but it does not perform well when it encounters tables and the layout just gets messy. Is there a better way?
SandHurst
416 votes, 13 answers

Python module for converting PDF to text

Is there any Python module to convert PDF files into text? I tried one piece of code found on ActiveState which uses pypdf, but the generated text had no spaces between words and was of no use.
cnu
  • 36,135
  • 23
  • 65
  • 63
415 votes, 24 answers

Convert PDF to image with high resolution

I'm trying to use the command line program convert to take a PDF into an image (JPEG or PNG). Here is one of the PDFs that I'm trying to convert. I want the program to trim off the excess white-space and return a high enough quality image that the…
JBWhitmore
  • 11,576
  • 10
  • 38
  • 52
378 votes, 3 answers

Fast and Lean PDF Viewer for iPhone / iPad / iOS - tips and hints?

There have been many questions recently about drawing PDFs. Yes, you can render PDFs very easily with a UIWebView, but this can't give the performance and functionality that you would expect from a good PDF viewer. You can draw a PDF page to a…
Luke Mcneice
  • 3,012
  • 4
  • 38
  • 50
347 votes, 34 answers

How to extract text from a PDF file?

I'm trying to extract the text included in this PDF file using Python. I'm using the PyPDF2 package (version 1.27.2), and have the following script: import PyPDF2 with open("sample.pdf", "rb") as pdf_file: read_pdf =…
Simplicity
  • 47,404
  • 98
  • 256
  • 385
317 votes, 7 answers

Generating PDF files with JavaScript

I’m trying to convert XML data into PDF files from a web page and I was hoping I could do this entirely within JavaScript. I need to be able to draw text, images and simple shapes. I would love to be able to do this entirely in the browser.
amoeba
  • 4,015
  • 3
  • 21
  • 14
298 votes, 14 answers

How do I force files to open in the browser instead of downloading (PDF)?

Is there a way to force PDF files to open in the browser when the option "Display PDF in browser" is unchecked? I tried using the embed tag and an iframe, but it only works when that option is checked. What can I do?
elloalisboa
  • 3,053
  • 2
  • 17
  • 10
295 votes, 17 answers

How to display PDF file in HTML?

I have a PDF file auto-generated by iText, and I need to display that PDF file in HTML. My question is: how do I display a local PDF file in HTML using pdf.js? Does that PDF file need to be generated according to some standard?
vivek
  • 4,599
  • 3
  • 25
  • 37
291 votes, 2 answers

How can I convert a series of images to a PDF from the command line on Linux?

I have a scanning server I wrote in CGI and Bash. I want to be able to convert a bunch of images (all in one folder) to a PDF from the command line. How can that be done?
Jakob Weisblat
  • 7,450
  • 9
  • 37
  • 65