Questions tagged [pdf-to-html]

79 questions
20
votes
4 answers

Convert PDF to HTML

What is the best solution to convert PDF documents to be viewed in the browser as HTML? The site has several PDF documents; when a visitor clicks "view as HTML", the document should be displayed on screen as an HTML file. Standard website running…
ToughPal
  • 2,231
  • 5
  • 26
  • 30
14
votes
9 answers

What is a good PDF to HTML converter for Ruby on Rails?

I'm trying to programmatically convert PDF to HTML. So far I've been using pdftohtml, but our users are not happy with the results. Here's what I need: I'm using Ruby on Rails, but any tool working on Unix would work as I can call it from the…
marcgg
  • 65,020
  • 52
  • 178
  • 231
12
votes
2 answers

Extract table data from PDF

Is there any consistent way to extract tables from PDF files? Any tools? What I have done so far: I have tried out the pdftotext tool. It has an option to convert to HTML layout. What is the problem with this: The table information is not preserved…
Rajneesh
  • 2,185
  • 4
  • 20
  • 30
6
votes
0 answers

Returning formatted text from GCP Vision PDF results

I finally got my script to submit a PDF document to Google Storage and then extract text using Google Vision for PDF, as described in the documentation. The data is returned in a huge JSON file. There's one node that contains text, but it's no longer…
santa
  • 12,234
  • 49
  • 155
  • 255
6
votes
2 answers

Convert pdf to a single page editable html

I have been trying to convert a PDF file to a single, clean HTML page. After searching, the solutions I have found fall a little short of my requirements, as I have to create individual HTML pages for about 200 PDF files. As online converters…
Nagama Inamdar
  • 2,851
  • 22
  • 39
  • 48
4
votes
1 answer

Convert PDF to HTML using python and pdfkit

On this site, Adobe writes about conversion from PDF to HTML using pdfkit. They use the pdfkit.from_pdf(...) method. This script uses the ‘pdfkit’ library to convert the PDF file to HTML. To use this script, you will need to install the ‘pdfkit’…
Duzy
  • 79
  • 2
  • 8
4
votes
2 answers

Convert multi pages PDF into single html file using pdftohtml poppler utility

I am converting a PDF document into HTML using the poppler utility. It creates a separate HTML file for each page, but I want a single HTML file after the conversion. I used the following syntax: pdftohtml -c abc.pdf But it's creating abc-1.html,…
Deepti Kakade
  • 3,053
  • 3
  • 19
  • 30
3
votes
3 answers

PDFminer - Is there a way to convert pdf into html from pdfminer?

Is there a simple way to convert PDF to HTML using pdfminer? I have seen many questions like this, but none of them gave me the right answer... I have entered this in my ConEmu prompt: # pdf2txt.py -o output.html -t html sample.pdf usage: C:\Program…
3
votes
1 answer

PDFDomTree not detecting white spaces while converting a pdf file to html

I am using PDFDomTree with pdfbox-2.0.9 in my Java application to convert a PDF file to an HTML file. I have used the following code to convert a PDF: try { PDDocument document = PDDocument.load(new File("some path")); PDFDomTree parser = new…
vsbehere
  • 666
  • 1
  • 7
  • 23
3
votes
1 answer

Pdftohtml Poppler utils not working on centOs

I am trying to convert PDF to HTML in PHP using the mgufrone library (https://github.com/mgufrone/pdf-to-html). When I run this on my Mac, it works fine. But when I run it on a CentOS server, the .html file is created blank inside…
Mir Mumtaz
  • 109
  • 2
  • 9
3
votes
3 answers

Converting pdf to vector image

I'm trying to use PDF content (mathematics) in my webpage. I basically want to convert the PDF to some vector image. Converting the PDF to SWF does the job very well, but as Flash isn't supported on every platform, I'm trying to find another…
Kasper
  • 12,594
  • 12
  • 41
  • 63
2
votes
3 answers

Convert PDF file to a single HTML file

I am trying to convert a PDF document to a single HTML file in Java. Most online converters convert one PDF file into multiple HTML files. I want to convert the whole PDF to a single HTML file. Any suggestions?
Ahsan Abid
  • 609
  • 1
  • 7
  • 9
2
votes
1 answer

PHP shell_exec, permission denied for executing -rwxrwxrwx shell script

I am currently connected over SSH to a remote CentOS 5.6 system which runs an Apache webserver. I need to use the poppler pdftohtml binary which, unfortunately, is not currently installed on that machine. So I downloaded the poppler package and built it under…
Andrea Sprega
  • 2,221
  • 2
  • 29
  • 35
2
votes
1 answer

How to fix 'cannot import name 'process_pdf' from 'pdfminer.pdfinterp'' error

I am trying to convert the text in a PDF file to text or HTML format, but this error occurs frequently: 'cannot import name 'process_pdf' from 'pdfminer.pdfinterp''. How can I fix this? I have tried this code in the visual basic studio, but…
2
votes
1 answer

Alternatives to pdftohtml

I'm experimenting with pdftohtml but I'm finding that it's occasionally having difficulty parsing tables correctly. It's grouping the text from two columns into a single cell, which makes my attempts to parse the resulting data futile! Note that…
Sam Crawford
  • 311
  • 4
  • 16