Questions tagged [document-conversion]

Document conversion is the process of converting a document from one format to another so that it can be read in more applications. Documents can be converted into other source document formats, consumer formats, or structured data.

70 questions
39
votes
6 answers

Convert pdf, doc, ppt to html5

I've googled (without any luck) for open source software that can convert doc, ppt, and pdf to HTML5 (exactly what Scribd does). Are there open source equivalents to the type of conversion Scribd does? If anyone knows of a paid service, that would…
KevMo
  • 5,590
  • 13
  • 57
  • 70
22
votes
4 answers

An efficient way to convert documents to PDF format

I have been trying to find an efficient way to convert documents (e.g. doc, docx, ppt, pptx) to PDF. So far I have tried docsplit and oowriter, but both took more than 10 seconds to complete the job on a 1.7 MB pptx file. Can anyone suggest a…
Aamir Rind
  • 38,793
  • 23
  • 126
  • 164
8
votes
2 answers

Tools to convert multipage PDF to multipage TIFF

I'm writing a small application to convert several multipage PDFs to multipage TIFF files. Per the other questions and answers on this site, I've tried both Ghostscript and ImageMagick; however, both only convert the first page when…
William Seemann
  • 3,440
  • 10
  • 44
  • 78
7
votes
1 answer

R markdown pandoc document conversion failed with error 1 after updating pandoc from 1.19 to 2.4

I recently installed pandoc 2.4 on Windows, and now the conversion fails with error 1 every time I knit. I can't knit to HTML, Word, or PDF. The error says: output file: template.knitmd pandoc.exe: template.utf8.md: openBinaryFile: does not exist…
Moses Kim
  • 73
  • 1
  • 1
  • 4
6
votes
3 answers

How can I take a preview of documents?

I'm working on a file sharing website and need a way to take screenshots of the uploaded documents. The site will support several file formats, from plain text to office documents (doc, xls, ppt, ...), videos (mpeg, avi, ...), images (jpg, gif, png,…
Flupkear
  • 2,135
  • 7
  • 29
  • 32
5
votes
3 answers

Alternative to Tika/PDFBox for parsing PDF in Solr (any version later than 1.4)

Seems like Solr is not parsing my PDF files correctly. I was wondering if there is any other alternative to using Apache Tika (which I believe uses PDFBox internally) for parsing PDF files? I seem to be getting random spaces in between my content…
4
votes
1 answer

What technology is used behind A.nnotate.com?

I would like to know how services like A.nnotate.com, Scribd, and Google Docs render PDF, .doc, or any other document into HTML, and how does the annotation system work?
thinkquester
  • 308
  • 3
  • 18
4
votes
4 answers

API for document format conversion

I am looking for a RESTful web service to which I can send a document (doc, docx, xls, xlsx, ppt, pptx, and tiff at a minimum) for conversion to pdf and swf. The reason I need swf in addition to pdf is so that I can display the document in the…
3
votes
1 answer

How to convert multiple documents using the Document Conversion service in a bash script?

How can I convert more than one document using the Document Conversion service? I have between 50 and 100 MS Word and PDF documents that I want to convert using the convert_document API method. For example, can you supply multiple .pdf or *.doc files…
German Attanasio
  • 22,217
  • 7
  • 47
  • 63
3
votes
3 answers

converting dates things from visual basic to c-sharp

So as an exercise in utility I've taken it upon myself to convert one of our poor old VB .NET 1.1 apps to C# .NET 4.0. I used Telerik code conversion for a starting point and ended up with ~150 errors (not too bad considering it's over 20k of code…
sinrtb
  • 136
  • 1
  • 2
  • 6
3
votes
0 answers

Open source html5 document viewer for mobile apps

I am building a mobile app for Android and iOS using PhoneGap. I want to use an HTML5 document viewer to display PDFs, PPTs, and so on in the mobile app. I saw Crocodoc. It's good, but I need something open source that I can tinker with.…
ghostCoder
  • 1
  • 9
  • 49
  • 72
2
votes
3 answers

Convert PDF file to a single HTML file

I am trying to convert a PDF document to a single HTML file in Java. Most of the online converters convert one PDF file to multiple HTML files. I want to convert the whole PDF to a single HTML file. Any suggestions?
Ahsan Abid
  • 609
  • 1
  • 7
  • 9
2
votes
2 answers

Converting PDF, Doc and Docx to RTF in C#

I have a requirement for an application that takes Doc, Docx and PDF and converts them to RTF. The conversion is one way and I do not need to convert back to Doc or PDF. Has anyone done this and can you recommend a library? I know there is Aspose…
griegs
  • 22,624
  • 33
  • 128
  • 205
2
votes
1 answer

is not JSON serializable

Following the Document Conversion API example, I am trying to use Flask to convert an MS Word document to text, but it does not work. Here is the code: import os, json, requests from flask import Flask, jsonify from watson_developer_cloud import…
user6332732
  • 23
  • 1
  • 1
  • 5
2
votes
0 answers

Convert Word document with MergeFields to PDF with form fields

I have a document template in Word .doc format. The Word document contains merge fields that need to be populated dynamically. I need to convert the Word document to a PDF with form fields. This PDF can then be populated from our Java application…
Wilhelm Kleu
  • 10,821
  • 4
  • 36
  • 48