
I'm trying to develop a program in VB.NET that lets the user convert a PDF file to a Word file.

Is there a good API for this?

And is it as easy as it looks?

Deduplicator
Gentuzos
  • Unless you understand both PDF file-format & Word file-format, why would it be easy? You will need libraries to read & write these formats. Have you looked into how to do that? You can get out plain text, with many limitations, but you won't get much formatting across. – Thomas W Aug 02 '13 at 05:32
  • Yes, I knew that it needs one library to read PDF files and another to write MS Word files. But I can't find out how to detect images while reading a PDF file and extract them. There should be a library for that, though. – Gentuzos Aug 02 '13 at 05:35
  • Perhaps iTextSharp -- but I've used the original Java iText, and it's not easy. http://stackoverflow.com/questions/83152/reading-pdf-documents-in-net – Thomas W Aug 02 '13 at 05:41
  • Did that work for extracting images in Java? – Gentuzos Aug 02 '13 at 05:41
  • Did you convert a whole PDF file to MS Word in Java? I just need to know whether it is possible. – Gentuzos Aug 02 '13 at 05:57
  • Of course it is possible. I have a teammate who wrote a program to do that: it read the PDF with iTextSharp and created a Word file from the data. – SysDragon Aug 02 '13 at 06:07
  • Best look into the library yourself. There's no magic wand. Of course "something" is possible. You can get plain text, most of the time, if the PDF document has it. Supposedly you can get images. Getting formatting is probably far more than trivial. But experimenting & investigating how well you can meet _your_ requirements is _your_ job. – Thomas W Aug 02 '13 at 12:06
  • Thank you, Thomas, for your advice. But my problem is that I don't know how to meet my needs. What I usually do is ask on a forum if I cannot find a solution in the first sites returned by Google. Could you explain how you go about searching for a solution to a problem? – Gentuzos Aug 02 '13 at 14:21
  • See the discussion here: http://stackoverflow.com/questions/5729874/how-to-convert-pdf-to-word-in-c-sharp – Spire.Presentation API Mar 02 '15 at 07:42

1 Answer


Try this, which uses the Aspose.PDF for .NET library:

' DocSaveOptions lives in the Aspose.Pdf namespace
Imports Aspose.Pdf

' Path of the input PDF document (VB.NET strings need no backslash escaping)
Dim filePath As String = "d:\Source.pdf"
' Instantiate the Document object
Dim document As New Aspose.Pdf.Document(filePath)
' Create the DocSaveOptions object
Dim saveOptions As New DocSaveOptions()
' Set the recognition mode to Flow
saveOptions.Mode = DocSaveOptions.RecognitionMode.Flow
' Set the horizontal proximity to 2.5
saveOptions.RelativeHorizontalProximity = 2.5F
' Recognize bullets during the conversion process
saveOptions.RecognizeBullets = True
' Save the resultant DOC file
document.Save("d:\Resultant.doc", saveOptions)
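
The plain-text route mentioned in the comments (read the PDF with iTextSharp, then write a Word file) could look roughly like the sketch below. It is only an illustration under stated assumptions, not a tested converter: it assumes iTextSharp 5.x and the Open XML SDK are referenced, the file paths are placeholders, and it copies text only, so images, fonts, tables and layout are lost.

```vbnet
Imports Microsoft.VisualBasic
Imports DocumentFormat.OpenXml
Imports DocumentFormat.OpenXml.Packaging
Imports iTextSharp.text.pdf
Imports iTextSharp.text.pdf.parser
' Alias avoids name clashes (Document, Paragraph, Text) with other namespaces
Imports W = DocumentFormat.OpenXml.Wordprocessing

Module PdfTextToDocx
    Sub Main()
        ' Placeholder paths, assumed for illustration only
        Dim pdfPath As String = "d:\Source.pdf"
        Dim docxPath As String = "d:\Resultant.docx"

        Dim reader As New PdfReader(pdfPath)
        Try
            Using wordDoc As WordprocessingDocument =
                WordprocessingDocument.Create(docxPath, WordprocessingDocumentType.Document)

                ' Build an empty document body
                Dim mainPart As MainDocumentPart = wordDoc.AddMainDocumentPart()
                Dim body As New W.Body()
                mainPart.Document = New W.Document(body)

                ' iTextSharp page numbers start at 1
                For pageNumber As Integer = 1 To reader.NumberOfPages
                    Dim pageText As String =
                        PdfTextExtractor.GetTextFromPage(reader, pageNumber)

                    ' One Word paragraph per extracted line of text
                    For Each line As String In pageText.Split(ControlChars.Lf)
                        body.AppendChild(New W.Paragraph(New W.Run(New W.Text(line))))
                    Next
                Next

                mainPart.Document.Save()
            End Using
        Finally
            reader.Close()
        End Try
    End Sub
End Module
```

This writes a .docx rather than a .doc, because the Open XML SDK only produces the newer format. As the comments point out, it only works when the PDF actually contains text; a scanned PDF yields nothing without OCR.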
Manish Sharma