I want to parallelize PDF-to-HTML conversion, not at the file level but at the page level or object level. Is this a sensible target for parallelization? If so, how can it be done? Would the speedup on a GPU be appreciable compared with running the same conversion on a CPU?

Vanns

1 Answer

My simplest answer would be: it may not be feasible.

The most important classification here is whether a problem is task parallel or data parallel. The first refers, roughly speaking, to problems where several threads work on their own tasks, more or less independently. The second refers to problems where many threads all do the same thing, but on different parts of the data. The latter is the kind of problem GPUs are good at: they have many cores, and all the cores perform the same operation on different parts of the input data.
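To make the distinction concrete, here is a minimal Python sketch. The function names (`brighten`, `count_words`, `longest_word`) are made up for illustration, and a thread pool is used only to show the shape of each pattern; it says nothing about real GPU performance.

```python
from concurrent.futures import ThreadPoolExecutor


def brighten(pixel):
    # One small, identical operation applied to every element.
    return min(pixel + 10, 255)


def count_words(text):
    return len(text.split())


def longest_word(text):
    return max(text.split(), key=len)


def data_parallel(pixels):
    # Data parallel: every worker runs the SAME function,
    # each on a different element of the input.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(brighten, pixels))


def task_parallel(text):
    # Task parallel: workers run DIFFERENT functions,
    # more or less independently of each other.
    with ThreadPoolExecutor() as pool:
        words = pool.submit(count_words, text)
        longest = pool.submit(longest_word, text)
        return words.result(), longest.result()
```

The `data_parallel` shape is the one that maps well onto a GPU, provided the per-element operation is as simple and uniform as `brighten` is here.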

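The next issue is moving the data around: the input has to be copied into GPU memory and the results copied back, and that transfer cost can eat up whatever the computation gains.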

GPU programming is an art, and it can be very, very challenging to get it right.

So the question is: can you parallelize the format conversion at all? I have done some conversions before, and almost none of them were feasible for parallel processing.
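If the pages of a document really can be converted independently (an assumption; shared fonts, resources, or cross-page state may break it), a worker pool on the CPU is the easier thing to try first. A rough sketch, where `convert_page` is a hypothetical stand-in for a real per-page converter and the pages are assumed to be already extracted as plain text:

```python
from concurrent.futures import ThreadPoolExecutor
from html import escape


def convert_page(page_text):
    # Hypothetical stand-in for real PDF-page-to-HTML logic:
    # it only escapes the text and wraps it in a <div>.
    return '<div class="page">' + escape(page_text) + "</div>"


def convert_document(pages, workers=4):
    # Pages are handed to the pool independently; pool.map
    # returns the results in the original page order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        body = "\n".join(pool.map(convert_page, pages))
    return "<html><body>\n" + body + "\n</body></html>"
```

For CPU-bound conversion in CPython, `ProcessPoolExecutor` from the same module would be the more likely choice, since threads share the interpreter lock. Whether any of this pays off depends on how much work each page actually involves.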

gusto2
  • Well, a PDF typically is organized as a set of pages, and you could argue that the HTML conversion does the same thing on different pages. The problem is, each parallel task also has to be simple enough, and "PDF page to HTML conversion" is several orders of magnitude too complex. – MSalters Mar 24 '16 at 12:26
  • Indeed, if there are MANY documents to convert, it would be more reasonable to leverage some high cpu cloud instances to do the job in parallel on default CPUs. Definitely cheaper and faster. – gusto2 Mar 24 '16 at 12:34
  • The main paragraph is copied from http://stackoverflow.com/a/22868938 – Marco13 Jul 23 '16 at 00:36