TL;DR: Use a headless browser to render a PDF from Google's PDF translation service.
PDF is a complex format and can contain text in many different components. I will describe solutions for translating it, from the easiest to the most advanced.
Translate raw text
If you only need the translation without the visual output, you can extract the text and give it to Google Translate.
Since you did not provide information on your project (language, environment, ...), I will point you to this thread on how to extract text.
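Once the text is extracted, handing it to Google is a single HTTP request. Here is a minimal sketch, assuming you use the Cloud Translation v2 REST endpoint with an API key. `buildTranslateRequest` and the `YOUR_API_KEY` placeholder are names I made up for illustration, and the function only builds the request, it does not send it.

```javascript
// Hypothetical helper: builds (but does not send) a request for the
// Google Cloud Translation v2 REST API.
function buildTranslateRequest(text, source, target, apiKey) {
    return {
        method: 'POST',
        url: 'https://translation.googleapis.com/language/translate/v2?key=' +
            encodeURIComponent(apiKey),
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({
            q: text,          // the text extracted from the PDF
            source: source,   // e.g. 'en'
            target: target,   // e.g. 'fr'
            format: 'text'    // plain text, not HTML
        })
    };
}

// 'YOUR_API_KEY' is a placeholder, not a working key
var request = buildTranslateRequest('Hello world', 'en', 'fr', 'YOUR_API_KEY');
console.log(request.url);
console.log(request.body);
```

You can send the resulting object with whatever HTTP client your environment provides. Long documents will probably need to be split into several requests.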
Translate all text
If you need to get the text from everything in your PDF, that is pretty hard. To (partially) avoid the headache, you can convert the PDF to an image (using ImageMagick tools or similar), and then you have three options:
- OCR the text from the image, then give it to Google. Again, you are losing the original form.
- OCR the text while saving its position (some libraries can do that; again, since you did not specify your project information, see these links: #1, #2, #3, #4). Then translate it with the Google API and write the result onto the image. For good results you need to take the text font, color and background color into account. Pretty difficult, but feasible.
- Translate the image using Google Translate's image service. Unfortunately this feature is not available in the public API, so short of some reverse engineering, this is not possible.
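For the second option, the part that is easy to get wrong is keeping track of where each piece of text came from. A rough sketch, assuming your OCR library returns one bounding box per word: the `{ text, x, y, width, height }` shape, the sample data and `groupIntoLines` are all hypothetical, so adapt them to whatever your library actually gives you. Grouping words into lines first means you translate whole lines (better translations) and still know where to draw each one.

```javascript
// Hypothetical OCR output: one entry per word, with its bounding box in pixels.
var words = [
    { text: 'Hello', x: 10, y: 12, width: 40, height: 10 },
    { text: 'world', x: 55, y: 13, width: 42, height: 10 },
    { text: 'Second', x: 10, y: 40, width: 50, height: 10 },
    { text: 'line', x: 65, y: 41, width: 30, height: 10 }
];

// Group words into lines: a word joins the current line when its top edge
// is within `tolerance` pixels of the top edge of that line's first word.
function groupIntoLines(words, tolerance) {
    var sorted = words.slice().sort(function (a, b) {
        return a.y - b.y || a.x - b.x;
    });
    var lines = [];
    sorted.forEach(function (word) {
        var line = lines[lines.length - 1];
        if (!line || Math.abs(word.y - line.y) > tolerance) {
            line = { y: word.y, words: [] };
            lines.push(line);
        }
        line.words.push(word);
    });
    // Collapse each line into one text string plus one bounding box.
    return lines.map(function (line) {
        line.words.sort(function (a, b) { return a.x - b.x; });
        var first = line.words[0];
        var last = line.words[line.words.length - 1];
        return {
            text: line.words.map(function (w) { return w.text; }).join(' '),
            x: first.x,
            y: line.y,
            width: last.x + last.width - first.x,
            height: Math.max.apply(null, line.words.map(function (w) { return w.height; }))
        };
    });
}

console.log(groupIntoLines(words, 5));
```

Each returned entry can be translated on its own and then drawn back into its box. Picking the font, color and background is the hard part and is not covered here.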
Translate using Google's PDF translation service
The solution you describe, using the Translate site, can be automated quite easily. It takes a long time because it is a heavy process, and you probably won't beat Google.
Using a headless browser, you can load the translation page with your PDF, observe that the translated content sits in an iframe, open that iframe, and finally print it to PDF.
Here is a short example using SlimerJS (it should be compatible with PhantomJS):
var page = require("webpage").create();
// here you may want to set up the page size and options

// open the Google Translate page for the PDF
page.open('https://translate.google.fr/translate?hl=fr&sl=en&u=http://example.com/pdf-sample.pdf', function(status) {
    if (status !== 'success') {
        console.log('Unable to access network');
        // exit explicitly, otherwise the script never terminates
        phantom.exit(1);
    } else {
        // find the iframe holding the translated content with querySelector
        var iframe_src = page.evaluate(function() {
            return document.querySelector('#contentframe').querySelector('iframe').src;
        });
        console.log('Found iframe: ' + iframe_src);
        // open the iframe on its own so it can be rendered
        page.open(iframe_src, function(status) {
            // wait a bit for JavaScript to translate the content
            // this can be optimized to be triggered in JavaScript when translation is done
            setTimeout(function() {
                // print the page into PDF
                page.render('/tmp/test.pdf', { format: 'pdf' });
                phantom.exit(0);
            }, 2000);
        });
    }
});
Given this file: http://www.cbu.edu.zm/downloads/pdf-sample.pdf
it produces this result (translated into French). (I posted a screenshot since I cannot embed a PDF ;) )
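The fixed two-second delay in the script is the fragile part: too short and you render a half-translated page, too long and you waste time. One way to do the optimization mentioned in the code comment is to poll for a condition. This is only a sketch: `waitFor` is my own helper (modelled on the usual PhantomJS polling pattern), and the readiness test in the commented usage is a placeholder, since I have not checked what the translated page exposes when it is done.

```javascript
// Generic polling helper: calls `check` every `interval` ms until it returns
// true, then calls onReady(true). Gives up after `timeout` ms with onReady(false).
function waitFor(check, onReady, timeout, interval) {
    var start = Date.now();
    (function poll() {
        if (check()) {
            onReady(true);
        } else if (Date.now() - start >= timeout) {
            onReady(false);
        } else {
            setTimeout(poll, interval);
        }
    })();
}

// Stand-alone demo: the condition becomes true after 50 ms.
var ready = false;
setTimeout(function () { ready = true; }, 50);
waitFor(function () { return ready; }, function (ok) {
    console.log(ok ? 'ready' : 'timed out');
}, 1000, 10);

// Assumed usage inside the second page.open callback, instead of setTimeout.
// The document.readyState test is a placeholder for a real "translation done" check.
// waitFor(function () {
//     return page.evaluate(function () { return document.readyState === 'complete'; });
// }, function (ok) {
//     if (ok) { page.render('/tmp/test.pdf', { format: 'pdf' }); }
//     phantom.exit(ok ? 0 : 1);
// }, 10000, 250);
```

The timeout matters: without it, a page that never finishes translating would leave the headless browser running forever.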
