
I want to use Google Translate in my project. I have completed all the formalities with Google and I have an API key. With this key I can easily translate any word from JavaScript. But how can I translate a PDF file, as the Google Translate site does? I found this:

http://translate.google.com/translate?hl=fr&sl=auto&tl=en&u=http://www.example.com/PDF.pdf

But I cannot use my key here, and as a result the translation takes a long time. So I want to use my key to translate a PDF file. My approach is like this:

1. I have one HTML page.
2. It has a browse button for selecting a PDF.
3. The file is uploaded.
4. The PDF is translated with the Google API and shown in the HTML page.

I searched for a way to translate a PDF with the API but did not find anything. Please help me out.

niutech
Saikat

2 Answers


TL;DR: Use a headless browser to render a PDF from Google's PDF translation service.

PDF is a complex format, and text can appear in many of its components. I will describe the possible solutions from the easiest to the most advanced.

Translate raw text

If you only need the translation without the visual output, you can extract the text and give it to Google Translate.

Since you did not provide information about your project (language, environment, ...), I will point you to this thread on how to extract text.
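
Once the text is extracted, sending it to the API could look roughly like the sketch below. It assumes the Cloud Translation v2 REST endpoint, a runtime with a global `fetch` (Node 18+ or a browser), and your own API key. The helper names (`chunkText`, `buildTranslateRequest`, `translateText`) and the 4000-character chunk size are illustrative choices, not values taken from Google's documentation.

```javascript
// Assumed endpoint: Cloud Translation API v2.
const TRANSLATE_URL = 'https://translation.googleapis.com/language/translate/v2';

// Split long text into chunks of at most maxLen characters, preferring
// paragraph boundaries so that sentences are not cut in the middle.
function chunkText(text, maxLen) {
  const chunks = [];
  let current = '';
  for (const paragraph of text.split(/\n{2,}/)) {
    const candidate = current ? current + '\n\n' + paragraph : paragraph;
    if (candidate.length <= maxLen) {
      current = candidate;
      continue;
    }
    if (current) chunks.push(current);
    // A single paragraph longer than maxLen is cut hard.
    let rest = paragraph;
    while (rest.length > maxLen) {
      chunks.push(rest.slice(0, maxLen));
      rest = rest.slice(maxLen);
    }
    current = rest;
  }
  if (current) chunks.push(current);
  return chunks;
}

// Build the URL and fetch options for one chunk (no network access here).
function buildTranslateRequest(text, target, apiKey) {
  return {
    url: TRANSLATE_URL + '?key=' + encodeURIComponent(apiKey),
    options: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ q: text, target: target, format: 'text' }),
    },
  };
}

// Translate the whole text chunk by chunk and join the results.
async function translateText(text, target, apiKey) {
  const translated = [];
  for (const chunk of chunkText(text, 4000)) {
    const { url, options } = buildTranslateRequest(chunk, target, apiKey);
    const response = await fetch(url, options);
    if (!response.ok) {
      throw new Error('Translate request failed: ' + response.status);
    }
    const json = await response.json();
    translated.push(json.data.translations[0].translatedText);
  }
  return translated.join('\n\n');
}
```

A call such as `translateText(extractedText, 'en', 'YOUR_API_KEY')` returns a promise for the translated text. Keep in mind that a key embedded in browser-side JavaScript is visible to anyone who opens the page.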

Translate all text

If you need to get the text from everything in your PDF, that is pretty hard. To (partially) avoid the headache, you can convert the PDF to an image (using ImageMagick tools or similar), and then you have three options:

  • OCR the text from the image, then give it to Google. Again, you lose the original layout.
  • OCR the text while saving its position (some libraries can do that; again, since you did not specify your project information, see these links: #1, #2, #3, #4).

    Then translate it with the Google API and write the result onto the image. For good results you need to take the text font, text color and background color into account. Pretty difficult, but feasible.

  • Translate the image using Google Translate's image service. Unfortunately this feature is not available in the public API, so it is not possible without some reverse engineering.
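
For the second option, the bookkeeping is roughly: keep each OCR'd word together with its bounding box, merge the words into lines, translate each line, and draw the result back into that line's box. Below is a minimal sketch of the merging step only. It assumes a hypothetical word shape `{ text, x, y, w, h }` in pixels; real OCR libraries use their own field names, and `groupWordsIntoLines` is an illustrative helper, not part of any of them.

```javascript
// Group OCR words into text lines. A word joins an existing line when its
// vertical center is within half the height of that line's first word.
function groupWordsIntoLines(words) {
  const sorted = words.slice().sort((a, b) => (a.y - b.y) || (a.x - b.x));
  const lines = [];
  for (const word of sorted) {
    const center = word.y + word.h / 2;
    const line = lines.find(l => Math.abs(l.center - center) < l.h / 2);
    if (line) {
      line.words.push(word);
    } else {
      lines.push({ center: center, h: word.h, words: [word] });
    }
  }
  // Produce one entry per line: the joined text plus the enclosing box.
  return lines.map(l => {
    const ws = l.words.sort((a, b) => a.x - b.x);
    const x = Math.min(...ws.map(w => w.x));
    const y = Math.min(...ws.map(w => w.y));
    const right = Math.max(...ws.map(w => w.x + w.w));
    const bottom = Math.max(...ws.map(w => w.y + w.h));
    return {
      text: ws.map(w => w.text).join(' '),
      x: x,
      y: y,
      w: right - x,
      h: bottom - y,
    };
  });
}
```

Each returned entry can then be translated as one unit and rendered into its `x`, `y`, `w`, `h` rectangle. Translating whole lines rather than single words generally gives the translator more context, at the cost of having to fit text of a different length into the same box.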

Translate using Google's PDF translation service

The solution you found using the Translate site can be automated quite easily. It is slow because it is a heavy process, and you probably won't beat Google at it.

Using a headless browser, you can load the translation page for your PDF, observe that the translated content sits in an iframe, open that iframe and finally print it to PDF.

Here is a short example using SlimerJS (it should be compatible with PhantomJS):

var page = require("webpage").create();

// here you may want to setup page size and options    

// get the page
page.open('https://translate.google.fr/translate?hl=fr&sl=en&u=http://example.com/pdf-sample.pdf', function(status) {
    if (status !== 'success') {
        console.log('Unable to access network');
        // exit with an error code, otherwise the script never terminates
        phantom.exit(1);
    } else {
        // find the iframe with querySelector
        var iframe_src = page.evaluate(function() {
            return document.querySelector('#contentframe').querySelector('iframe').src;
        });

        console.log('Found iframe: ' + iframe_src);

        // render the iframe
        page.open(iframe_src, function(status) {
            if (status !== 'success') {
                console.log('Unable to open iframe');
                phantom.exit(1);
                return;
            }
            // wait a bit for javascript to translate
            // this can be optimized to be triggered in javascript when translation is done
            setTimeout(function() {
                // print the page into PDF
                page.render('/tmp/test.pdf', { format: 'pdf' });

                phantom.exit(0);
            }, 2000);

        });
    }
});
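
The fixed 2-second wait in the script can be replaced by polling, as its comment suggests. A minimal sketch of such a polling helper follows; `waitFor` is a name chosen here for illustration, not a built-in of SlimerJS or PhantomJS, and it relies only on `setInterval` and `Date.now`.

```javascript
// Poll `condition` every `intervalMs` milliseconds. Call `onReady` as soon as
// it returns true, or `onTimeout` once `timeoutMs` milliseconds have elapsed.
function waitFor(condition, onReady, onTimeout, timeoutMs, intervalMs) {
  var start = Date.now();
  var timer = setInterval(function () {
    if (condition()) {
      clearInterval(timer);
      onReady();
    } else if (Date.now() - start >= timeoutMs) {
      clearInterval(timer);
      onTimeout();
    }
  }, intervalMs);
}
```

In the script, the `setTimeout` call could then become something like `waitFor(function () { return page.evaluate(isTranslated); }, renderPdf, giveUp, 10000, 250)`, where `isTranslated`, `renderPdf` and `giveUp` are functions you would write yourself. How to detect that the translation has finished depends on the markup of the translated page, so that check has to be worked out by inspecting the page.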

Given this file: http://www.cbu.edu.zm/downloads/pdf-sample.pdf
It produces this result (translated into French): (I posted a screenshot since I cannot embed a PDF ;) ) PDF result

Cyrbil
  • This one seems interesting: `Translate using Google's PDF translation service`. But my file is around 1 MB and Google says the size limit is exceeded :( I also have Word and PPT documents. – Adeel Raza Sep 22 '15 at 14:13
  • Well, that is really a small problem... You can split your PDF into smaller parts with [Imagemagick](http://www.imagemagick.org/script/index.php) `convert x.pdf x-%03d.pdf` – Cyrbil Sep 22 '15 at 14:34
  • Hi, does Google Translation Service keep formatting and images? – NeNaD Apr 17 '21 at 14:21

Use Apache Tika to extract the text content of the PDF file (you will have to write the necessary Java code), then use whatever API you want to translate it. But, as mentioned above, Google Translate is a paid service.

Özgür Eroğlu