I have a CSV file with several thousand rows of text in multiple languages, and I am thinking of using the Google Cloud Translation API to translate the foreign-language text into English. I tested the setup with a simple script, and it runs without problems.

import os

from google.cloud import translate_v2 as translate

# Must point to the service-account key file (normally a .json file).
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "file path.py"
translate_client = translate.Client()
text = "Good Morning, My Name is X."
target = "ja"
output = translate_client.translate(text, target_language=target)
print(output)

I now want to import a CSV file (using pandas), translate the text, and save the output as a CSV file, but I don't know how to do that. Most of the examples I found stop at translating a sample string, just like the one above.

Can anyone suggest how I can do this?

Piyush Ghasiya
  • Does this answer your question? [stackoverflow.com/questions/67198105](https://stackoverflow.com/questions/67198105/translate-dataframe-python-to-english-and-save-the-result-into-a-cvs-file) – Prajna Rai T Apr 05 '22 at 10:36
  • @PrajnaRaiT The question you suggested uses the googletrans library, but I want to do the same kind of thing using the google.cloud translate API. I don't mind using googletrans as long as it works properly, but I am getting a timeout error even on a relatively small dataset (~21000 rows). Can you please suggest another solution? – Piyush Ghasiya Apr 06 '22 at 03:18
  • You can use this [code](https://gist.github.com/yeiichi/23f97a6004c1ff3cb106445a437d73fd) to translate a text file and save it as a CSV file using the Google Cloud Translation API. Let me know whether it helps. – Prajna Rai T Apr 11 '22 at 08:22
  • @PrajnaRaiT Thank you very much. I will try this code and let you know. – Piyush Ghasiya Apr 14 '22 at 00:55
  • Did the code help in translating the text file as per your requirement? – Prajna Rai T Apr 17 '22 at 14:05
  • Hi @PiyushGhasiya, If my answer addressed your question, please consider accepting and upvoting it. If not, let me know so that I can improve the answer. – Prajna Rai T Apr 18 '22 at 13:27

1 Answer

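To translate the text in a CSV file and save the output to the same CSV file using the Google Cloud Translation API, you can use the code below. Note that when the input file already has a .csv extension, the output path is identical to the input path, so the input file is overwritten.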

import csv
import html
from pathlib import Path

from google.cloud import translate_v2 as translate

# Create the client once; building a new one for every line adds overhead.
translate_client = translate.Client()


def translate_text(target, text):
    """Translates text into the target language.
    Target must be an ISO 639-1 language code.
    See https://g.co/cloud/translate/v2/translate-reference#supported_languages
    """
    if isinstance(text, bytes):
        text = text.decode("utf-8")

    # The client also accepts a sequence of strings and then returns a list
    # of results; this function only handles a single string.
    result = translate_client.translate(text, target_language=target)

    # The v2 API may return HTML-escaped text (e.g. &#39; for an apostrophe),
    # so unescape it before writing it to the CSV file.
    return html.unescape(result["translatedText"])


def main(input_file, translate_to):
    """
    Translate a text file and save as a CSV file
    using Google Cloud Translation API
    """
    input_file_path = Path(input_file)
    target_lang = translate_to
    # If the input already ends in .csv, this is the same path and the
    # input file is overwritten with the translated output.
    output_file_path = input_file_path.with_suffix('.csv')

    # Read the input as UTF-8 so non-English text is decoded consistently.
    with open(input_file_path, encoding='utf-8') as f:
        list_lines = f.readlines()
        total_lines = len(list_lines)
    # newline='' is what the csv module expects for files it writes to.
    with open(output_file_path, 'w', newline='', encoding='utf-8') as csvfile:
        my_writer = csv.writer(csvfile, delimiter=',', quotechar='"')
        my_writer.writerow(['id', 'original_text', 'translated_text'])

        for i, each_line in enumerate(list_lines):
            line_id = f'{i + 1:04}'
            original_text = each_line.strip('\n')  # Strip for the writer(*).
            translated_text = translate_text(
                target=target_lang,
                text=original_text)
            my_writer.writerow([line_id, original_text, translated_text])  # (*)
            # Progress monitor, non-essential.
            print(f"""
{line_id}/{total_lines:04}
  {original_text}
  {translated_text}""")


if __name__ == '__main__':
    origin_file = input('Input text file? >> ')
    output_lang = input('Output language? >> ')
    main(input_file=origin_file,
         translate_to=output_lang)
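
The loop above sends one request per line, which can be slow for files with many rows. Since the client's translate method also accepts a list of strings, a rough sketch of a batched variant could look like the following. The helper names translate_in_batches and make_client are made up for illustration, and the default batch size of 100 is an assumption chosen to stay under the per-request limit on the number of strings (128 for the v2 API, as far as I know); check the current quotas before relying on it.

```python
import html
from typing import List


def make_client():
    """Build a real v2 client (hypothetical helper, imported lazily)."""
    from google.cloud import translate_v2 as translate
    return translate.Client()


def translate_in_batches(texts: List[str], target: str, client,
                         batch_size: int = 100) -> List[str]:
    """Translate texts in chunks, one API request per chunk.

    `client` is any object with a translate(values, target_language=...)
    method that returns a list of dicts for a list input, which is how
    translate_v2.Client behaves.
    """
    translated = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        # A list input yields a list of result dicts in the same order.
        results = client.translate(batch, target_language=target)
        # Undo HTML escaping that the API may apply to its output.
        translated.extend(html.unescape(r["translatedText"]) for r in results)
    return translated
```

With a real client this would be called as translate_in_batches(lines, "es", make_client()). It cuts the number of requests by roughly the batch size; how much time that saves in practice depends on network latency and your quota.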

Example:

The text in the input file was translated into the target language “es”, and the output was stored in the same CSV file.

Input:

new.csv

How are you doing,Is everything fine there
Do it today

Output:

new.csv

id,original_text,translated_text
0001,"How are you doing,Is everything fine there",¿Cómo estás? ¿Está todo bien allí?
0002,Do it today,Hazlo hoy
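
Since the question asks about pandas, here is a minimal sketch of the same idea for a CSV file with named columns. Everything in it is an assumption for illustration: the function name translate_csv, the column argument, the "_en" suffix for the new column, and translate_many, a hypothetical callable that takes a list of strings and returns their translations in the same order. Each distinct text is translated only once, and empty cells are skipped.

```python
import pandas as pd


def translate_csv(input_path, output_path, column, translate_many):
    """Translate one column of a CSV file and write the result to a new file.

    translate_many: hypothetical callable, list[str] -> list[str],
    returning translations in the same order as its input.
    """
    df = pd.read_csv(input_path, encoding="utf-8")
    texts = df[column].fillna("").astype(str)

    # Translate each distinct, non-empty text only once.
    unique_texts = [t for t in texts.unique() if t.strip()]
    mapping = dict(zip(unique_texts, translate_many(unique_texts)))

    # Rows with no text get an empty translation.
    df[column + "_en"] = texts.map(mapping).fillna("")
    df.to_csv(output_path, index=False, encoding="utf-8")
    return df
```

With the real client, translate_many could be something like lambda texts: [r["translatedText"] for r in translate_client.translate(texts, target_language="en")]. For large files the list would need to be split into smaller chunks first, because the API limits how many strings a single request may contain.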
Prajna Rai T
  • After installing the Google Cloud CLI and storing the Google API credentials in the local ADC credential file, your script runs very well, but it is slooooow! I think accessing the Google API from a local environment is slow: I calculated that translating 400k words would take 28 days! Now I'm trying Google Cloud Shell, and the time is much shorter, 18 hours, but that is still too long. Do you know how to speed up the process 100 times, or at least 10 times? I'd appreciate your help, thanks in advance! – Tomasa Caballero Apr 24 '23 at 17:36