We badly need a romanization feature. Can someone please help? We want to transliterate (not translate) Hindi text from Devanagari script to English (Roman script).

Input
romanize_text('अंतिम लक्ष्य क्या है')

Expected Output
'antim lakshya kya hai'

As per the Google Romanize text docs, I wrote the following Python code to transliterate from some language script to Roman script.

# Authenticate using credentials.
import os
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "translate.json"

PROJECT_ID = "project-id"
LOCATION = "global"

# Imports the Google Cloud Translation library
from google.cloud import translate_v3

# Transliteration.
def romanize_text(text, src_lang="hi", tgt_lang="en"):

    client = translate_v3.TranslationServiceClient()
    parent = f"projects/{PROJECT_ID}/locations/{LOCATION}"

    response = client.romanize_text(
        request={
            "parent": parent,
            "contents": [text],
            "source_language_code": src_lang,
            "target_language_code": tgt_lang,
        }
    )

    # Display the romanized text for each input provided
    for romanization in response.romanizations:
        print(f"Romanized text: {romanization.romanized_text}")

romanize_text('अंतिम लक्ष्य क्या है')

Running the above code gives the following error:

AttributeError: 'TranslationServiceClient' object has no attribute 'romanize_text'

Also, in Google's API reference for romanizeText, the API Explorer on the right-hand side is broken, whereas if you select any other method from the left-hand side, its API Explorer works correctly.

We need romanization badly, so either a solution to the problem above or an alternative, non-Google solution for romanization would be fine.

molbdnilo
RajdeepPal

1 Answer

You get the error when your function calls client.romanize_text because TranslationServiceClient has no romanize_text method in the client library's source code.
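One way to confirm this on your own machine, without calling the API, is to inspect the installed client. The sketch below is only a diagnostic: the helper names `installed_version` and `client_supports` are made up for illustration, and it assumes the library is installed from PyPI under the name `google-cloud-translate`.

```python
from importlib import metadata


def installed_version(package: str = "google-cloud-translate"):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def client_supports(method_name: str) -> bool:
    """Report whether TranslationServiceClient defines an attribute `method_name`."""
    try:
        # Imported lazily so the check still runs when the library is missing.
        from google.cloud import translate_v3
    except ImportError:
        return False
    return hasattr(translate_v3.TranslationServiceClient, method_name)


if __name__ == "__main__":
    print("google-cloud-translate version:", installed_version())
    print("has romanize_text:", client_supports("romanize_text"))
```

If the second line prints `False`, the `AttributeError` comes from the installed client itself and not from your credentials or project settings.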

The transliteration documentation for "advanced translating text v3" says that:

Transliteration is a configuration setting in the translateText method. When you enable transliteration, you translate romanized text (Latin script) directly to a target language.

However, you want to go the other way, from text in a specified language's native script to romanized text, so this feature doesn't seem to be available (yet) via the Google Cloud Translate API. This Stack Overflow answer to a question similar to yours points to the same conclusion.
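To make the direction of the documented setting concrete, here is a rough sketch of what a translateText request with transliteration enabled might look like. The helper `build_transliteration_request` is hypothetical, and the field names `transliteration_config` and `enable_transliteration` are my reading of the transliteration documentation, so treat them as assumptions and check them against the version you have installed. Note that the input is already romanized and the result is a translation, not a romanization.

```python
def build_transliteration_request(project_id, romanized_text,
                                  source_lang="hi", target_lang="en",
                                  location="global"):
    """Build a translateText request body with transliteration enabled.

    Assumption: the field names mirror the documented REST request body.
    """
    return {
        "parent": f"projects/{project_id}/locations/{location}",
        "contents": [romanized_text],  # input is Latin-script text
        "mime_type": "text/plain",
        "source_language_code": source_lang,
        "target_language_code": target_lang,
        "transliteration_config": {"enable_transliteration": True},
    }


request = build_transliteration_request("project-id", "antim lakshya kya hai")
print(request)

# With credentials configured, the request would be sent roughly like this:
#   from google.cloud import translate_v3
#   client = translate_v3.TranslationServiceClient()
#   response = client.translate_text(request=request)
```

The sketch only builds the request dictionary, so it runs without credentials; the actual call is left as a comment.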

It seems like the PyPI package ai4bharat-transliteration by the researchers at AI4Bharat is a viable non-Google alternative for transliteration from Hindi to romanized text.
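A minimal sketch of how that package could be used for your case follows. It assumes `pip install ai4bharat-transliteration` and that the package exposes `XlitEngine` with a `translit_sentence` method, as I recall from its README; the constructor arguments shown are likewise assumptions, and `romanize_hindi` is a hypothetical wrapper name. The first call may download model files.

```python
def romanize_hindi(text, engine=None):
    """Romanize Devanagari Hindi text using an AI4Bharat-style engine.

    `engine` can be any object with a translit_sentence(text, lang_code=...)
    method; when omitted, an XlitEngine is created (assumed API, see above).
    """
    if engine is None:
        # Imported lazily because the package is heavy and optional.
        from ai4bharat.transliteration import XlitEngine

        # src_script_type="indic" requests Indic-to-Roman conversion.
        engine = XlitEngine(src_script_type="indic", beam_width=10, rescore=False)
    return engine.translit_sentence(text, lang_code="hi")


# Usage, once the package is installed:
#   print(romanize_hindi('अंतिम लक्ष्य क्या है'))
```

Passing the engine in as an argument lets you build it once and reuse it across calls, since constructing it is likely the slow part.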

Kyle F Hartzenberg
  • I tried non-Google libraries such as `polyglot` and `indic-transliteration`, which didn't work well. `AI4Bharat` is better than those two. **Thanks Kyle**. As for Google's `romanize_text` API not working, I've reported it in two places: 1. [Google API groups](https://groups.google.com/g/google-apis-discovery/c/F7DV4kWikNE) 2. [GitHub issue](https://github.com/googleapis/python-translate/issues/490). Do you agree that no other library can beat Google's AI transliteration capability? I think so, which is why I reported it in both places, in the hope that they'll consider my issue. – RajdeepPal May 24 '23 at 14:51
  • @RajdeepPal The modern transformer encoder-decoder models used for these kinds of conversion tasks require expertise to design, and substantial amounts of data and compute power to train. Google has both, which is why their library works so well across a variety of tasks. Usually only dedicated libraries that focus on a niche (like AI4Bharat's focus on Indic languages only) will match or exceed Google's generalised yet powerful capability. Feedback always helps guide development focus, so hopefully they'll take yours on board and add the feature. – Kyle F Hartzenberg May 24 '23 at 23:10