
Hi, I am learning natural language processing with NLTK and trying to run the babelize_shell() example from the book. I call babelize_shell(), enter my string, then enter german as the book says, and finally enter run.

The error I am getting is:

Traceback (most recent call last):
  File "<pyshell#148>", line 1, in <module>
    babelize_shell()
  File "C:\Python27\lib\site-packages\nltk\misc\babelfish.py", line 175, in babelize_shell
    for count, new_phrase in enumerate(babelize(phrase, 'english', language)):
  File "C:\Python27\lib\site-packages\nltk\misc\babelfish.py", line 126, in babelize
    phrase = translate(phrase, next, flip[next])
  File "C:\Python27\lib\site-packages\nltk\misc\babelfish.py", line 106, in translate
    if not match: raise BabelfishChangedError("Can't recognize translated string.")
BabelfishChangedError: Can't recognize translated string.

For reference, here's the example session I expected to see:

>>> babelize_shell()
NLTK Babelizer: type 'help' for a list of commands.
Babel> how long before the next flight to Alice Springs?
Babel> german
Babel> run
0> how long before the next flight to Alice Springs?
1> wie lang vor dem folgenden Flug zu Alice Springs?
2> how long before the following flight to Alice jump?
3> wie lang vor dem folgenden Flug zu Alice springen Sie?
4> how long before the following flight to Alice do you jump?
5> wie lang, bevor der folgende Flug zu Alice tun, Sie springen?
6> how long, before the following flight to Alice does, do you jump?
7> wie lang bevor der folgende Flug zu Alice tut, tun Sie springen?
8> how long before the following flight to Alice does, do you jump?
9> wie lang, bevor der folgende Flug zu Alice tut, tun Sie springen?
10> how long, before the following flight does to Alice, do do you jump?
11> wie lang bevor der folgende Flug zu Alice tut, Sie tun Sprung?
12> how long before the following flight does leap to Alice, does you?
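
For context, the round trip in this session boils down to translating back and forth until a phrase repeats. Here is a minimal sketch of that loop, assuming a toy lookup table in place of the real web service. The names `TOY_TABLE`, `toy_translate` and `babelize_sketch`, and the exact stopping rule, are illustrative assumptions and not NLTK's actual code:

```python
# Toy lookup table standing in for the translation service (hypothetical data).
TOY_TABLE = {
    ("english", "german"): {"hi": "hallo", "hello": "hallo"},
    ("german", "english"): {"hallo": "hello"},
}


def toy_translate(phrase, source, target):
    """Return the toy translation, or the phrase unchanged if it is unknown."""
    return TOY_TABLE[(source, target)].get(phrase, phrase)


def babelize_sketch(phrase, source, target, translate=toy_translate, limit=12):
    """Yield successive round-trip translations until one repeats."""
    flip = {source: target, target: source}
    seen = {phrase}
    yield phrase
    current = source
    for _ in range(limit):
        # Translate from the current language into the other one.
        phrase = translate(phrase, current, flip[current])
        if phrase in seen:
            break
        seen.add(phrase)
        yield phrase
        current = flip[current]


for count, new_phrase in enumerate(babelize_sketch("hi", "english", "german")):
    print("%d> %s" % (count, new_phrase))
```

With a real translation backend plugged in as `translate`, each call would hit the web service, which is where the error above comes from.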
Quentin Pradet
Max

1 Answer


I'm having the same problem right now.

I've found this: http://nltk.googlecode.com/svn/trunk/doc/api/nltk.misc.babelfish-module.html

and it says: "BabelfishChangedError: Thrown when babelfish.yahoo.com changes some detail of their HTML layout, and babelizer no longer submits data in the correct form, or can no longer parse the results."
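
To illustrate why a layout change produces this kind of error, here is a minimal sketch of a scraper that pulls the translation out of a page with a regular expression. The pattern, the sample pages and the `LayoutChangedError` class are all made up for the example; they are not the real babelfish.py pattern or the real Yahoo markup:

```python
import re


class LayoutChangedError(Exception):
    """Hypothetical stand-in for BabelfishChangedError."""


# Hypothetical pattern: expects the translation inside <div id="result">.
RESULT_RE = re.compile(r'<div id="result">(.*?)</div>', re.DOTALL)


def extract_translation(html):
    """Return the translated text, or raise if the page layout is unknown."""
    match = RESULT_RE.search(html)
    if not match:
        raise LayoutChangedError("Can't recognize translated string.")
    return match.group(1).strip()


# Made-up pages: the "old" layout matches the pattern, the "new" one does not.
old_page = '<html><div id="result"> wie lang? </div></html>'
new_page = '<html><span class="translation">wie lang?</span></html>'

print(extract_translation(old_page))
try:
    extract_translation(new_page)
except LayoutChangedError as err:
    print("failed: %s" % err)
```

Once the site's HTML no longer matches the pattern, every call fails the same way, no matter what phrase you enter.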

I'm going to see if there's a way to fix this.

The solution I have come up with for now uses the Microsoft Translator web service (SOAP). It's not an easy solution, but it was fun to code.

I followed the instructions at http://msdn.microsoft.com/en-us/library/hh454950 and then modified babelfish.py, which is found in nltk/misc/babelfish.py

  1. Subscribe to the Microsoft Translator API on Azure Marketplace

I chose the free subscription.

  2. Register your application with Azure DataMarket

To register your application with Azure DataMarket, visit datamarket.azure.com/developer/applications/ using the LiveID credentials from step 1, and click on “Register”. Write down your client ID and your client secret for later use.

  3. Install suds for Python: fedorahosted.org/suds/

  4. Modify babelfish.py (use your own client_id and secret):

# imports to add (Python 2)

from suds.client import Client
import httplib
import json
import urllib  # harmless if babelfish.py already imports it

...

# added function
def soaped_babelfish(TextToTranslate, codeLangFrom, codeLangTo):

    # OAuth credentials: replace client_id and client_secret with your own
    params = urllib.urlencode({'client_id': 'babelfish_soaped', 'client_secret': '1IkIG3j0ujiSMkTueCZ46iAY4fB1Nzr+rHBciHDCdxw=', 'scope': 'http://api.microsofttranslator.com', 'grant_type': 'client_credentials'})

    # request an access token from the access control service
    headers = {"Content-type": "application/x-www-form-urlencoded"}
    conn = httplib.HTTPSConnection("datamarket.accesscontrol.windows.net")
    conn.request("POST", "/v2/OAuth2-13/", params, headers)
    response = conn.getresponse()
    # print response.status, response.reason

    data = response.read()
    conn.close()

    # obtain access_token (the response body is JSON, so parse it as JSON)
    response_dict = json.loads(data)
    access_token = response_dict['access_token']

    # use the web service with the access token
    client = Client('http://api.microsofttranslator.com/V2/Soap.svc')

    result = client.service.Translate('Bearer ' + access_token, TextToTranslate, codeLangFrom, codeLangTo, 'text/plain', 'general')

    return result

...

# modified translate method
def translate(phrase, source, target):
    phrase = clean(phrase)
    try:
        # map language names (e.g. 'german') to the service's language codes
        source_code = __languages[source]
        target_code = __languages[target]
    except KeyError as lang:
        raise ValueError("Language %s not available" % lang)

    return clean(soaped_babelfish(phrase, source_code, target_code))

And that's all for the SOAPed version! Some other day I'll try a web-only solution (similar to the current babelfish.py, but adapted to the changes).
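
As a side note on the credentials and token handling in soaped_babelfish, here is a small sketch of how the token request body could be built without hard-coding the secret, and how the token response could be parsed. The environment variable names, the helper names and the shape of the sample response are my assumptions for illustration, not something taken from the Microsoft documentation:

```python
import json
import os

try:
    from urllib import urlencode        # Python 2
except ImportError:
    from urllib.parse import urlencode  # Python 3


def build_token_request(environ=os.environ):
    """Build the form-encoded token request body from environment variables.

    TRANSLATOR_CLIENT_ID and TRANSLATOR_CLIENT_SECRET are hypothetical names.
    """
    return urlencode({
        "client_id": environ["TRANSLATOR_CLIENT_ID"],
        "client_secret": environ["TRANSLATOR_CLIENT_SECRET"],
        "scope": "http://api.microsofttranslator.com",
        "grant_type": "client_credentials",
    })


def bearer_from_response(raw_body):
    """Parse a JSON token response and return the 'Bearer <token>' string."""
    token = json.loads(raw_body)["access_token"]
    return "Bearer " + token


# Made-up values so the sketch runs without network access.
fake_env = {
    "TRANSLATOR_CLIENT_ID": "my-app",
    "TRANSLATOR_CLIENT_SECRET": "not-a-real-secret",
}
sample_response = '{"access_token": "abc123", "expires_in": "600"}'

print(build_token_request(fake_env))
print(bearer_from_response(sample_response))
```

Keeping the secret in the environment means babelfish.py can be shared without leaking it, and the two helpers can be checked offline before wiring them into the real request.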

Madara's Ghost