2

I am trying to use NLTK from Azure Data Lake Analytics using Python. I followed this link on Using Custom Python Libraries with U-SQL. I zipped the NLTK source code and deployed it with DEPLOY RESOURCE. The source code depends on the nltk_data folder, which is also deployed on the vertex, but I do not know where it ends up.

This is the U-SQL code that I am executing:

REFERENCE ASSEMBLY [ExtPy];

DEPLOY RESOURCE @"/FeedbackAnalysisService/Assemblies/nltk.
DEPLOY RESOURCE @"/FeedbackAnalysisService/Assemblies/nltk_data.zip";

DECLARE @myScript = @"
import sys
sys.path.insert(0, 'nltk.zip')

import nltk

def usqlml_main(df):
 del df['number']
 df['hello_world'] = nltk.word_tokenize('hello world')
 return df
";

@rows = 
 SELECT * FROM (VALUES (1)) AS D(number);

@rows =
 REDUCE @rows ON number
 PRODUCE hello_world string
 USING new Extension.Python.Reducer(pyScript:@myScript);

OUTPUT @rows
 TO "/demo_python_custom_module.csv"
 USING Outputters.Csv(outputHeader: true); 

I get the following error:

Error|Running|  File "nltk.zip\nltk\data.py", line 673, in find
Error|Running|    raise LookupError(resource_not_found)
Error|Running|  Searched in:
Error|Running|    - '/home//nltk_data'
Error|Running|    - 'C:\\nltk_data'
Error|Running|    - 'D:\\nltk_data'
Error|Running|    - 'E:\\nltk_data'
Error|Running|    - 'D:\\Data\\Temp\\f40f07f586ce4469ac593a701790ba00\\3.5.1\\nltk_data'
Error|Running|    - 'D:\\Data\\Temp\\f40f07f586ce4469ac593a701790ba00\\3.5.1\\lib\\nltk_data'
Error|Running|    - 'C:\\Windows\\system32\\config\\systemprofile\\AppData\\Roaming\\nltk_data'
Error|Running|    - ''

Question: The script runs locally without any error, but when I run it in the cloud it fails because NLTK cannot find nltk_data. How can I get the path where nltk_data is located on the vertex?

Faizan khan
  • Please take a look at answers on https://stackoverflow.com/questions/22211525/how-do-i-download-nltk-data – alvas Mar 13 '18 at 14:44

2 Answers

0

I think you are not setting the correct path:

sys.path.insert(0, '/FeedbackAnalysisService/Assemblies/nltk.zip')

Try the above instead and it should work.

prashanth
  • I can deploy NLTK successfully on the vertex and use it. NLTK depends on nltk_data and searches for it in a list of paths, to which we can add our own custom paths. As of now, I can deploy both nltk and nltk_data; the problem is finding the path where nltk_data is deployed. – Faizan khan Apr 02 '18 at 07:26
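
The custom search path mentioned in the comment above could be set up roughly as follows. This is only a sketch under two unconfirmed assumptions: that DEPLOY RESOURCE places nltk_data.zip in the script's current working directory (the relative 'nltk.zip' import in the question suggests this, but it is not verified here), and that the zip contains the usual nltk_data layout (tokenizers/, corpora/, and so on) at its top level. The helper names extract_resource and register_nltk_data are hypothetical, not part of NLTK or U-SQL.

```python
import os
import zipfile


def extract_resource(zip_name, target_dir, base_dir=None):
    """Unpack a deployed zip file and return the absolute output path.

    base_dir defaults to the current working directory, on the ASSUMPTION
    that DEPLOY RESOURCE copies files there on the vertex.
    """
    base_dir = base_dir or os.getcwd()
    zip_path = os.path.join(base_dir, zip_name)
    if not os.path.isfile(zip_path):
        # Fail loudly so the job log shows where the file was expected.
        raise FileNotFoundError(zip_path)
    out_dir = os.path.abspath(os.path.join(base_dir, target_dir))
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(out_dir)
    return out_dir


def register_nltk_data(data_dir):
    """Add data_dir to the list of directories NLTK searches for data."""
    import nltk  # imported lazily, after nltk.zip has been put on sys.path
    if data_dir not in nltk.data.path:
        nltk.data.path.append(data_dir)
```

Inside the embedded script, after `sys.path.insert(0, 'nltk.zip')`, one would call `register_nltk_data(extract_resource('nltk_data.zip', 'nltk_data'))` before the first `nltk.word_tokenize` call. If the zip wraps everything in an extra top-level folder, the registered path would need to point at that inner folder instead.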
0

Below is the link to the documentation that explains how to import a custom package: https://blogs.msdn.microsoft.com/azuredatalake/2017/03/10/using-custom-python-libraries-with-u-sql/