I am trying to use NLTK from Azure Data Lake Analytics using Python. I followed the guide on Using Custom Python Libraries with U-SQL: I zipped the NLTK source code and deployed it with DEPLOY RESOURCE. NLTK depends on the nltk_data folder, which I also deploy to the vertex, but I do not know where it ends up there.
The following is the U-SQL code that I am executing:
REFERENCE ASSEMBLY [ExtPy];
DEPLOY RESOURCE @"/FeedbackAnalysisService/Assemblies/nltk.
DEPLOY RESOURCE @"/FeedbackAnalysisService/Assemblies/nltk_data.zip";
DECLARE @myScript = @"
import sys
sys.path.insert(0, 'nltk.zip')
import nltk
def usqlml_main(df):
    del df['number']
    df['hello_world'] = nltk.word_tokenize('hello world')
    return df
";
@rows =
SELECT * FROM (VALUES (1)) AS D(number);
@rows =
REDUCE @rows ON number
PRODUCE hello_world string
USING new Extension.Python.Reducer(pyScript:@myScript);
OUTPUT @rows
TO "/demo_python_custom_module.csv"
USING Outputters.Csv(outputHeader: true);
I get the following error:
Error|Running| File "nltk.zip\nltk\data.py", line 673, in find
Error|Running| raise LookupError(resource_not_found)
Error|Running| Searched in:
Error|Running| - '/home//nltk_data'
Error|Running| - 'C:\\nltk_data'
Error|Running| - 'D:\\nltk_data'
Error|Running| - 'E:\\nltk_data'
Error|Running| - 'D:\\Data\\Temp\\f40f07f586ce4469ac593a701790ba00\\3.5.1\\nltk_data'
Error|Running| - 'D:\\Data\\Temp\\f40f07f586ce4469ac593a701790ba00\\3.5.1\\lib\\nltk_data'
Error|Running| - 'C:\\Windows\\system32\\config\\systemprofile\\AppData\\Roaming\\nltk_data'
Error|Running| - ''
Question: The script runs locally without any error, but when I run it in the cloud it fails because NLTK cannot find nltk_data. How can I get the path where nltk_data is located on the vertex?
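A minimal sketch of one way to look for the deployed archive from inside the Python script. It assumes that DEPLOY RESOURCE places nltk_data.zip in the vertex's current working directory, which I have not confirmed. The helper names find_deployed_resource and extract_resource are hypothetical, not part of NLTK or the U-SQL Python extension.

```python
import os
import sys
import tempfile
import zipfile


def find_deployed_resource(file_name, extra_dirs=()):
    """Return the absolute path of the first match for file_name, or None.

    Assumption: deployed resources sit in the working directory of the
    vertex; the sys.path entries are checked only as a fallback.
    """
    candidates = [os.getcwd(), *extra_dirs, *sys.path]
    for directory in candidates:
        # An empty sys.path entry stands for the current directory.
        path = os.path.join(directory or os.getcwd(), file_name)
        if os.path.isfile(path):
            return os.path.abspath(path)
    return None


def extract_resource(zip_path, target_dir=None):
    """Unpack zip_path and return the directory it was extracted into."""
    target_dir = target_dir or tempfile.mkdtemp(prefix="nltk_data_")
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target_dir)
    return target_dir


# Possible use inside the U-SQL script, before calling nltk.word_tokenize
# (assumes the archive contains a top-level nltk_data folder):
#
#   zip_path = find_deployed_resource('nltk_data.zip')
#   if zip_path is not None:
#       extracted = extract_resource(zip_path)
#       nltk.data.path.append(os.path.join(extracted, 'nltk_data'))
```

nltk.data.path is the list of directories that NLTK searches, so appending the extracted folder should add it to the "Searched in" list shown in the error. If find_deployed_resource returns None, raising an exception that includes os.getcwd() and os.listdir('.') would at least surface in the job error what the vertex actually contains.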