I am writing a Python script that analyzes a piece of text and returns the results as JSON. I am using NLTK to analyze the text. This is my flow:
An endpoint (API Gateway) -> calls my Lambda function -> returns the required data as JSON.
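For context, here is a minimal sketch of that flow, assuming the API Gateway proxy integration. The names `handler` and `analyze_text` and the `text` request field are placeholders rather than my actual code, and the regex only stands in for the NLTK tokenizer call that needs 'punkt':

```python
import json
import re


def analyze_text(text):
    """Placeholder analysis step; the real script would call NLTK here."""
    # Hypothetical stand-in for nltk.word_tokenize, which requires 'punkt'.
    tokens = re.findall(r"\w+", text)
    return {"token_count": len(tokens), "tokens": tokens}


def handler(event, context):
    """Lambda entry point invoked by API Gateway (proxy integration assumed)."""
    # With proxy integration the request body arrives as a JSON string (or None).
    body = json.loads(event.get("body") or "{}")
    result = analyze_text(body.get("text", ""))
    # API Gateway expects a status code and a string body in the response.
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(result),
    }
```

Calling `handler({"body": '{"text": "some text"}'}, None)` locally returns the same response dictionary that API Gateway would turn into the HTTP response.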
I wrote my script and deployed it to Lambda, but I ran into this error:
Resource punkt not found. Please use the NLTK Downloader to obtain the resource:
>>> import nltk
>>> nltk.download('punkt')
Searched in:
- '/home/sbx_user1058/nltk_data'
- '/usr/share/nltk_data'
- '/usr/local/share/nltk_data'
- '/usr/lib/nltk_data'
- '/usr/local/lib/nltk_data'
- '/var/lang/nltk_data'
- '/var/lang/lib/nltk_data'
Even after downloading 'punkt', my script still gave me the same error. I tried the solutions here:
Optimizing python script extracting and processing large data files
but the problem is that the nltk_data folder is huge, and Lambda limits the size of the deployment package.
How can I fix this? Or, if Lambda is not suitable, where else could I host my script and still expose it through an API call?
I am using Serverless to deploy my Python scripts.