
I am having problems with a PHP script that reports it cannot find the file /root/nltk_data/tokenizers/punkt/english.pickle. However, I confirmed that the file is there (I downloaded the whole data set multiple times).

The PHP script actually runs a Python script, and NLTK (a Python module) says that it cannot find /root/nltk_data/tokenizers/punkt/english.pickle.

$dir = dirname(__FILE__);
$command =  "/usr/bin/python ". $dir . "/test.py";
exec($command, $output);

On the other hand, when I run the Python script from the command line, it works perfectly fine and is able to access the file.

python test.py

Is it possible to enable PHP to see those files? I ran chmod 777 on the file, but this didn't help.

The script contains:

#!/usr/bin/env python
import nltk

try:
    tokens = nltk.word_tokenize("I like apples.")
    # POS tagging loads the tagger model from nltk_data.
    tagged = nltk.pos_tag(tokens)
    print "OK!"
    #print ' * '.join(tokens)
except Exception:
    print "error!"

Error log:

Traceback (most recent call last):
  File "/var/zpanel/hostdata/zadmin/public_html/my_domain_com/test.py", line 39, in <module>
    tagged = nltk.pos_tag(tokens)
  File "/usr/local/lib/python2.7/site-packages/nltk-2.0.4-py2.7.egg/nltk/tag/__init__.py", line 99, in pos_tag
    tagger = load(_POS_TAGGER)
  File "/usr/local/lib/python2.7/site-packages/nltk-2.0.4-py2.7.egg/nltk/data.py", line 605, in load
    resource_val = pickle.load(_open(resource_url))
  File "/usr/local/lib/python2.7/site-packages/nltk-2.0.4-py2.7.egg/nltk/data.py", line 686, in _open
    return find(path).open()
  File "/usr/local/lib/python2.7/site-packages/nltk-2.0.4-py2.7.egg/nltk/data.py", line 467, in find
    raise LookupError(resource_not_found)
LookupError:
Resource taggers/maxent_treebank_pos_tagger/english.pickle not found. Please use the NLTK Downloader to obtain the resource:
>>> nltk.download()
Searched in:
- '/root/nltk_data'
- '/usr/share/nltk_data'
- '/usr/local/share/nltk_data'
- '/usr/lib/nltk_data'
- '/usr/local/lib/nltk_data'

Brana

2 Answers


There are two possible reasons for the Resource ... not found error:

  1. The paths to nltk_data are not set correctly, or nltk_data has not been downloaded.
  2. The nltk_data directory is outdated.

Solution:

Your problem is caused by reason 2. The simplest fix is to delete all contents of your nltk_data directory and redownload everything using python -c "import nltk; nltk.download('all')".

Also, you're using an outdated version of NLTK, so I suggest updating to NLTK 3.x, since there are major changes between NLTK 2.x and NLTK 3.x.

Problem:

The data structure of your nltk_data directory is as such:

nltk_data/tokenizers/punkt/english.pickle

but the latest nltk_data is:

nltk_data/taggers/maxent_treebank_pos_tagger/english.pickle

This suggests that your nltk_data is out of date, even though you're using a later version of the NLTK code.


For reason 1, see below.

You do not have the nltk_data directory in any of these paths:

- '/root/nltk_data'
- '/usr/share/nltk_data'
- '/usr/local/share/nltk_data'
- '/usr/lib/nltk_data'
- '/usr/local/lib/nltk_data'

To make sure that you have the POS tagger:

$ python
>>> import nltk
>>> nltk.download('maxent_treebank_pos_tagger')
[nltk_data] Downloading package maxent_treebank_pos_tagger to
[nltk_data]     /home/alvas/nltk_data...
[nltk_data]   Package maxent_treebank_pos_tagger is already up-to-
[nltk_data]       date!
True

This shows where NLTK saves your data; for me, it's /home/alvas/nltk_data.

To see which paths NLTK searches for the directory:

$ python
>>> import nltk
>>> nltk.data.path
['/home/alvas/nltk_data', '/usr/share/nltk_data', '/usr/local/share/nltk_data', '/usr/lib/nltk_data', '/usr/local/lib/nltk_data']
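
To check by hand which of those directories actually holds a given resource, a small helper along these lines may help. This is only a sketch: `locate_resource` is a hypothetical name, not part of NLTK, and it only looks for plain files on disk. The directory list in the demo is copied from the error message in the question; in a real session you would pass `nltk.data.path` instead.

```python
import os


def locate_resource(search_dirs, relative_path):
    """Return a list of (candidate_path, exists) pairs, one per search directory."""
    results = []
    for base in search_dirs:
        candidate = os.path.join(base, relative_path)
        # os.path.isfile returns False both when the file is missing and when
        # the current user is not allowed to look inside a parent directory.
        results.append((candidate, os.path.isfile(candidate)))
    return results


if __name__ == "__main__":
    # Assumed inputs for illustration, taken from the question's error log.
    dirs = ["/root/nltk_data", "/usr/share/nltk_data", "/usr/local/share/nltk_data"]
    resource = "taggers/maxent_treebank_pos_tagger/english.pickle"
    for path, exists in locate_resource(dirs, resource):
        print("%-5s %s" % (exists, path))
```

Running it once from the command line and once through the PHP script should show whether the two environments see the same files.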

You can also manually find where the nltk_data directory is saved on your system and add it to nltk.data.path, e.g. if nltk_data is saved in /home/alvas/work_stuff/:

>>> nltk.data.path.append('/home/alvas/work_stuff/')
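
The same idea can be placed at the top of the script that PHP launches, so that it does not depend on an interactive session. A minimal sketch, assuming a hypothetical helper `prepend_data_dir` (not an NLTK function) and assuming the directory is readable by whichever user runs the script:

```python
import os


def prepend_data_dir(search_path, data_dir):
    """Put data_dir at the front of search_path (modified in place).

    Returns True if the list was changed, False if data_dir is not an
    existing directory or is already listed.
    """
    data_dir = os.path.abspath(data_dir)
    if not os.path.isdir(data_dir):
        return False
    if data_dir in search_path:
        return False
    search_path.insert(0, data_dir)
    return True


if __name__ == "__main__":
    # Stand-in for nltk.data.path so the sketch runs without NLTK installed.
    paths = ["/usr/share/nltk_data", "/usr/local/share/nltk_data"]
    changed = prepend_data_dir(paths, "/home/alvas/work_stuff/")
    print(changed)
    print(paths)
```

In the real script the call would be `prepend_data_dir(nltk.data.path, ...)`, made before the first tokenizer or tagger call. Putting the directory first means it is searched before the default locations.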

To ensure that you have a central installation, run:

sudo python -m nltk.downloader -d /usr/share/nltk_data all

Also see downloading error using nltk.download()

alvas
  • doesn't make any difference – Brana Jan 02 '15 at 14:40
  • what is your nltk version? – alvas Jan 02 '15 at 21:15
  • I use 2.0.4 because it seems that 3.0 doesn't work with python 2.7.3 – Brana Jan 02 '15 at 21:33
  • NLTK supports py2.6-py2.7. If you've found any part of NLTK that doesn't support py2.7.3, could you raise an issue on GitHub so the devs can resolve it asap? – alvas Jan 02 '15 at 22:29
  • have you tried deleting the `nltk_data` directory and redownloading it? – alvas Jan 02 '15 at 22:30
  • Will try something like that. Maybe load the file manually. – Brana Jan 02 '15 at 22:35
  • I hope it would. I redid the server installation, so I now use Python installed a different way, but the problem is still there. I would use version 3.0, but it doesn't work at all, not even the word parser. – Brana Jan 02 '15 at 23:39
  • sudo python -m nltk.downloader -d /usr/share/nltk_data all this solved my problem thanks – Fronto Jun 12 '19 at 11:07

I've tried your code on my local PHP server, and it correctly runs the Python script with the NLTK libraries.

My best guess is:

  1. Check which user is running PHP. Running

    echo exec('whoami');

works for me in a Linux environment. Typically, the user running PHP is apache if you are using the Apache web server.

  2. Check that the user running PHP has read permissions on the /root/nltk_data folder (or whichever folder you placed the NLTK data in).
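
The permission check can also be done from Python, launched the same way PHP launches test.py. The sketch below is an illustration, not part of NLTK: `first_blocked_component` is a hypothetical helper that walks from the filesystem root down to a file and reports the first directory the current user cannot enter. It rests on an assumption the question does not confirm, namely that /root itself is closed to the web server user; if so, chmod 777 on the file alone would not be enough.

```python
import os


def first_blocked_component(path):
    """Return the first obstacle on the way to path, or None if access looks fine.

    The obstacle is either an ancestor directory that is missing or cannot be
    traversed (no execute/search permission), or the path itself if it is
    missing or unreadable. os.access checks against the real user ID.
    """
    path = os.path.abspath(path)
    ancestors = []
    current = path
    while True:
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root
            break
        ancestors.append(parent)
        current = parent
    # Check from the root downwards so the outermost obstacle is reported.
    for directory in reversed(ancestors):
        if not os.access(directory, os.X_OK):
            return directory
    if not os.access(path, os.R_OK):
        return path
    return None


if __name__ == "__main__":
    # Path taken from the question.
    target = "/root/nltk_data/tokenizers/punkt/english.pickle"
    blocked = first_blocked_component(target)
    print("no obstacle found" if blocked is None else "blocked at: " + blocked)
```

If the script reports a directory such as /root when run through PHP but nothing when run from the command line, moving the data to a shared location is likely simpler than loosening permissions on /root.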
maskie