I have a Django 1.8 view that looks like this:

def sourcedoc_parse(request, sourcedoc_id):
    sourcedoc = Sourcedoc.objects.get(pk=sourcedoc_id)
    nltk.data.path.append('/root/nltk_data')
    new_words = []
    english_vocab = set(w.lower() for w in nltk.corpus.gutenberg.words())    #<---the line where the error occurs
    results = {}

    template = 'sourcedoc_parse.html'
    params = {'sourcedoc': sourcedoc,'results': results, 'new_words': new_words, 'BASE_URL': BASE_URL}

    return render_to_response(template, params, context_instance=RequestContext(request))

It gives me the following error:

Django Version: 1.8
Python Version: 2.7.6
...
Traceback:
File "/usr/local/lib/python2.7/dist-packages/django/core/handlers/base.py" in get_response
132.                     response = wrapped_callback(request, *callback_args, **callback_kwargs)
File "/home/rosshartshorn/htdocs/worldmaker/sourcedocs/views.py" in sourcedoc_parse
107.     english_vocab = set(w.lower() for w in nltk.corpus.gutenberg.words())
File "/usr/local/lib/python2.7/dist-packages/nltk/corpus/util.py" in __getattr__
68.         self.__load()

File "/usr/local/lib/python2.7/dist-packages/nltk/corpus/util.py" in __load
56.             except LookupError: raise e

Exception Type: LookupError at /sourcedoc/parse/13/
Exception Value: 
**********************************************************************
Resource 'corpora/gutenberg' not found.  Please use the NLTK
Downloader to obtain the resource:  >>> nltk.download()
Searched in:
- '/var/www/nltk_data'
- '/usr/share/nltk_data'
- '/usr/local/share/nltk_data'
- '/usr/lib/nltk_data'
- '/usr/local/lib/nltk_data'
- '/root/nltk_data'
**********************************************************************

What is especially odd is that the same code works fine when I run it from the Python shell in the same directory:

Python 2.7.6 (default, Mar 22 2014, 22:59:38) 
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import nltk
>>> english_vocab = set(w.lower() for w in nltk.corpus.gutenberg.words())
>>> 'jabberwocky' in english_vocab
False
>>> 'monster' in english_vocab
True
>>> nltk.data.path
['/root/nltk_data', '/usr/share/nltk_data', '/usr/local/share/nltk_data', '/usr/lib/nltk_data', '/usr/local/lib/nltk_data']

Does anyone know why this behaves differently inside a Django view than at the Python command line? I also tried it with 'python manage.py shell', and it works there too.

Any debugging advice on finding the difference is also welcome.

rossdavidh
  • Maybe the user running Django doesn't have permission to read /root. – Juca Jun 03 '15 at 05:38
  • Right you are! I moved the corpora/gutenberg data to a place Django could access, then used "nltk.data.path.append()" to add that directory to the list of locations nltk searches, and it worked. I thought the Django shell would run with the same user and permissions as the Django server, but I was wrong about that. Thanks! You can add your suggestion as an answer if you want so I can select it. – rossdavidh Jun 03 '15 at 22:59
  • Added, with a comment about the difference in behavior between running the shell and running the server. – Juca Jun 04 '15 at 18:30

1 Answer

The problem here is that the user running Django doesn't have permission to read /root.

This does not happen in the Django shell because you are running the shell as root, whereas the server runs as the www user. You can see this in the error message: the first directory nltk searches is /var/www/nltk_data, which is under the home directory of the www user.
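
As a rough way to confirm this from inside the view, the sketch below reports which account the process runs as and which search directories it can actually read, then adds a shared corpus directory only if it is readable. The function names (current_user, describe_search_paths, add_corpus_dir) are made up for illustration and are not part of nltk or Django; the code assumes a Unix-like system, since it uses the pwd module.

```python
import os
import pwd  # Unix-only; assumed available because the question is on Linux


def current_user():
    """Return the account name this process runs as (e.g. root or the web server user)."""
    return pwd.getpwuid(os.getuid()).pw_name


def describe_search_paths(paths):
    """Return a list of (path, exists, readable) tuples for each directory in `paths`.

    `readable` means the process may both list and enter the directory.
    """
    report = []
    for path in paths:
        exists = os.path.isdir(path)
        readable = exists and os.access(path, os.R_OK | os.X_OK)
        report.append((path, exists, readable))
    return report


def add_corpus_dir(search_path, corpus_dir):
    """Append `corpus_dir` to the list `search_path` if this process can read it.

    Returns True if the directory is usable (already listed or newly added),
    False if it is missing or unreadable.
    """
    if not (os.path.isdir(corpus_dir) and os.access(corpus_dir, os.R_OK | os.X_OK)):
        return False
    if corpus_dir not in search_path:
        search_path.append(corpus_dir)
    return True
```

In the view, logging current_user() and describe_search_paths(nltk.data.path) should show the server running as a different account from the shell, with /root/nltk_data reported as unreadable or missing from that account's point of view. After copying the corpus somewhere the server account can read (for example a hypothetical /srv/nltk_data), add_corpus_dir(nltk.data.path, '/srv/nltk_data') plays the role of the nltk.data.path.append() call mentioned in the comments.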

Juca