
I'm trying to use NLTK, the Natural Language Toolkit. After installing the required files, I started running the demo code from http://www.nltk.org/index.html:

>>> import nltk
>>> sentence = """At eight o'clock on Thursday morning
... Arthur didn't feel very good."""
>>> tokens = nltk.word_tokenize(sentence)
>>> tokens
['At', 'eight', "o'clock", 'on', 'Thursday', 'morning',
'Arthur', 'did', "n't", 'feel', 'very', 'good', '.']
>>> tagged = nltk.pos_tag(tokens)
>>> tagged[0:6]
[('At', 'IN'), ('eight', 'CD'), ("o'clock", 'JJ'), ('on', 'IN'),
('Thursday', 'NNP'), ('morning', 'NN')]
>>> entities = nltk.chunk.ne_chunk(tagged)
>>> entities

Then I get this message:

LookupError: 

===========================================================================
NLTK was unable to find the gs file!
Use software specific configuration paramaters or set the PATH environment variable.

I tried Google, but nobody explains what the missing gs file is.

Danibix
Jie Hu

8 Answers

13

I came across this error too.

gs stands for Ghostscript. You get the error because NLTK is trying to use Ghostscript to draw a parse tree of the sentence: the chunker returns an nltk.Tree, and when IPython/Jupyter displays that object it renders it as an image, something like this:

[image: parse tree of the example sentence]

I was using IPython. To debug the issue, I set the traceback verbosity to verbose with the command %xmode verbose, which prints the local variables of each stack frame (see the full traceback below). The file names NLTK searches for are:

file_names=['gs', 'gswin32c.exe', 'gswin64c.exe']

A quick Google search for gswin32c.exe told me it was Ghostscript.

/Users/jasonwirth/anaconda/lib/python3.4/site-packages/nltk/__init__.py in find_file_iter(filename='gs', env_vars=['PATH'], searchpath=(), file_names=['gs', 'gswin32c.exe', 'gswin64c.exe'], url=None, verbose=False)
    517                         (filename, url))
    518         div = '='*75
--> 519         raise LookupError('\n\n%s\n%s\n%s' % (div, msg, div))
    520 
    521 def find_file(filename, env_vars=(), searchpath=(),

LookupError: 

===========================================================================
NLTK was unable to find the gs file!
Use software specific configuration paramaters or set the PATH environment variable.
===========================================================================
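The frame above shows NLTK scanning only the PATH environment variable (env_vars=['PATH']) for those three names. A rough way to check the same thing yourself, from the same Python process or notebook, is the standard library's shutil.which. This is a minimal sketch: find_ghostscript is a hypothetical helper, not an NLTK function, and it only approximates NLTK's own search.

```python
import shutil

# Executable names NLTK looked for, taken from the traceback above.
GS_NAMES = ("gs", "gswin32c.exe", "gswin64c.exe")


def find_ghostscript(names=GS_NAMES):
    """Return the full path of the first Ghostscript executable on PATH, or None."""
    for name in names:
        path = shutil.which(name)  # searches the PATH of *this* process
        if path is not None:
            return path
    return None


if __name__ == "__main__":
    print(find_ghostscript() or "Ghostscript not found on PATH")
```

If this prints "not found" inside your notebook even though Ghostscript is installed, the running process simply doesn't see the updated PATH, which would explain why some people below needed a restart.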
Jason Wirth
  • For Mac users, you can install Ghostscript via Homebrew: brew install ghostscript. For other OSes, instructions can be found here: https://wiki.scribus.net/canvas/Installation_and_Configuration_of_Ghostscript – naoko Jun 21 '16 at 14:58
  • I installed Ghostscript and I'm still getting the same error, even though a Windows search shows a "gswin64c.exe" file. – Alex Kinman Aug 17 '16 at 22:51
  • Windows: After installing Ghostscript and manually adding the Ghostscript bin folder to my path, I still needed to restart my machine for NLTK to pick up the Ghostscript executable. – Eric McLachlan May 08 '19 at 11:24
6

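Just to add to the previous answers: if you replace entities with print(entities), you won't get the error.

Without print(), the console/notebook tries to display the Tree object itself, which in IPython/Jupyter means drawing it as an image (that is where Ghostscript comes in). print() only shows the tree's text form.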
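A small sketch of the difference, using a hand-built, abridged stand-in for the ne_chunk() result so that no tagger or chunker models need to be downloaded. It assumes NLTK is installed; Tree and its pformat() method are part of nltk.tree.

```python
from nltk.tree import Tree

# Abridged stand-in for nltk.chunk.ne_chunk(tagged) from the question:
# leaves are (word, tag) tuples, named entities are nested subtrees.
entities = Tree("S", [
    ("At", "IN"),
    ("eight", "CD"),
    Tree("PERSON", [("Arthur", "NNP")]),
    ("good", "JJ"),
])

# print() uses the tree's text form, so Ghostscript is never involved.
print(entities)

# The same text is available as a string, e.g. for logging or saving.
text = entities.pformat()
```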

Axle Max
5

A small addition to Jason Wirth's answer: under Windows, the code in his traceback searches for "gswin64c.exe" in the PATH environment variable. However, the Ghostscript installer does not add the binary to PATH, so for this to work you need to find where Ghostscript is installed and add its bin subfolder to PATH yourself.

For example, in my case I added C:\Program Files\gs\gs9.19\bin to PATH.
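If you'd rather not change the system PATH (or restart), one option is to append the folder to PATH for the current Python process only, before calling into NLTK. This is a sketch that assumes NLTK reads PATH from os.environ at the moment it searches, as the env_vars=['PATH'] argument in Jason Wirth's traceback suggests. add_to_path is a hypothetical helper, and the directory is the example from this answer, so adjust the version number to your installation.

```python
import os


def add_to_path(directory):
    """Append `directory` to PATH for this Python process, unless already present.

    Only os.environ of the running process (and child processes it starts)
    is changed; the system-wide PATH setting is left alone.
    """
    current = os.environ.get("PATH", "")
    if directory not in current.split(os.pathsep):
        os.environ["PATH"] = current + os.pathsep + directory if current else directory


# Example folder from this answer (Ghostscript 9.19); adjust to your version.
# Guarded so the Windows path is only added on Windows.
if os.name == "nt":
    add_to_path(r"C:\Program Files\gs\gs9.19\bin")
```

Run it in the same notebook or session before displaying the tree; a new session needs it again.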

Shuyang Sheng
1

If Ghostscript for some reason is not available for your platform or fails to install, you can also use the wonderful networkx package to visualize such trees (graphviz_layout additionally needs Graphviz and pygraphviz):

import nltk
import networkx as nx
from networkx.drawing.nx_agraph import graphviz_layout  # needs pygraphviz
import matplotlib.pyplot as plt

def drawNodes(G, nodeLabels, parent, parent_id=None):
    """Recursively add the nodes of an nltk.Tree to graph G.

    nodeLabels maps each integer node id to the text drawn on that node.
    Parent ids are passed down explicitly, so repeated labels (e.g. two 'NN'
    nodes) can't attach children to the wrong parent.
    """
    def addNode(label):
        n = G.number_of_nodes()
        G.add_node(n)
        nodeLabels[n] = label
        return n

    if parent_id is None:
        # Root of the tree, e.g. 'S'
        parent_id = addNode(parent.label())
    for node in parent:
        if isinstance(node, nltk.Tree):
            # Subtree such as a named-entity chunk ('PERSON', ...)
            n = addNode(node.label())
            G.add_edge(parent_id, n)
            drawNodes(G, nodeLabels, node, n)
        else:
            # Leaf: a (word, POS tag) tuple; hang the word under its tag
            word, tag = node
            n1 = addNode(tag)
            n0 = addNode(word)
            G.add_edge(parent_id, n1)
            G.add_edge(n1, n0)

# entities is the ne_chunk() result from the question
G = nx.Graph()
nodeLabels = {}
drawNodes(G, nodeLabels, entities)
options = {
    'node_color': 'white',
    'node_size': 100,
}
plt.figure(1, figsize=(12, 6))
pos = graphviz_layout(G, prog='dot')
nx.draw(G, pos, font_weight='bold', arrows=False, **options)
nx.draw_networkx_labels(G, pos, nodeLabels)
plt.show()

[image: NLTK token tree plotted with NetworkX]

amagard
1

Instead of entities, write entities.draw(). It should work: draw() opens the tree in a separate Tkinter window instead of rendering it inline.

0

Like Alex Kinman, I still get the same error, even after installing Ghostscript and adding it to the NLTK path. Using print() does print the entities, so even with this error I get the output below, but unfortunately no tree drawing yet.

Tree('S', [('At', 'IN'), ('eight', 'CD'), ("o'clock", 'NN'), ('on', 'IN'), ('Thursday', 'NNP'), ('morning', 'NN'), Tree('PERSON', [('Arthur', 'NNP')]), ('did', 'VBD'), ("n't", 'RB'), ('feel', 'VB'), ('very', 'RB'), ('good', 'JJ'), ('.', '.')]) 
Dennis
0

In my case, I had to restart my system after running the installer gs9.53.3.exe and adding C:\Program Files\gs\gs9.53.3\bin to my PATH.

0

For me, running 'conda install --channel=conda-forge ghostscript' in the conda prompt worked. My OS is Windows and I am running the code in a Jupyter notebook.

AGR