I need to extract all the city names from a website. I've used BeautifulSoup with regular expressions in previous projects, but on this website the city names are part of regular text and do not follow a specific format. I found the geograpy package (https://pypi.python.org/pypi/geograpy/0.3.7), which seems to fulfill my requirements.
Geograpy uses the nltk package. I installed all the models and packages for nltk, but it keeps throwing this error:
>>> import geograpy
>>> places = geograpy.get_place_context(url="http://www.state.gov/misc/list/")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\site-packages\geograpy\__init__.py", line 6, in get_place_context
    e.find_entities()
  File "C:\Python27\lib\site-packages\geograpy\extraction.py", line 31, in find_entities
    if (ne.node == 'GPE' or ne.node == 'PERSON') and ne[0][1] == 'NNP':
  File "C:\Python27\lib\site-packages\nltk\tree.py", line 198, in _get_node
    raise NotImplementedError("Use label() to access a node label.")
NotImplementedError: Use label() to access a node label.
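My reading of the last frame is that geograpy reads the chunk label through the old `.node` attribute, while the nltk version I have installed (3.x) expects `label()` to be called instead. Below is a minimal sketch of what I suspect the check in extraction.py would need to look like. The names `get_label`, `is_place_like`, `NewStyleTree` and `OldStyleTree` are hypothetical and made up for illustration; the two classes are only stand-ins that mimic the old and new Tree behaviour so the snippet runs without nltk, and I have not verified any of this against geograpy itself.

```python
# Hypothetical helper names; this is not geograpy's or nltk's own code.

def get_label(subtree):
    """Return a chunk's label under either nltk Tree API."""
    label = getattr(subtree, "label", None)
    if callable(label):
        return label()   # nltk 3.x: the label is read through a method
    return subtree.node  # nltk 2.x: the label is stored in .node


def is_place_like(subtree):
    """Same condition as line 31 of extraction.py, without touching .node directly."""
    return get_label(subtree) in ("GPE", "PERSON") and subtree[0][1] == "NNP"


# Stand-ins so the sketch runs without nltk installed (assumed behaviour).
class NewStyleTree(list):
    """Mimics an nltk 3.x Tree: label() works, .node raises."""

    def __init__(self, label, children):
        list.__init__(self, children)
        self._label = label

    def label(self):
        return self._label

    @property
    def node(self):
        raise NotImplementedError("Use label() to access a node label.")


class OldStyleTree(list):
    """Mimics an nltk 2.x Tree: the label lives in .node."""

    def __init__(self, node, children):
        list.__init__(self, children)
        self.node = node


new_chunk = NewStyleTree("GPE", [("Paris", "NNP")])
old_chunk = OldStyleTree("GPE", [("Paris", "NNP")])
print(is_place_like(new_chunk))
print(is_place_like(old_chunk))
```

With the real packages, I assume the equivalent change would be editing line 31 of extraction.py to call `ne.label()` instead of reading `ne.node`, but I have not confirmed that this is the only place that needs changing.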
Any help would be appreciated.