I use Scrapy to crawl data and save it to MongoDB, and I want to create a 2dsphere index on the collection.
Here is my Scrapy pipelines.py file:
from pymongo import MongoClient
from scrapy.conf import settings


class MongoDBPipeline(object):

    global theaters
    theaters = []

    def __init__(self):
        connection = MongoClient(
            settings['MONGODB_SERVER'],
            settings['MONGODB_PORT'])
        self.db = connection[settings['MONGODB_DB']]
        self.collection = self.db[settings['MONGODB_COLLECTION']]

    def open_spider(self, spider):
        print 'Pipelines => open_spider =>'

    def process_item(self, item, spider):
        global theaters
        # use the item's class name as the collection name
        self.collection = self.db[type(item).__name__.replace('_Item','')]

        if item['theater'] not in theaters:
            print 'remove=>',item['theater']
            theaters.append(item['theater'])
            self.collection.remove({'theater': item['theater']})

        # insert the item into the collection named after its class
        self.collection.insert(dict(item))
        # this is where I try to create the 2dsphere index
        self.collection.create_index({"location": "2dsphere"})

        return item
When I use self.collection.create_index({"location": "2dsphere"}), it raises this error:
TypeError: if no direction is specified, key_or_list must be an instance of list
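The error message points at the shape of the argument: this pymongo version expects a list of (field, index type) pairs rather than a dict. Below is a minimal sketch of that shape, assuming the field is called location. geo_index_keys is a hypothetical helper for illustration, not part of pymongo, and "2dsphere" is the string value behind the pymongo.GEOSPHERE constant.

```python
def geo_index_keys(field):
    """Build a key specification for a 2dsphere index as a list of pairs."""
    # pymongo.GEOSPHERE is the string "2dsphere"; it is spelled out here
    # so the snippet runs without pymongo or a database connection.
    return [(field, "2dsphere")]


keys = geo_index_keys("location")
# With a real pymongo collection this list would be passed as:
#     collection.create_index(keys)
```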
If I try
self.collection.create_index([('location', "2dsphere")], name='search_index', default_language='english')
there is no error anymore, but my MongoDB collection still has no index on location.
I think I am following the GeoJSON format.
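One way to check that claim before inserting is to validate each scraped value. This is a rough sketch only, assuming the field holds a two-dimensional GeoJSON Point; is_geojson_point is a hypothetical helper, and the sample coordinates are made up for illustration.

```python
def is_geojson_point(value):
    """Return True if value looks like a 2D GeoJSON Point."""
    if not isinstance(value, dict) or value.get("type") != "Point":
        return False
    coords = value.get("coordinates")
    if not isinstance(coords, (list, tuple)) or len(coords) != 2:
        return False
    lon, lat = coords
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(lon, bool) or isinstance(lat, bool):
        return False
    # Scraped values are often strings; a 2dsphere index needs numbers.
    if not all(isinstance(c, (int, float)) for c in (lon, lat)):
        return False
    # GeoJSON order is [longitude, latitude].
    return -180 <= lon <= 180 and -90 <= lat <= 90


good = {"type": "Point", "coordinates": [121.56, 25.03]}
swapped = {"type": "Point", "coordinates": [25.03, 121.56]}
as_text = {"type": "Point", "coordinates": ["121.56", "25.03"]}
```

Here is_geojson_point(good) is True, while swapped fails the latitude range check and as_text fails because the coordinates are strings.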
Is there any way to create a 2dsphere index in MongoDB when using Scrapy? Or should I just save the data as in the structure shown in the photo and create the index from a separate server-side script (for example in Node.js)?
Any help would be appreciated. Thanks in advance.
Following Adam Harrison's response, I renamed the MongoDB field location to geometry, added import pymongo to my pipelines.py file, and used self.collection.create_index([("geometry", pymongo.GEOSPHERE)])
There is no error, but I still can't find the index on geometry.
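To see which indexes actually exist, the collection can be asked directly through pymongo's index_information(). The sketch below assumes that method's usual return shape (index name mapped to a dict whose "key" entry is a list of pairs). FakeCollection is a hypothetical stand-in so the snippet runs without a server. In the pipeline the real object would be self.collection, which process_item reassigns to the collection named after the item class, so that is the collection to inspect.

```python
def geo_indexed_fields(collection):
    """Return the names of fields covered by a 2dsphere index."""
    fields = []
    # index_information() maps index name -> info dict; info["key"] is a
    # list of (field, direction_or_type) pairs.
    for info in collection.index_information().values():
        for field, kind in info["key"]:
            if kind == "2dsphere":
                fields.append(field)
    return fields


class FakeCollection(object):
    """Hypothetical stand-in for a pymongo collection (no server needed)."""

    def index_information(self):
        return {
            "_id_": {"key": [("_id", 1)]},
            "geometry_2dsphere": {"key": [("geometry", "2dsphere")]},
        }


found = geo_indexed_fields(FakeCollection())
```

If the field name is missing from the result for the real collection, the index was either created on a different collection or not created at all.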