I have implemented bulk indexing into Elasticsearch with the Python client, and I'd like to make it more efficient.
```python
# current implementation in Python
# (all_products(), INDEX_NAME, and ANALYZER are defined elsewhere in my module)
from elasticsearch import Elasticsearch
from elasticsearch.helpers import bulk


def products_to_index():
    for product in all_products():
        yield {
            "_op_type": "index",
            "_index": INDEX_NAME,
            "_id": product.id,
            "_source": {"name": product.name, "content": product.content},
        }


def main(args):
    # Connect to localhost:9200 by default.
    es = Elasticsearch()
    body = ANALYZER
    es.indices.create(index=INDEX_NAME, body=body)
    bulk(es, products_to_index())
```
This implementation just takes all the data and indexes it batch by batch. I'd like to add a step that checks whether each entry has already been indexed, so that re-runs skip existing documents.
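Here's the direction I was imagining (untested sketch; the mget-based check and `batch_size` are my own guesses, not something I found in the docs):

```python
from itertools import islice


def existing_ids(es, ids):
    # Batch-check which IDs are already in the index; mget reports
    # a "found" flag per requested document.
    resp = es.mget(index=INDEX_NAME, body={"ids": ids}, _source=False)
    return {doc["_id"] for doc in resp["docs"] if doc.get("found")}


def products_to_index(es, batch_size=500):
    products = iter(all_products())
    while True:
        batch = list(islice(products, batch_size))
        if not batch:
            break
        # Elasticsearch returns _id values as strings, so compare as strings.
        skip = existing_ids(es, [str(p.id) for p in batch])
        for product in batch:
            if str(product.id) in skip:
                continue
            yield {
                "_op_type": "index",
                "_index": INDEX_NAME,
                "_id": product.id,
                "_source": {"name": product.name, "content": product.content},
            }
```

Would that work, or is it more idiomatic to set `_op_type` to `"create"` and call `bulk(es, actions, raise_on_error=False)` so Elasticsearch itself rejects IDs that already exist?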
I also thought about loading previously saved indices from a local path, but I'm not sure how to proceed.
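The closest thing I came across was the snapshot/restore API. Is something like this the right idea? (Untested sketch; `"local_backup"`, `SNAPSHOT_NAME`, and the path are placeholders, and I gather the path has to be listed under `path.repo` in elasticsearch.yml.)

```python
# Register a filesystem repository pointing at the locally saved indices,
# then restore the index from a snapshot in it. Names and path are placeholders.
es.snapshot.create_repository(
    repository="local_backup",
    body={"type": "fs", "settings": {"location": "/path/to/saved/indices"}},
)
es.snapshot.restore(
    repository="local_backup",
    snapshot=SNAPSHOT_NAME,
    body={"indices": INDEX_NAME},
    wait_for_completion=True,
)
```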
Beyond that, I looked through the API documentation but couldn't find anything that addresses either question directly.