3

I am writing a Spark program that is essentially an RDD of Strings. What I need to do is create a query per string and run that query against an Elasticsearch index, so the query differs for each string. I wanted to use elasticsearch-hadoop to do the search so I can benefit from its optimizations. The RDD can be large, and I am looking for any optimizations possible.

For example, the RDD is List[India, IBM Company, Netflix, Lebron James]. We will create a more-like-this query for each of these terms, run the search against the Wikipedia index, and get back the results. So we would create four more-like-this queries, one each for India, IBM Company, Netflix and Lebron James, and get back the hits for them, roughly like the sketch below.
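Here is a rough sketch of the query-per-term idea; the index name wikipedia and the field text are just placeholders for illustration:

```scala
import org.apache.spark.rdd.RDD

// One more-like-this query per term (index "wikipedia" and field "text" are placeholders)
def mltQuery(term: String): String =
  s"""{"query":{"more_like_this":{"fields":["text"],"like":"$term","min_term_freq":1,"min_doc_freq":1}}}"""

def buildQueries(terms: RDD[String]): RDD[(String, String)] =
  terms.map(term => term -> mltQuery(term))   // (term, query JSON) pairs, one per term
```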

I do have a workaround where I can use the HTTP REST API with a multi-search (bulk) request to get back the hits, but then I would be doing the optimizations on my own. I wanted to see if the Spark Elasticsearch connector can be used to create the queries and run the searches in an optimized way.
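To make the workaround concrete, here is a rough sketch of issuing one _msearch request per RDD partition over plain HTTP; the host, index name (wikipedia) and field (text) are assumptions, not my actual setup:

```scala
import java.io.OutputStreamWriter
import java.net.{HttpURLConnection, URL}
import scala.io.Source
import org.apache.spark.rdd.RDD

// Send all terms of a partition in a single _msearch request and return the raw JSON response
def msearchPartition(terms: Iterator[String]): Iterator[String] = {
  val batch = terms.toSeq
  if (batch.isEmpty) Iterator.empty
  else {
    // _msearch expects newline-delimited header/body pairs, terminated by a newline
    val body = batch.map { term =>
      val header = """{"index":"wikipedia"}"""
      val query  = s"""{"query":{"more_like_this":{"fields":["text"],"like":"$term","min_term_freq":1,"min_doc_freq":1}}}"""
      header + "\n" + query
    }.mkString("", "\n", "\n")

    val conn = new URL("http://localhost:9200/_msearch").openConnection().asInstanceOf[HttpURLConnection]
    conn.setRequestMethod("POST")
    conn.setRequestProperty("Content-Type", "application/x-ndjson")
    conn.setDoOutput(true)
    val writer = new OutputStreamWriter(conn.getOutputStream, "UTF-8")
    writer.write(body)
    writer.close()

    val response = Source.fromInputStream(conn.getInputStream, "UTF-8").mkString
    conn.disconnect()
    Iterator(response)   // each entry in "responses" maps back to one term, in order
  }
}

def searchAll(termsRdd: RDD[String]): RDD[String] =
  termsRdd.mapPartitions(msearchPartition)
```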

Saurabh Sharma
  • Did you try https://www.elastic.co/guide/en/elasticsearch/hadoop/current/spark.html#spark-read ? I see it has an option for a query; not sure you can run more-like-this with it – aclowkay Sep 07 '17 at 06:29
  • 1
    I need to run not one query to form the RDD, but a set of queries, like a multi-search query. I now believe you cannot do bulk queries through the connector and will probably roll my own implementation – Saurabh Sharma Sep 08 '17 at 13:13

1 Answer

0

This use case is not possible. The connector basically assumes one query (or a few queries) that define an RDD, but it does not work in a batch (multi-search) query mode where a separate query is run per record.
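For reference, what the connector does support is a single query that defines the whole RDD, roughly like the sketch below (the resource name wikipedia and the query are illustrative, assuming the elasticsearch-spark artifact is on the classpath):

```scala
import org.apache.spark.SparkContext
import org.elasticsearch.spark._   // adds esRDD to SparkContext

// One query -> one RDD of hits; there is no per-record multi-search equivalent
def singleQueryRdd(sc: SparkContext) = {
  val query = """{"query":{"more_like_this":{"fields":["text"],"like":"India"}}}"""
  sc.esRDD("wikipedia", query)   // RDD[(String, Map[String, AnyRef])] keyed by document id
}
```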

Saurabh Sharma