The most important part is making sure that you get the correct underlying query. We've recently had a case where the wrong setting cost us almost 10x in performance. Spring Data uses the High Level REST Client, so I would generally expect no overhead, or only a small one, if the underlying query is the same. The framework differences are probably small enough that I would prioritize development speed and familiarity.
Our mistake was to return the underlying docs along with the aggregation, which is a lot more data to send around and (de)serialize, and also won't use the cache. That made a difference of 400ms vs 40ms for our aggregation (when we hit the cache).
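To check what "the correct underlying query" looks like on the wire, here is a minimal sketch of an aggregation-only request body. The class name, the aggregation name by_category, and the field category are hypothetical and only there for illustration. The point is the top-level "size": 0, which tells Elasticsearch to return no hits; to my knowledge the shard request cache only caches such size-0 requests by default.

```java
public class AggOnlyRequest {

    // Builds the JSON body of a search request that runs a terms aggregation.
    // "by_category" and "category" are made-up names for this example.
    static String body(int size) {
        return String.format(
            "{\"size\":%d,"
                + "\"aggs\":{\"by_category\":{\"terms\":{\"field\":\"category\"}}}}",
            size);
    }

    public static void main(String[] args) {
        // size = 0: only the aggregation result comes back, no source docs.
        System.out.println(body(0));
        // size = 10 (the Elasticsearch default): hits are returned as well.
        System.out.println(body(10));
    }
}
```

If your client can log the outgoing request, comparing it against a body like this is a quick way to spot accidentally returned documents.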
Edit by P.J.Meisch (hope you don't mind, @xeraa; no need for an extra answer):
As already stated, Spring Data Elasticsearch uses the Elasticsearch RestHighLevelClient (and will later use the new Elasticsearch client). To create an aggregation query you need to use the NativeSearchQuery, where you build the query using Elasticsearch's query builders. So building the query is the same as when using the RestHighLevelClient directly.
As already mentioned by @xeraa, if you just need the aggs and not the query data, make sure not to return the source docs. In Spring Data Elasticsearch you do that with NativeSearchQueryBuilder.withMaxResults(0). You then pass the query as usual to the ElasticsearchOperations.search() method.
Spring Data Elasticsearch does not do any parsing of the returned aggregations; you will have to do the same work there as you would when using the client directly.
So I don't see a point where Spring Data Elasticsearch will contribute to a performance problem.