Scaling Vespa for 500 QPS for search

Question

We have created a custom searcher. Our document size is around 400 000. Latency stays below 100 ms, but when we run a load test we cannot get more than 80 QPS, and latency rises to 4-5 seconds. We are using a 9-node cluster (c5.2xlarge: 8 vCPU and 16 GB RAM) with grouped distribution (3 groups of 2 nodes, with replication 3 and searchable copies 3). We tried different distributions but could not gain speed. We also tried different values for the tuning parameters, even on larger compute instances:

<requestthreads>
   <search>64</search>  <!-- also tried 128 -->
   <persearch>1</persearch>
   <summary>16</summary>
</requestthreads>

What is a better approach to finding the bottleneck? With a cluster this size, we should be able to achieve 500 QPS for 500k records.

score 4 · Answer 1 · answered Feb 09 '20 at 06:01

Read this if you haven't yet: https://docs.vespa.ai/documentation/performance/sizing-search.html

You need to measure how many resources your queries consume. That should give you an idea of how much throughput you can handle and where it makes sense to tune. Then you can run load tests at various loads to verify the behavior and find the maximum throughput. At 80 QPS this system is overloaded, which is not useful for observing it, because you don't want to run overloaded in production.
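As a rough illustration of turning measurements into a throughput estimate, here is a minimal Python sketch. It assumes the queries are CPU-bound on the content nodes, that each query is served by a single group and costs a measured amount of CPU time on every node in that group, and that load tests are run as steps at increasing offered rates. The function names, the 70% utilization target and the sample numbers are hypothetical, not Vespa APIs or recommendations.

```python
from dataclasses import dataclass


@dataclass
class StepResult:
    """One step of a load test: the offered rate and the p99 latency observed at it."""
    offered_qps: float
    p99_latency_ms: float


def estimated_max_qps(groups: int, cores_per_node: int,
                      cpu_ms_per_query_per_node: float,
                      target_utilization: float = 0.7) -> float:
    """Rough CPU-bound capacity estimate under a hypothetical model.

    Each query is assumed to be handled by one group and to cost
    cpu_ms_per_query_per_node of CPU time on every node in that group,
    so one group sustains cores * 1000 / cost queries per second and
    groups add capacity linearly. target_utilization leaves headroom.
    """
    per_group_qps = cores_per_node * 1000.0 / cpu_ms_per_query_per_node
    return groups * per_group_qps * target_utilization


def max_sustainable_qps(steps: list[StepResult], latency_slo_ms: float) -> float:
    """Highest offered rate whose p99 latency stayed within the SLO (0 if none did)."""
    ok = [s.offered_qps for s in steps if s.p99_latency_ms <= latency_slo_ms]
    return max(ok, default=0.0)


if __name__ == "__main__":
    # Hypothetical numbers: 3 groups, 8 vCPUs per node, 50 ms CPU per query per node.
    print(estimated_max_qps(groups=3, cores_per_node=8, cpu_ms_per_query_per_node=50))
    # Hypothetical step-test results; the last step is clearly overloaded.
    steps = [StepResult(40, 90), StepResult(60, 120), StepResult(80, 4500)]
    print(max_sustainable_qps(steps, latency_slo_ms=150))
```

If the measured capacity estimate is far above the rate at which latency explodes, the bottleneck is probably not CPU on the content nodes, which is a hint to look elsewhere (container, network, client).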

If you want, you can deploy your application on https://cloud.vespa.ai instead of running it yourself; then we could give you better insights by looking at it from our side.

score 2 · Answer 2 · answered Feb 10 '20 at 07:50

If you have tried different content document distributions as described in the sizing documentation (1*) without getting past 80 QPS, I suspect you are network bound between your benchmarking client and the Vespa search container(s). For example, if you have 400 KB documents and return each one in full as part of the default summary, you will saturate a 1 Gbps network interface at about 31 QPS with the default 10 hits:

1 Gbps interface = 125 000 KB/s
10 hits × 400 KB = 4 000 KB per result
125 000 KB/s ÷ 4 000 KB/query ≈ 31 queries/s maximum on a 1 Gbps link
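The same arithmetic as a small Python helper makes it easy to try other hit counts, summary sizes or link speeds. The optional compression_ratio parameter is an illustrative assumption (for example 0.25 if gzip shrank responses to a quarter of their size); real ratios depend entirely on your data. Protocol overhead is ignored, so treat the result as an upper bound.

```python
def network_bound_max_qps(link_bits_per_s: float, summary_kb_per_hit: float,
                          hits: int = 10, compression_ratio: float = 1.0) -> float:
    """Upper bound on queries/s before the result payload saturates the link.

    Uses decimal units as in the text: 1 KB = 1000 bytes, 1 Gbps = 10**9 bit/s.
    compression_ratio is compressed size / uncompressed size (1.0 = no compression).
    HTTP and TCP overhead are ignored, so the real limit is somewhat lower.
    """
    link_kb_per_s = link_bits_per_s / 8 / 1000                       # 1 Gbps -> 125 000 KB/s
    kb_per_result = hits * summary_kb_per_hit * compression_ratio    # 10 x 400 KB -> 4000 KB
    return link_kb_per_s / kb_per_result


if __name__ == "__main__":
    print(network_bound_max_qps(1e9, 400))                          # the example in the text
    print(network_bound_max_qps(1e9, 400, compression_ratio=0.25))  # hypothetical gzip ratio
```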

Passing the "Accept-Encoding: gzip" header enables compression between your client and the container(s). Internal communication inside Vespa is already compressed (when the payload is large enough). You can also use an explicit document summary that contains less data (2*).
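A minimal client-side sketch of both suggestions, using only the Python standard library. It assumes the query endpoint is the container's default /search/ path on port 8080 and that the schema defines a smaller document summary; the host name and the summary name "short" are hypothetical placeholders. The summary request parameter selects which document summary is returned. Note that urllib does not decompress gzip responses automatically, hence the explicit decode step.

```python
import gzip
import urllib.parse
import urllib.request
from typing import Optional


def build_search_request(host: str, yql: str, summary: Optional[str] = None,
                         hits: int = 10) -> urllib.request.Request:
    """Build a Vespa query request that asks for a gzip-compressed response.

    host and the summary class name are placeholders for your own deployment.
    """
    params = {"yql": yql, "hits": str(hits)}
    if summary is not None:
        # Select a smaller, explicitly defined document summary instead of the default.
        params["summary"] = summary
    url = f"http://{host}:8080/search/?{urllib.parse.urlencode(params)}"
    # Ask the container to compress the response body.
    return urllib.request.Request(url, headers={"Accept-Encoding": "gzip"})


def decode_body(body: bytes, content_encoding: Optional[str]) -> str:
    """Decompress the body if the server actually applied gzip, then decode it as UTF-8."""
    if content_encoding == "gzip":
        body = gzip.decompress(body)
    return body.decode("utf-8")


if __name__ == "__main__":
    # Hypothetical host and summary name.
    req = build_search_request("vespa-container.example",
                               "select * from sources * where true", summary="short")
    print(req.full_url)
    print(req.get_header("Accept-encoding"))
    # Against a live cluster you would do something like:
    # with urllib.request.urlopen(req) as resp:
    #     text = decode_body(resp.read(), resp.headers.get("Content-Encoding"))
```

Checking the Content-Encoding response header before decompressing keeps the client working whether or not the server chose to compress a given response.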

(1*) https://docs.vespa.ai/documentation/performance/sizing-search.html

(2*) https://docs.vespa.ai/documentation/document-summaries.html
