How to configure shards in Vespa?

Question

We want to set up a cluster of 4 nodes to host data. The cluster hosts only one index, so all 4 nodes hold the same document type.

Our goal is to have the data sharded across the nodes, say two shards with two replicas each (4 nodes in total to host these 4 data partitions).

Document mode is "index" and global is "true".

    <redundancy>2</redundancy>

    <nodes>
        <node hostalias="node1" distribution-key="0"/>
        <node hostalias="node2" distribution-key="1"/>
        <node hostalias="node3" distribution-key="2"/>
        <node hostalias="node4" distribution-key="3"/>
    </nodes>

    <engine>
        <proton>
            <searchable-copies>2</searchable-copies>
            <flush-on-shutdown>true</flush-on-shutdown>
        </proton>
    </engine>

The above config in services.xml is not accepted. It requires redundancy to be at least the number of nodes, so we have to configure

<redundancy>4</redundancy>

and

<searchable-copies>4</searchable-copies>

for it to accept a valid config.

That configures all 4 nodes to hold all the data, so each node contains a full copy. According to http://docs.vespa.ai/documentation/content/data-placement.html we need global=true, and we noticed this note:

Note: The global documents feature is under development. It is currently only available for setups where all documents are already inherently on all nodes, i.e. N groups each containing a single node.

How do we distribute data in shards? Can we have node1 and node2 hold the distributed data, with node3 and node4 holding their copies, using redundancy 2?

score 3 · Accepted Answer · answered Oct 26 '17 at 06:09

Thanks for asking - I see the documentation of global=true is a bit confusing.

In your case, you want to shard, i.e. distribute 2 replicas of each document over 4 nodes (correct me if I am wrong).

global is normally used for parent documents, see http://docs.vespa.ai/documentation/search-definitions.html#document-references - in your case you have only one document type (I assume), hence no parents, so do not use global.

The global feature will distribute 4 replicas over 4 nodes. If that is what you want, set redundancy=4; there is no need to use global for that either.
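
A minimal sketch of a content cluster set up this way, assuming a cluster id "mycluster" and a document type "mydoc" (both hypothetical, not taken from the question). Redundancy and searchable-copies stay at 2, and the document is declared without global, so each document should get 2 replicas spread over the 4 nodes:

    <content id="mycluster" version="1.0">
        <!-- 2 replicas of each document, distributed over the nodes below -->
        <redundancy>2</redundancy>

        <documents>
            <!-- "mydoc" is a placeholder; no global attribute is set -->
            <document type="mydoc" mode="index"/>
        </documents>

        <nodes>
            <node hostalias="node1" distribution-key="0"/>
            <node hostalias="node2" distribution-key="1"/>
            <node hostalias="node3" distribution-key="2"/>
            <node hostalias="node4" distribution-key="3"/>
        </nodes>

        <engine>
            <proton>
                <!-- both replicas are indexed and searchable -->
                <searchable-copies>2</searchable-copies>
                <flush-on-shutdown>true</flush-on-shutdown>
            </proton>
        </engine>
    </content>

The hostalias values must match the entries in your hosts.xml.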

Kristian Aune

If I change global=true to global=false in the document, do we need to re-feed the index, or is a redeploy enough? – enator Oct 26 '17 at 06:12

I believe a redeploy will suffice, as this just redistributes the buckets with the documents. Please let me know if you need to refeed. Thanks! – Kristian Aune Oct 26 '17 at 06:49
