We want to set up a cluster of 4 nodes to host data. The cluster hosts only one index, so all 4 nodes hold the same document type.
Our goal is to shard the data across the nodes, say two shards, each with two copies (4 nodes in total to host these 4 data partitions).
The document mode is "index" and global is "true".
<redundancy>2</redundancy>
<nodes>
    <node hostalias="node1" distribution-key="0"/>
    <node hostalias="node2" distribution-key="1"/>
    <node hostalias="node3" distribution-key="2"/>
    <node hostalias="node4" distribution-key="3"/>
</nodes>
<engine>
    <proton>
        <searchable-copies>2</searchable-copies>
        <flush-on-shutdown>true</flush-on-shutdown>
    </proton>
</engine>
The above configuration in services.xml is not accepted. Deployment requires redundancy to be at least the number of nodes, so we need to configure
<redundancy>4</redundancy>
and
<searchable-copies>4</searchable-copies>
for the config to be accepted as valid.
That configures all 4 nodes to hold all of the data, so each node contains a full copy. According to http://docs.vespa.ai/documentation/content/data-placement.html we need global=true, and we noticed the following note there:
Note: The global documents feature is under development. It is currently only available for setups where all documents are already inherently on all nodes, i.e. N groups each containing a single node.
How can we distribute the data in shards? Can node1 and node2 hold the distributed data while node3 and node4 hold a copy of it, with redundancy 2?
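To make the layout we are after concrete, here is a rough sketch of what we have in mind, not a verified config. It assumes the document type does not have to be global (the global attribute is left out) and that grouped distribution is acceptable. The content id `mycontent`, the document type `mydoc`, and the group names `group0` and `group1` are placeholders that do not appear in our real config.

```xml
<!-- Sketch only: ids, document type, and group names are hypothetical. -->
<content id="mycontent" version="1.0">
    <!-- Two copies of every document in total -->
    <redundancy>2</redundancy>
    <documents>
        <!-- Assumption: global="true" is omitted so that documents are
             distributed instead of being placed on every node -->
        <document type="mydoc" mode="index"/>
    </documents>
    <group>
        <!-- "1|*" asks for one copy in each of the two groups -->
        <distribution partitions="1|*"/>
        <group name="group0" distribution-key="0">
            <node hostalias="node1" distribution-key="0"/>
            <node hostalias="node2" distribution-key="1"/>
        </group>
        <group name="group1" distribution-key="1">
            <node hostalias="node3" distribution-key="2"/>
            <node hostalias="node4" distribution-key="3"/>
        </group>
    </group>
    <engine>
        <proton>
            <!-- Keep both copies searchable, one per group -->
            <searchable-copies>2</searchable-copies>
            <flush-on-shutdown>true</flush-on-shutdown>
        </proton>
    </engine>
</content>
```

The intent is that node1 and node2 split the documents between them and node3 and node4 hold the second copy of the same documents. Whether this passes validation, and how it interacts with global=true, is exactly what we are unsure about.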