
We want to set up a cluster of 4 nodes to host data. The cluster hosts only one index, so all 4 nodes hold the same document type.

Our goal is to have the data sharded across the nodes, say two shards with two replicas each (4 nodes in total to host these 4 data partitions).

Document mode is "index" and global is "true".

    <redundancy>2</redundancy>

    <nodes>
      <node hostalias="node1" distribution-key="0"/>
      <node hostalias="node2" distribution-key="1"/>
      <node hostalias="node3" distribution-key="2"/>
      <node hostalias="node4" distribution-key="3"/>
    </nodes>

    <engine>
      <proton>
        <searchable-copies>2</searchable-copies>
        <flush-on-shutdown>true</flush-on-shutdown>
      </proton>
    </engine>

The above config in services.xml is not allowed. It requires redundancy to be at least the number of nodes, so we need to configure

<redundancy>4</redundancy>

and

<searchable-copies>4</searchable-copies>

for it to accept a valid config.

But that configures all 4 nodes to hold all the data, so each node contains a full copy. According to http://docs.vespa.ai/documentation/content/data-placement.html, we need global=true. We also noticed this note:

Note: The global documents feature is under development. It is currently only available for setups where all documents are already inherently on all nodes, i.e. N groups each containing a single node.

How do we distribute the data in shards? Can we make node1 and node2 hold the distributed data, and node3 and node4 hold their copies, with redundancy 2?

Kirk Beard
enator

1 Answer


Thanks for asking - I see the documentation of global=true is a bit confusing.

In your case, you want to shard, i.e. distribute 2 replicas of each document over 4 nodes (correct me if I am wrong).

global is normally used for parent documents, as in http://docs.vespa.ai/documentation/search-definitions.html#document-references. In your case you have only one document type (I assume), hence no parents, so do not use global.

The global feature will distribute 4 replicas over 4 nodes. If that is what you want, set redundancy=4, but there is no need to use global for that either.
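
For illustration, here is a minimal sketch of what the content cluster in services.xml might look like for 2 replicas of each document spread over 4 nodes, without the global attribute. The cluster id `mycluster` and the document type name `mydoc` are hypothetical placeholders that do not appear in the question; the node list and proton settings are copied from the question's config. This assumes, as the answer implies, that the redundancy requirement came from global=true.

```xml
<!-- Sketch only: "mycluster" and "mydoc" are hypothetical names. -->
<content id="mycluster" version="1.0">
  <!-- Keep 2 replicas of each document across the 4 nodes. -->
  <redundancy>2</redundancy>

  <documents>
    <!-- No global="true" here: the document type is not a parent type. -->
    <document type="mydoc" mode="index"/>
  </documents>

  <nodes>
    <node hostalias="node1" distribution-key="0"/>
    <node hostalias="node2" distribution-key="1"/>
    <node hostalias="node3" distribution-key="2"/>
    <node hostalias="node4" distribution-key="3"/>
  </nodes>

  <engine>
    <proton>
      <!-- Both replicas are kept searchable, as in the question. -->
      <searchable-copies>2</searchable-copies>
      <flush-on-shutdown>true</flush-on-shutdown>
    </proton>
  </engine>
</content>
```

With a layout like this, which nodes hold which replicas is decided by the bucket distribution rather than by assigning specific shards to specific nodes.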

Kristian Aune
  • If I change global=true to global=false in the document, do we need to re-feed the index, or is a redeploy enough? – enator Oct 26 '17 at 06:12
  • I believe a redeploy will suffice, as this just redistributes the buckets with the documents. Please let me know if you need to refeed. Thanks! – Kristian Aune Oct 26 '17 at 06:49