As I am new to Elasticsearch, I am confused about some of its terms:

 1) shard
 2) cluster
 3) difference between number_of_nodes and number_of_data_nodes
 4) difference between active_primary_shards and active_shards
 5) relocating_shards 
 6) unassigned_shards 

When I start Elasticsearch, it shows 5 shards. But after I insert data, the number of shards keeps increasing. I don't know what the default setting for shards is.

Thanks in advance!

javanna
  • Possible dupe: http://stackoverflow.com/questions/15694724/shards-and-replicas-in-elasticsearch – shyos Jan 22 '14 at 08:39
  • I read that post already. Now my doubt is whether shards depend on the index or on the cluster. I created 5 indices and the number of shards keeps increasing, and my cluster is in red status. –  Jan 23 '14 at 04:56
  • In that post they explained shard relocation. What does it mean: does each node have separate data or a separate index? –  Jan 23 '14 at 04:59

1 Answer

  • Are your shards increasing by 10 every time you attempt to insert a new document?
  • Are you sure you are inserting the document into the index that you created beforehand?

The reason I ask whether the shards increase by 10 when you insert a document is that, if they do, you are not inserting into the index you already created. Instead, you are creating a whole new index based on Elasticsearch's out-of-the-box defaults, which are 5 shards and 1 replica, meaning 10 shards (the 5 primary shards plus one replica copy of each).
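The arithmetic behind that figure can be sketched in a few lines of Python. The function name `total_shards` is made up for illustration, and the default values are simply the ones quoted in this answer, not something the snippet reads from a running cluster.

```python
def total_shards(number_of_shards: int, number_of_replicas: int) -> int:
    """Shard copies one index occupies: each primary plus its replicas."""
    return number_of_shards * (1 + number_of_replicas)


# Defaults quoted in this answer: 5 primaries, 1 replica of each.
print(total_shards(5, 1))  # 10
# An index created explicitly with 2 shards and no replicas.
print(total_shards(2, 0))  # 2
```

So if every insert ends up creating a fresh index with those defaults, the cluster-wide shard count grows by 10 each time.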

For example, here is how you can create an index and stay in full control of how many shards are created:

curl -XPUT <host>:<port>/<index> -d '{
"settings": {
        "number_of_shards": 2,
        "number_of_replicas": 0,
        "analysis": {
            "analyzer": {
              ..........
              ..........
              ..........
              ..........
              ..........
              ..........
              ..........
              ..........
             },
             "filter": {
              ..........
              ..........
              ..........
              ..........
              ..........
              ..........
              ..........
              ..........
             }
        }
    }
}'
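The same request could be prepared from Python, roughly as sketched here. This is only an illustration: the helper name `build_create_index_request`, the host `http://localhost:9200`, and the index name `my_index` are assumptions, and the `analysis` section is left out because its contents are elided above. The request is built but not sent.

```python
import json
import urllib.request


def build_create_index_request(base_url: str, index: str,
                               number_of_shards: int,
                               number_of_replicas: int) -> urllib.request.Request:
    """Build (but do not send) a PUT request that creates an index."""
    body = {
        "settings": {
            "number_of_shards": number_of_shards,
            "number_of_replicas": number_of_replicas,
        }
    }
    return urllib.request.Request(
        url=f"{base_url}/{index}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Hypothetical host and index name, chosen only for this example.
request = build_create_index_request("http://localhost:9200", "my_index", 2, 0)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# To actually create the index: urllib.request.urlopen(request)
```

Note that `number_of_shards` is fixed when the index is created, whereas `number_of_replicas` can be changed later, so it is worth choosing the shard count deliberately.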

As for the terms in your question:

  1. A shard in Elasticsearch is essentially a Lucene instance (a single Lucene index). Every Elasticsearch index is split into one or more shards.
  2. A cluster in Elasticsearch is one or more nodes that share the same cluster name, which lets them join together and distribute shards among themselves.
  3. number_of_nodes counts every node in the cluster. number_of_data_nodes counts only the nodes that hold data (shards); such nodes may also receive search requests and potentially act as master. Nodes that hold no data, such as pure coordinator nodes, only distribute requests to the data nodes.
  4. Active primary shards are the primary shards that are available on a node within the cluster; active shards includes the replicas in that count as well.
  5. Relocating shards are shards that are migrating to a different node within the same cluster. This happens, for example, when a new node joins the cluster: because Elasticsearch is distributed, it shifts shards to rebalance the cluster, ensuring availability.
  6. Unassigned shards are shards which are not assigned to any node.
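
To make the counters above a little more concrete, here is a minimal sketch that derives a few readable numbers from them. The dictionary is hand-written with made-up values; only the key names come from the question. The function name `summarize_health` is hypothetical.

```python
from typing import Dict


def summarize_health(health: Dict[str, int]) -> Dict[str, int]:
    """Derive a few readable numbers from cluster health counters."""
    return {
        # Nodes that are part of the cluster but hold no shards.
        "nodes_without_data": health["number_of_nodes"] - health["number_of_data_nodes"],
        # active_shards includes primaries, so the difference is replicas.
        "active_replica_shards": health["active_shards"] - health["active_primary_shards"],
        "relocating_shards": health["relocating_shards"],
        "unassigned_shards": health["unassigned_shards"],
    }


# Made-up counters for one index with 5 primaries and 1 replica each.
sample_health = {
    "number_of_nodes": 3,
    "number_of_data_nodes": 2,
    "active_primary_shards": 5,
    "active_shards": 8,
    "relocating_shards": 0,
    "unassigned_shards": 2,
}

for name, value in summarize_health(sample_health).items():
    print(f"{name}: {value}")
```

With these sample values, one node holds no data, 3 of the 5 replica copies are active, and 2 copies are still waiting to be assigned to a node.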
Nathan Smith
  • If I add a new node, it reports relocating shards. Where will they relocate to? If they relocate to the new node, do I need to search the data in the relocated shard on the new node, or can I still search on the old node? –  Jan 23 '14 at 12:07
  • @user3202550 I'm not sure I'm answering your question, as it's rather unclear, but imagine this: you have two nodes and an index with 5 shards and 1 replica, which makes 10 shards. Elasticsearch will place shard #0 and replica #0 on different nodes, so that if a node goes down all data is still available. Elasticsearch will always direct your requests to the necessary shards. Hope this answers your question. – Nathan Smith Jan 23 '14 at 13:27
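
The two-node scenario in that last comment can be illustrated with a toy model. This is not Elasticsearch's actual allocation algorithm, just a simple round-robin sketch that respects the one rule described: a primary and its replica never share a node. The function names and node names are invented for the example.

```python
from typing import Dict, List, Tuple


def place_shards(number_of_shards: int, nodes: List[str]) -> Dict[int, Tuple[str, str]]:
    """Toy placement: map each shard number to (primary node, replica node)."""
    if len(nodes) < 2:
        raise ValueError("need at least two nodes to separate primary and replica")
    placement = {}
    for shard in range(number_of_shards):
        primary_node = nodes[shard % len(nodes)]
        # The next node in the ring is always a different node.
        replica_node = nodes[(shard + 1) % len(nodes)]
        placement[shard] = (primary_node, replica_node)
    return placement


def survives_loss_of(placement: Dict[int, Tuple[str, str]], lost_node: str) -> bool:
    """True if every shard still has a copy on some remaining node."""
    return all(any(node != lost_node for node in copies)
               for copies in placement.values())


placement = place_shards(5, ["node-a", "node-b"])
for shard, (primary_node, replica_node) in placement.items():
    print(f"shard #{shard}: primary on {primary_node}, replica on {replica_node}")
print(survives_loss_of(placement, "node-a"))  # True
```

Because each shard has copies on both nodes in this model, losing either node still leaves one copy of every shard, which is the availability property the comment describes.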