14

I am new to Cassandra and I want to install it. So far I've read a small article on it.

But there is one thing that I do not understand: the meaning of 'node'.

Can anyone tell me what a 'node' is, what it is for, and how many nodes we can have in one cluster?

pjz
ING
  • 2
    Somewhat related: http://stackoverflow.com/questions/28196440/what-are-the-differences-between-a-node-a-cluster-and-a-datacenter-in-a-cassand – Aaron Feb 11 '15 at 15:02

4 Answers

15

A node is the storage layer within a server.

Newer versions of Cassandra use virtual nodes, or vnodes. There are 256 vnodes per server by default.

A vnode is essentially the storage layer.

  • machine: a physical server, EC2 instance, etc.
  • server: an installation of Cassandra. Each machine has one installation of Cassandra. The Cassandra server runs core processes such as the snitch, the partitioner, etc.
  • vnode: The storage layer in a Cassandra server. There are 256 vnodes per server by default.
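To make the machine/server/vnode split concrete, here is a minimal Python sketch of a token ring in which each server owns many vnodes. It is a toy model, not Cassandra's implementation: the names `token`, `build_ring` and `owner` are made up for illustration, MD5 stands in for Cassandra's default Murmur3 partitioner, and tokens are derived from the server name rather than assigned the way Cassandra assigns them.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Toy token function: MD5 digest read as an integer (assumption, not Murmur3)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


def build_ring(servers, vnodes_per_server=256):
    """Give every server `vnodes_per_server` positions (vnodes) on the ring."""
    ring = []
    for server in servers:
        for i in range(vnodes_per_server):
            ring.append((token(f"{server}#vnode{i}"), server))
    ring.sort()  # order vnodes by token so the ring can be searched
    return ring


def owner(ring, tokens, partition_key):
    """Return the server whose vnode is the first one at or after the key's token."""
    idx = bisect.bisect_left(tokens, token(partition_key)) % len(ring)  # wrap around
    return ring[idx][1]


ring = build_ring(["server-a", "server-b", "server-c"])
tokens = [t for t, _ in ring]

# Count how many of 3000 sample keys land on each server.
counts = {}
for n in range(3000):
    server = owner(ring, tokens, f"user-{n}")
    counts[server] = counts.get(server, 0) + 1

print(len(ring))  # 3 servers x 256 vnodes = 768 ring positions
print(counts)
```

Because each server holds many small slices of the ring instead of one large slice, the sample keys spread across all three servers, which is the main reason vnodes were introduced.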

Helpful tip:

Where you will get confused is that Cassandra terminology (in older blog posts, YouTube videos, and so on) has been used inconsistently. In older versions of Cassandra, each machine had one Cassandra server installed, and each server contained one node. Because of this 1-to-1-to-1 relationship between machine, server and node in old versions of Cassandra, people used the three terms interchangeably.

Akbar Ahmed
7

Cassandra is a distributed database management system designed to handle large amounts of data across many commodity servers. It provides high availability with no single point of failure.

You may have gotten some idea from the paragraph above. Generally, when we talk about Cassandra, we mean a Cassandra cluster, not a single PC. A node in a cluster is just a fully functional machine that is connected to the other nodes in the cluster through a fast internal network. All nodes work together so that even if one of them fails due to an unexpected error, the cluster as a whole can still provide service.

All nodes in a Cassandra cluster are the same. There is no concept of master or slave nodes. There are multiple reasons for this design, and you can Google it for more details if you want.

Theoretically, you can have as many nodes as you want in a Cassandra cluster. For example, at Cassandra Summit 2014 Apple reported running 75,000 Cassandra nodes (spread across multiple clusters).

Of course, you can try Cassandra on one machine. It still works with just one node in the cluster.

Chong Tang
  • Where did you hear about the 75K node cluster? Is anyone keeping a list of the biggest clusters? – Don Branson Feb 11 '15 at 19:38
  • 1
    I don't know if there is such a list. I am a Ph.D. student right now, and I keep my eyes on tech news. I got this from a news article last year. I was so astonished by the number that I still remember it. I just googled it, and there are some articles about it. For example, http://opensourceconnections.com/blog/2014/09/17/cassandra-summit-2014/ – Chong Tang Feb 11 '15 at 19:46
  • Cool, thanks. They have multiple clusters, their largest being 1,000 nodes. I wonder what the biggest single cluster is so far. – Don Branson Feb 11 '15 at 19:49
  • Yes. They have multiple clusters. I don't really know the size of any single cluster. There are always trade-offs between the size of a cluster and its efficiency, mainly due to the way the nodes are connected together. How to interconnect nodes in a cluster is a main research interest of the distributed systems research community. – Chong Tang Feb 11 '15 at 19:54
2

What is meant by a node in Cassandra?

A Cassandra node is a place where data is stored.

A data center is a collection of related nodes.

A cluster is a component that contains one or more data centers. In other words, it is a collection of multiple Cassandra nodes that communicate with each other to perform a set of operations.
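The containment described above (cluster, then data centers, then nodes) can be sketched as a small data structure. This is only an illustration: the class names, the `all_nodes` helper and the addresses are made up and are not part of any Cassandra API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    address: str  # hypothetical IP address of one Cassandra node


@dataclass
class DataCenter:
    name: str
    nodes: List[Node] = field(default_factory=list)


@dataclass
class Cluster:
    name: str
    datacenters: List[DataCenter] = field(default_factory=list)

    def all_nodes(self) -> List[Node]:
        """Flatten the hierarchy: every node in every data center."""
        return [node for dc in self.datacenters for node in dc.nodes]


cluster = Cluster(
    "demo",
    [
        DataCenter("dc1", [Node("10.0.0.1"), Node("10.0.0.2")]),
        DataCenter("dc2", [Node("10.1.0.1")]),
    ],
)
print([node.address for node in cluster.all_nodes()])
```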

  • In Cassandra, each node is independent and at the same time interconnected to other nodes.
  • All the nodes in a cluster play the same role.
  • Every node in a cluster can accept read and write requests, regardless of where the data is actually located in the cluster.
  • In the case of failure of one node, Read/Write requests can be served from other nodes in the network.
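The points above can be illustrated with a toy simulation in Python. It is a sketch under simplifying assumptions, not how Cassandra is implemented: `ToyCluster` and its methods are invented names, replica placement is a simple byte sum rather than a real partitioner, and consistency levels, hinted handoff and repair are left out entirely.

```python
class ToyCluster:
    """Toy peer-to-peer cluster: any live node can coordinate a request."""

    def __init__(self, node_names, replication_factor=2):
        self.names = list(node_names)
        self.rf = replication_factor
        self.storage = {name: {} for name in self.names}  # per-node data
        self.up = {name: True for name in self.names}     # per-node liveness

    def replicas(self, key):
        """Pick `rf` consecutive nodes, starting at a position derived from the key."""
        start = sum(key.encode("utf-8")) % len(self.names)
        return [self.names[(start + i) % len(self.names)] for i in range(self.rf)]

    def write(self, coordinator, key, value):
        """The coordinator forwards the write to every live replica."""
        if not self.up[coordinator]:
            raise ConnectionError(f"{coordinator} is down")
        written = 0
        for replica in self.replicas(key):
            if self.up[replica]:
                self.storage[replica][key] = value
                written += 1
        return written

    def read(self, coordinator, key):
        """The coordinator returns the value from the first live replica that has it."""
        if not self.up[coordinator]:
            raise ConnectionError(f"{coordinator} is down")
        for replica in self.replicas(key):
            if self.up[replica] and key in self.storage[replica]:
                return self.storage[replica][key]
        return None


cluster = ToyCluster(["n1", "n2", "n3"])
copies = cluster.write("n1", "user:42", "Alice")  # any node may take the write

# Simulate a failure of the first replica that holds the key.
failed = cluster.replicas("user:42")[0]
cluster.up[failed] = False

# Send the read to any node that is still up; another replica serves it.
coordinator = next(name for name in cluster.names if cluster.up[name])
value = cluster.read(coordinator, "user:42")
print(failed, coordinator, value)
```

The client never needs to know which node holds the data, and losing one replica does not make the key unreadable as long as another replica is alive.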
-1

If you're looking to understand Cassandra terminology, then the following post is a good reference:

http://exponential.io/blog/2015/01/08/cassandra-terminology/

Akbar Ahmed