Big data is a concept that deals with data sets of extreme volumes. Questions tend to relate to infrastructure, algorithms, statistics, and data structures.
Big data is not only about huge volume; it has other defining characteristics, such as velocity, veracity, and variety.
Several features distinguish it as a concept in its own right:
Data
- The data is so large that it cannot be processed on a single computer.
- Relationships between data elements are extremely complex.
Algorithms
- Single-machine algorithms whose running time grows faster than O(N), quadratic ones in particular, can take years to finish on data of this size.
- Fast distributed algorithms are used instead.
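
To put rough numbers on the running-time point above, here is a back-of-the-envelope estimate. The record count and the throughput of one billion simple operations per second are assumptions chosen for illustration, not measurements of any real system.

```python
import math

SECONDS_PER_YEAR = 365 * 24 * 3600
OPS_PER_SECOND = 1e9  # assumption: one machine, about 1e9 simple operations per second
N = 10**10            # assumption: hypothetical data set of ten billion records


def estimated_seconds(operations, ops_per_second=OPS_PER_SECOND):
    """Idealized wall-clock time for a given number of operations."""
    return operations / ops_per_second


linear_seconds = estimated_seconds(N)                       # O(N)
linearithmic_seconds = estimated_seconds(N * math.log2(N))  # O(N log N)
quadratic_years = estimated_seconds(N**2) / SECONDS_PER_YEAR  # O(N^2)

print(f"O(N):       {linear_seconds:.0f} seconds")
print(f"O(N log N): {linearithmic_seconds / 60:.1f} minutes")
print(f"O(N^2):     {quadratic_years:.0f} years")
```

Under these assumptions a linear pass takes seconds and a linearithmic one takes minutes, while a quadratic algorithm needs on the order of thousands of years. That gap is why quadratic work is either avoided or spread across many machines.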
Storage
- The underlying data storage must be fault-tolerant and keep data in a consistent state regardless of device failures.
- A single storage device cannot hold the entire data set.
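
The two storage requirements above can be illustrated with a toy placement scheme: each block is stored on several nodes, so no node holds everything and losing one node does not lose the block. This is a minimal sketch, not how HDFS or any specific system places data; `place_replicas`, `readable_from`, and the node names are hypothetical.

```python
import hashlib


def place_replicas(key, nodes, replication=3):
    """Pick `replication` distinct nodes for a block, starting at a hashed offset."""
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    start = int.from_bytes(digest[:8], "big") % len(nodes)
    # Consecutive nodes (wrapping around) hold the copies of this block.
    return [nodes[(start + i) % len(nodes)] for i in range(replication)]


def readable_from(key, nodes, failed, replication=3):
    """Return the replicas of a block that are still on healthy nodes."""
    return [n for n in place_replicas(key, nodes, replication) if n not in failed]


nodes = ["node-a", "node-b", "node-c", "node-d", "node-e"]  # hypothetical cluster
replicas = place_replicas("block-0001", nodes)
survivors = readable_from("block-0001", nodes, failed={replicas[0]})
print(replicas, survivors)
```

With a replication factor of three, a block stays readable after any two node failures. Keeping the copies consistent with each other is a separate problem that this sketch does not address.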
Ecosystem
- Big data is also used as a name for the set of tools that process huge amounts of data, known as the big data ecosystem. Popular tools include HDFS, Spark, and MapReduce.
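
As an illustration of the MapReduce idea mentioned above, the following sketch counts words by splitting the input into chunks, counting each chunk independently (the map step), and merging the partial counts (the reduce step). It uses only the Python standard library and runs on one machine; it does not use the Hadoop or Spark APIs, and the function names are hypothetical.

```python
from collections import Counter
from functools import reduce


def map_chunk(lines):
    """Map step: count the words in one chunk, independently of other chunks."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def reduce_counts(left, right):
    """Reduce step: merge two partial word counts."""
    return left + right


def word_count(lines, chunk_size=2):
    # Split the input into chunks; in a real cluster each chunk would
    # live on, and be processed by, a different machine.
    chunks = [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]
    partials = map(map_chunk, chunks)
    return reduce(reduce_counts, partials, Counter())


totals = word_count(["big data", "big tools", "data data"])
print(totals.most_common())
```

Because each chunk is counted without reference to the others, the map step can run in parallel; only the much smaller partial counts need to be brought together.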