1

I am reading tutorials on Big Data and Hadoop where I found these 2 points on HDFS

Streaming Data Access: The time to read whole data set is more important than latency in reading the first. HDFS is built on write-once and read-many- times pattern.

&

Low Latency data access: Applications that require very less time to access the first data should not use HDFS as it is giving importance to whole data rather than time to fetch the first record.

I am confused because 1st one says The time to read whole data set is more important and second one says ...should not use HDFS as it is giving importance to whole data

I don't understand what is expected? I am new to Hadoop.

minigeek
  • 2,766
  • 1
  • 25
  • 35

1 Answers1

0

Streaming data access:

HDFS is based on the principle of “write once, read many times.” The main focus is reading the complete data set in the fastest possible way is more important than taking the time to fetch a single record from the data set.

As per the Hadoop : Definitive guide

MapReduce is fundamentally a batch processing system, and is not suitable for interactive analysis. You can’t run a query and get results back in a few seconds or less. Queries typically take minutes or more, so it’s best for off-line use, where there isn’t a human sitting in the processing loop waiting for results.

MapReduce is a good fit for problems that need to analyse the whole dataset in a batch fashion. An RDBMS is good for point queries or updates, where the dataset has been indexed to deliver low-latency retrieval and update times of a relatively small amount of data. MapReduce suits applications where the data is written once and read many times, whereas a relational database is good for datasets that are continually updated.

Latency : Please refer below this What is low latency access of data?

Community
  • 1
  • 1
Nireekshan
  • 317
  • 1
  • 11