1

Why does Hadoop use its own file system, HDFS? What advantage does HDFS have over NTFS or FAT, and why was it chosen for Hadoop?

Dinesh Shan
  • This is probably the first thing any decent HDFS tutorial would tell you. – Tariq Jul 29 '13 at 19:48
  • Given Windows 2012R2 now has Cluster Shared Volumes available for general use on multiple cluster nodes, while it may not be able to scale to the tune of thousands, for a smaller cluster it looks like it could be a viable alternative. And the newer ReFS file system in 2012R2 may even be more suitable than NTFS. – Brain2000 Jan 23 '15 at 23:46

2 Answers

2

... Because NTFS and FAT aren't distributed. The advantage of HDFS is that it is.

See the HDFS Introduction.
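To make "distributed" concrete, here is a toy, in-memory sketch of the idea: a file is cut into blocks, and each block is copied to more than one node, so the file can still be read after a node fails. The function names, the tiny block size, and the round-robin placement are assumptions made for illustration; this is not the Hadoop API and not HDFS's real placement policy.

```python
def place_blocks(data, nodes, block_size=4, replication=2):
    """Split data into fixed-size blocks and copy each block to
    `replication` distinct nodes, chosen round-robin (a toy policy)."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    storage = {node: {} for node in nodes}  # node name -> {block id: bytes}
    block_map = []                          # block id -> nodes holding a copy
    for block_id, start in enumerate(range(0, len(data), block_size)):
        block = data[start:start + block_size]
        holders = [nodes[(block_id + r) % len(nodes)] for r in range(replication)]
        for node in holders:
            storage[node][block_id] = block
        block_map.append(holders)
    return storage, block_map


def read_file(storage, block_map, dead_nodes=frozenset()):
    """Reassemble the file, reading each block from any surviving replica."""
    out = bytearray()
    for block_id, holders in enumerate(block_map):
        live = [n for n in holders if n not in dead_nodes]
        if not live:
            raise IOError(f"block {block_id} has no surviving replica")
        out += storage[live[0]][block_id]
    return bytes(out)


# Hypothetical three-node cluster; node "n2" then fails.
storage, block_map = place_blocks(b"hello hadoop world", ["n1", "n2", "n3"])
restored = read_file(storage, block_map, dead_nodes={"n2"})
```

With two replicas per block, losing any single node still leaves one copy of every block. A single-machine file system such as NTFS or FAT has no equivalent of the block map spanning several machines.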

Dave Newton
  • With CSV, NTFS and ReFS can now be directly shared between multiple systems. – Brain2000 Jan 23 '15 at 23:50
  • @Brain2000 Yes, although I'd argue that HDFS is *intrinsically* clusterable, by design. Anything can be clustered with enough layers: HDFS is clusterable on purpose. – Dave Newton Jan 24 '15 at 01:47
  • True. I'm looking at Hadoop for the first time. I feel like it's a straitjacket, targeting only big data warehousing and leaving medium-sized projects out of luck. If Hadoop could be used on NTFS with a smaller stripe size, I feel it could be useful for medium-sized projects. – Brain2000 Jan 24 '15 at 05:25
  • @Brain2000 In general, "medium" projects (not sure what that means here) don't really need it. You're right, it's specifically for big stuff. – Dave Newton Jan 24 '15 at 11:37
  • Link is dead. Please cite relevant parts to avoid this. – Stefan Jun 27 '18 at 10:28
  • @Stefan That *is* the relevant part. – Dave Newton Jun 27 '18 at 11:52
  • Ok, I see. The purpose of the link was not clear to me. – Stefan Jun 27 '18 at 13:08
  • @Stefan Because the OP, I guess, was unable to look it up for themselves. Please note that I updated the link, and rolled it back. – Dave Newton Jun 27 '18 at 13:13
2

I was wondering if HDFS is

a) a direct alternative to file systems like NTFS and ext4 ("Do I have to format a hard drive to set up an HDFS node, and will I lose all existing data?")

b) installed on top of an underlying file system.

and found this SO question while searching for an answer.

Well, it's b)

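HDFS is not a disk-level filesystem of its own; it runs on top of the node's existing filesystem and stores its data there through the normal file API, so no reformatting is needed. Yahoo uses ext3 as the base filesystem for its Hadoop deployments.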
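A rough way to picture option b): the storage side of a node just writes each block as an ordinary file in a directory on whatever local filesystem is already there. The sketch below assumes made-up function names, a made-up `blk_NNNN` file naming scheme, and an 8-byte block size; it only illustrates the layering and is not how a real DataNode is implemented.

```python
import os
import tempfile


def store_as_block_files(data, data_dir, block_size=8):
    """Write data as a series of ordinary files under data_dir and
    return the block file names in order."""
    os.makedirs(data_dir, exist_ok=True)
    names = []
    for i, start in enumerate(range(0, len(data), block_size)):
        name = f"blk_{i:04d}"  # hypothetical naming, for illustration only
        with open(os.path.join(data_dir, name), "wb") as f:
            f.write(data[start:start + block_size])
        names.append(name)
    return names


def load_from_block_files(data_dir, names):
    """Read the block files back in order and join them."""
    parts = []
    for name in names:
        with open(os.path.join(data_dir, name), "rb") as f:
            parts.append(f.read())
    return b"".join(parts)


# A temporary directory stands in for a data directory on ext3, ext4, NTFS, ...
with tempfile.TemporaryDirectory() as root:
    data_dir = os.path.join(root, "datanode")
    original = b"existing data stays untouched"
    names = store_as_block_files(original, data_dir)
    on_disk = sorted(os.listdir(data_dir))
    restored = load_from_block_files(data_dir, names)
```

Nothing here touches the partition table or the disk format: the blocks are plain files next to whatever else lives on that disk, which is why existing data survives.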


Stefan