I am very new to Hadoop and was wondering how to search for a specific file name in HDFS using MapReduce. Say I have thousands of terabytes of map imagery, with each file named by its latitude/longitude coordinates. Given a lat/long coordinate, how would I quickly find the matching file using MapReduce?

I searched around and found that one way was to pipe it to grep:

hdfs dfs -ls -R / | grep [search_term]

but this would be very slow when there are this many files and this much data.

MrAlias

1 Answer

Here is my take:

  1. It is not advisable to store too many files in HDFS, because the NameNode keeps the metadata for every file in memory. Check this link: Namenode File No. Limit

  2. Searching with MR is not efficient, especially if your data is not partitioned or indexed.

  3. Your case would be best served by a key-value store or a distributed search tool like Elasticsearch (given my limited understanding of your use case).
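
To illustrate the key-value idea from point 3, here is a minimal Python sketch. It builds the index once from the output of `hdfs dfs -ls -R`, after which each lookup is a single key access instead of a scan. Everything specific in it is an assumption for illustration: the file naming scheme `<lat>_<lon>.<ext>`, the helper names, and the in-memory dict, which only stands in for a real key-value store.

```python
from typing import Dict, Iterable, Optional, Tuple

Coord = Tuple[float, float]


def path_from_listing_line(line: str) -> Optional[str]:
    """Extract the file path from one line of `hdfs dfs -ls -R` output.

    Returns None for directories and for lines that are not file entries.
    """
    # A file entry has 8 columns: permissions, replication, owner, group,
    # size, date, time, path. Limit the split so paths with spaces survive.
    fields = line.strip().split(None, 7)
    if len(fields) < 8 or not fields[0].startswith("-"):
        return None
    return fields[7]


def coord_from_path(path: str) -> Optional[Coord]:
    """Parse (lat, lon) from a file name.

    ASSUMPTION: files are named `<lat>_<lon>.<ext>`, for example
    `40.7128_-74.0060.tif`. Adjust this to the real naming scheme.
    """
    basename = path.rsplit("/", 1)[-1]
    stem = basename.rsplit(".", 1)[0]  # drop the extension
    parts = stem.split("_")
    if len(parts) != 2:
        return None
    try:
        return float(parts[0]), float(parts[1])
    except ValueError:
        return None


def build_index(listing_lines: Iterable[str]) -> Dict[Coord, str]:
    """Build a coordinate -> HDFS path map from a recursive listing."""
    index: Dict[Coord, str] = {}
    for line in listing_lines:
        path = path_from_listing_line(line)
        if path is None:
            continue
        coord = coord_from_path(path)
        if coord is not None:
            index[coord] = path
    return index


if __name__ == "__main__":
    # Hypothetical listing lines, in the format printed by `hdfs dfs -ls -R`.
    sample = [
        "drwxr-xr-x   - hdfs supergroup          0 2015-03-01 10:00 /maps",
        "-rw-r--r--   3 hdfs supergroup    1048576 2015-03-01 10:01 /maps/40.7128_-74.0060.tif",
    ]
    index = build_index(sample)
    print(index.get((40.7128, -74.0060)))
```

The slow listing is paid once when the index is built (and again only when files are added), so finding a file by coordinate no longer depends on the number of files.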
Venkat