
I have image files in HDFS and need to load them into HBase. Can I use Spark to do this instead of MapReduce? If so, how? I am new to the Hadoop ecosystem.

I have created an HBase table with a MOB column and a threshold of 10 MB. I am stuck on how to load the data from the shell command line. After some research I found a couple of recommendations to use MapReduce, but they were not informative.

Mutyam

1 Answer


You can use Apache Tika together with sc.binaryFiles(filesPath). Of the formats Tika supports, the ones you need are:

Image formats: The ImageParser class uses the standard javax.imageio feature to extract simple metadata from image formats supported by the Java platform. More complex image metadata is available through the JpegParser and TiffParser classes, which use the metadata-extractor library to support Exif metadata extraction from JPEG and TIFF images.

Portable Document Format: The PDFParser class parses Portable Document Format (PDF) documents using the Apache PDFBox library.

For example code with Spark, see my answer

For another code example that loads into HBase, see the answer I gave here
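As a rough illustration of the sc.binaryFiles route (not the code from the linked answers), here is a minimal PySpark sketch that reads each image as raw bytes and writes it to HBase as an ordinary cell. MOB storage is handled by the region server, so the client issues a normal put. Everything specific here is an assumption: the table name images, the column cf:img, the Thrift host localhost, the HDFS path, the use of the happybase client (which needs an HBase Thrift server running), and the choice of the file name as row key.

```python
import posixpath


def row_key_for(path):
    """Use the file's base name as the row key (an illustrative choice only)."""
    return posixpath.basename(path).encode("utf-8")


def to_put(path, content, column=b"cf:img"):
    """Turn one (path, bytes) pair from sc.binaryFiles into (row key, cells)."""
    # The column family "cf" and qualifier "img" are hypothetical names.
    return row_key_for(path), {column: bytes(content)}


def write_partition(records, host="localhost", table_name="images"):
    """Write one partition's images over a single HBase Thrift connection."""
    # Imported here so it is resolved on the executors, not at module load.
    import happybase  # assumption: happybase is installed on every worker

    connection = happybase.Connection(host)
    try:
        table = connection.table(table_name)
        for path, content in records:
            row, cells = to_put(path, content)
            table.put(row, cells)
    finally:
        connection.close()


def main(files_path="hdfs:///path/to/images"):
    """Read every file under files_path and store its bytes in HBase."""
    from pyspark import SparkContext

    sc = SparkContext(appName="ImagesToHBase")
    # binaryFiles yields (full path, file content) pairs, one per file.
    sc.binaryFiles(files_path).foreachPartition(write_partition)
    sc.stop()
```

To run it, add a call to main() at the bottom of the script and submit it with spark-submit. Opening the connection once per partition, rather than once per file, keeps the number of Thrift connections small. If you also want Tika metadata, it could be extracted from the same bytes and written to extra columns in the same put.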

Ram Ghadiyaram