How to read images from a Hadoop sequence file using OpenCV and MRJob?

Question

I created a sequence file from a tar file full of images with tar-to-seq.jar. Now I want to turn the bytes stored in that sequence file back into images and analyze them. I'm using OpenCV 3.0.0 and mrjob 0.5.

I'm having trouble reading the images with the cv2.imdecode() method: I'm getting a null value.

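from mrjob.job import MRJob
import cv2
import numpy as np


class CountLavander(MRJob):
    # Hadoop converts each SequenceFile key and value to text (via toString())
    # before handing the record to the streaming mapper.
    HADOOP_INPUT_FORMAT = 'org.apache.hadoop.mapred.SequenceFileAsTextInputFormat'

    def mapper(self, key, value):
        # Interpret the input value as a buffer of uint8 bytes
        imgbytes = np.fromstring(value, dtype='uint8')
        # Decode the buffer as a colour image; imdecode returns None on failure
        imarr = cv2.imdecode(imgbytes, cv2.IMREAD_COLOR)
        yield imarr, 1

    def reducer(self, key, values):
        # Count how many records produced each key
        yield key, sum(values)


if __name__ == '__main__':
    CountLavander.run()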

When I run the job with:

    python count_lavander.py -r hadoop --hadoop-bin /usr/bin/hadoop \
        --hadoop-streaming-jar /usr/hdp/2.2.8.0-3150/hadoop-mapreduce/hadoop-streaming-2.6.0.2.2.8.0-3150.jar \
        --interpreter /usr/local/bin/python2.7 cor_data.seq

I get:

 null   2731
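A single null key with a count equal to the number of records is what you would expect if cv2.imdecode() returned None for every record: MRJob's default output protocol (JSONProtocol, as far as I know also in mrjob 0.5) writes keys as JSON, None becomes null, and the reducer then sums all the 1s under that one key. The sketch below is a simplified, stdlib-only stand-in for that map/reduce step (the function name simulate_job_output is hypothetical, not part of MRJob):

```python
import json
from collections import Counter


def simulate_job_output(decoded_images):
    """Hypothetical stand-in for the job: the mapper yields (decoded_image, 1),
    keys are serialized to JSON (as MRJob's JSON protocol does), and the
    reducer sums the 1s per serialized key."""
    counts = Counter()
    for img in decoded_images:
        # cv2.imdecode() returns None on failure, and json.dumps(None) == 'null'
        counts[json.dumps(img)] += 1
    return dict(counts)


if __name__ == "__main__":
    # Pretend every one of the 2731 records failed to decode
    print(simulate_job_output([None] * 2731))
```

If even one image decoded successfully, the real job would instead try to JSON-encode a NumPy array as a key, which the JSON protocol cannot serialize, so yielding something like the image shape would be a safer key for a sanity check.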

I packed 2731 images into that sequence file, so I think the packing worked, but somehow I can't read them back as images. Does anyone have an idea?
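One likely cause, assuming tar-to-seq.jar stores each image as a BytesWritable value: SequenceFileAsTextInputFormat calls toString() on that value, and BytesWritable.toString() produces lowercase hex pairs separated by spaces (e.g. "ff d8 ff e0 ..."), not the raw bytes. Hadoop streaming then hands the mapper a line of the form key<TAB>value. A minimal stdlib-only sketch of converting such a line back into raw image bytes, assuming that line format (the helper names here are hypothetical):

```python
import binascii


def hexpairs_to_bytes(text):
    """Turn BytesWritable.toString()-style text ('ff d8 ff e0 ...') into raw bytes."""
    return binascii.unhexlify(text.replace(" ", "").strip())


def split_streaming_line(line):
    """Split an assumed 'key<TAB>hex pairs' streaming line into (key, raw bytes).

    Assumes the key is the file name written by tar-to-seq.jar.
    """
    key, _, hexvalue = line.rstrip("\n").partition("\t")
    return key, hexpairs_to_bytes(hexvalue)


def looks_like_jpeg(data):
    # JPEG data starts with the SOI marker 0xFF 0xD8
    return data[:2] == b"\xff\xd8"


if __name__ == "__main__":
    name, raw = split_streaming_line("img001.jpg\tff d8 ff e0 00 10\n")
    print(name, looks_like_jpeg(raw))
```

In the mapper, the raw bytes could then be passed to np.frombuffer(raw, dtype=np.uint8) and cv2.imdecode(buf, cv2.IMREAD_COLOR). Note that with MRJob's default RawValueProtocol input, the mapper's key is None and the whole key<TAB>value line arrives as value, which is why the sketch splits on the tab first.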

Hello @Milos, I don't have an answer to your question, but it seems you have already worked with OpenCV and Hadoop. I have just started and am getting an OpenCV native library error in Hadoop. Can you please help with this? I am really stuck on this problem. This is my question: http://stackoverflow.com/questions/36270351/opencv-library-loaded-in-hadoop-but-not-working — Gurinderbeer Singh, Apr 04 '16 at 14:51
Hi Milos, I got stuck at the same point. Did you work out a solution? I would be glad to hear anything, thank you. — Josyula Krishna, Apr 13 '16 at 05:00

0 Answers