I created a sequence file from a tar file full of images with tar-to-seq.jar. Now I want to turn the bytes stored in that sequence file back into images and analyze them. I'm using OpenCV 3.0.0 and mrjob 0.5.
I'm having trouble reading the images with the cv2.imdecode() method: I'm getting a null value.
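cv2.imdecode() returns None instead of raising an error when it cannot decode a buffer, so one quick diagnostic is to check whether the bytes reaching the mapper actually start with a known image signature. This is a minimal, stdlib-only sketch, assuming the tar holds JPEG and/or PNG files; the helper `sniff_image_format` and the `SIGNATURES` table are hypothetical names used only for illustration:

```python
# Hypothetical diagnostic helper: inspect the leading "magic bytes" of a buffer
# before handing it to cv2.imdecode(), which returns None for data it cannot decode.

# Assumption: the images are JPEG and/or PNG; add other signatures as needed.
SIGNATURES = {
    b"\xff\xd8\xff": "jpeg",
    b"\x89PNG\r\n\x1a\n": "png",
}


def sniff_image_format(data):
    """Return 'jpeg' or 'png' if `data` (bytes) starts with that signature, else None."""
    for magic, name in SIGNATURES.items():
        if data.startswith(magic):
            return name
    return None


if __name__ == "__main__":
    raw_jpeg_start = b"\xff\xd8\xff\xe0\x00\x10JFIF"
    # The same bytes spelled out as hex text are not an image buffer.
    hex_text = b"ff d8 ff e0 00 10 4a 46 49 46"
    print(sniff_image_format(raw_jpeg_start))  # jpeg
    print(sniff_image_format(hex_text))        # None
```

Logging the result of such a check for the first few records shows quickly whether the mapper is receiving raw image bytes or something else.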
from mrjob.job import MRJob
import os
import sys
import cv2
import numpy as np


class CountLavander(MRJob):
    HADOOP_INPUT_FORMAT = 'org.apache.hadoop.mapred.SequenceFileAsTextInputFormat'

    def mapper(self, key, value):
        imgbytes = np.fromstring(value, dtype='uint8')
        imarr = cv2.imdecode(imgbytes, cv2.IMREAD_COLOR)
        yield imarr, 1

    def reducer(self, key, values):
        yield key, sum(values)


if __name__ == '__main__':
    CountLavander.run()
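One possible cause worth ruling out: `SequenceFileAsTextInputFormat` converts each key and value to text via their `toString()` methods. Assuming tar-to-seq.jar stores each image as a `BytesWritable` (an assumption, not confirmed above), that text is a space-separated hex dump such as `ff d8 ff e0 ...` rather than the raw bytes, and depending on the input protocol the line the mapper sees may still begin with the record key and a tab. A minimal Python 3 sketch of recovering the raw bytes from such a line; `hex_record_to_bytes` is a hypothetical helper:

```python
# Sketch (Python 3): recover raw image bytes from a line holding a
# BytesWritable rendered as text, i.e. two hex digits per byte separated by spaces.
# Assumption: the line may be prefixed by the record key and a tab.


def hex_record_to_bytes(line):
    """Return (key, raw_bytes) for a 'key<TAB>hex' line or a bare hex line."""
    key, sep, hex_text = line.partition("\t")
    if not sep:  # no tab: the whole line is the hex dump
        key, hex_text = None, line
    # bytes.fromhex() ignores the spaces between hex pairs; strip() drops a trailing newline.
    return key, bytes.fromhex(hex_text.strip())


if __name__ == "__main__":
    name, raw = hex_record_to_bytes("img0001.jpg\tff d8 ff e0\n")
    print(name, raw)  # img0001.jpg b'\xff\xd8\xff\xe0'
```

Inside the mapper, the recovered bytes could then be passed to `cv2.imdecode(np.frombuffer(raw, dtype=np.uint8), cv2.IMREAD_COLOR)`. Under Python 2.7 the equivalent of `bytes.fromhex()` here would be `hex_text.replace(' ', '').decode('hex')`.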
Running it with:
python count_lavander.py -r hadoop --hadoop-bin /usr/bin/hadoop \
    --hadoop-streaming-jar /usr/hdp/2.2.8.0-3150/hadoop-mapreduce/hadoop-streaming-2.6.0.2.2.8.0-3150.jar \
    --interpreter /usr/local/bin/python2.7 cor_data.seq
I get:
null 2731
I packed 2731 images into that sequence file, so I think the packing worked, but somehow I can't read them back as images. Does anyone have an idea?
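The `null` key is consistent with every mapper call yielding `None`: `cv2.imdecode()` returns `None` when it cannot decode the buffer, and mrjob's default protocols encode keys as JSON, where `None` becomes `null`, so all 2731 records fall into a single group. A minimal stdlib-only simulation of that grouping (it approximates the shuffle with `collections.Counter` rather than running mrjob; `count_by_json_key` is a hypothetical name):

```python
import json
from collections import Counter


def count_by_json_key(mapper_keys):
    """Count records per key after JSON-encoding each key, roughly how
    mrjob's JSON protocols group and print (key, 1) mapper output."""
    return Counter(json.dumps(key) for key in mapper_keys)


if __name__ == "__main__":
    # Every cv2.imdecode() call failed, so every mapper yielded (None, 1).
    failed = [None] * 2731
    for key, total in count_by_json_key(failed).items():
        print(key, total)  # null 2731
```

If decoding does start to succeed, note that a NumPy array is not JSON-serializable with the standard `json` module, so yielding `imarr` itself as the key would likely fail; yielding something serializable such as `imarr.shape` or a label keeps the keys usable.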