I'm trying to do image processing on Apache Spark, using Python.
I want to build an RDD from image files stored on local disk, so I did this:
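from pyspark import SparkConf, SparkContext
# Run Spark locally under the application name "classification"
sparkConf = SparkConf().setAppName("classification").setMaster("local")
sc = SparkContext(conf=sparkConf)
# Each element is a (file path, file content) pair
ImageRDD = sc.wholeTextFiles("PATH/*.jpg")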
ImageRDD may be distributed across several nodes.
From here, I don't know how to operate on the ImageRDD data with OpenCV.
My plan is to take each value (image) in ImageRDD and use imread (an OpenCV function) to convert it to OpenCV image data.
Features will then be extracted from the OpenCV image data, and these features will form a new RDD.
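
A minimal sketch of that plan might look like the following. It rests on a few assumptions that are not in my code above. It swaps wholeTextFiles for sc.binaryFiles, since wholeTextFiles decodes file contents as text and JPEG data is binary. It swaps imread for cv2.imdecode, since imread takes a file path rather than in-memory data. The names histogram_features and build_feature_rdd are hypothetical helpers, and the 16-bin grayscale histogram is only a placeholder for whatever features are actually needed.

```python
def histogram_features(record):
    """Map one (path, raw_bytes) pair to (path, features), or (path, None)."""
    # Imported inside the function so each Spark worker loads them itself
    import cv2
    import numpy as np

    path, raw_bytes = record
    if not raw_bytes:
        return path, None
    # View the raw file bytes as a uint8 buffer and decode it in memory
    buf = np.frombuffer(raw_bytes, dtype=np.uint8)
    img = cv2.imdecode(buf, cv2.IMREAD_COLOR)
    if img is None:  # the bytes were not a decodable image
        return path, None
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # Placeholder feature (assumption): 16-bin grayscale intensity histogram
    hist = cv2.calcHist([gray], [0], None, [16], [0, 256]).flatten()
    return path, hist.tolist()


def build_feature_rdd(sc, pattern, extractor=histogram_features):
    """Read images as bytes and return an RDD of (path, features) pairs."""
    return (sc.binaryFiles(pattern)   # RDD of (path, bytes)
              .map(extractor)         # runs on the workers
              .filter(lambda kv: kv[1] is not None))  # drop undecodable files
```

With the SparkContext from above, this could be used as featureRDD = build_feature_rdd(sc, "PATH/*.jpg"), and featureRDD.take(1) would show the first (path, features) pair. Passing a different extractor lets the histogram be replaced without touching the Spark part.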