I have a dataset with some categorical features. I am trying to apply exact same function on all of these categorical features in Spark framework. My first assumption was that I can parallelize operation of each feature with operation of other features. However I couldn't figure out is it possible or not (confused after reading this, this).
For example, assume that my dataset is as following:
feature1, feature2, feature3
blue,apple,snake
orange,orange,monkey
blue,orange,horse
I want to count the number of occurrences of each category for each feature, separately. For example for feature1 (blue=2, orange=1)