I am new to Apache Spark and wonder whether it is suitable for my specific scenario. I am crawling small datasets (stored as JSON documents in MongoDB). The documents all describe the same kind of entity, but they may have different structures: a given JSON document in a collection may contain more or fewer key/value pairs than the others. My goal is to run machine learning (classification / regression) algorithms on these data files and derive information from them.
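To make the schema-variation issue concrete, here is a minimal sketch in plain Python (the field names and values are hypothetical, just for illustration). It normalizes records with differing key sets to the union of all keys, filling missing fields with `None` — which is essentially what Spark's JSON reader does when it infers a merged schema across heterogeneous documents:

```python
import json

# Hypothetical sample records: JSON documents for the same entity,
# but with differing sets of key/value pairs.
raw_docs = [
    '{"id": 1, "price": 10.0, "color": "red"}',
    '{"id": 2, "price": 12.5}',                   # no "color"
    '{"id": 3, "color": "blue", "weight": 2.3}',  # extra "weight", no "price"
]

records = [json.loads(d) for d in raw_docs]

# Union of all keys -> a common schema; missing values become None,
# similar to the nulls Spark produces for absent JSON fields.
all_keys = sorted({k for r in records for k in r})
normalized = [{k: r.get(k) for k in all_keys} for r in records]

print(all_keys)       # ['color', 'id', 'price', 'weight']
print(normalized[1])  # {'color': None, 'id': 2, 'price': 12.5, 'weight': None}
```

With Spark itself, `spark.read.json(...)` would give a DataFrame with this unioned schema directly, so the varying structures are not an obstacle in principle; you would still need to decide how your ML algorithms should treat the resulting nulls.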
Given this case, do you think Spark is a good fit for speeding things up by processing the data in parallel in a cluster environment? Or do you think I should look at other alternatives?
Thank you.