I have a set of large XML files zipped together into a single zip file, and I have many such zip files. Earlier I used MapReduce to parse the XML, with a custom InputFormat and RecordReader that set splittable=false and read the zip file and the XML files inside it.
I am new to Spark. How can I prevent Spark from splitting a zip file, while still processing multiple zip files in parallel, as I was able to do in MapReduce?
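
To make the goal concrete, here is a minimal sketch of one possible direction, not a confirmed solution. It assumes PySpark and `SparkContext.binaryFiles`, which returns each file as a single (path, bytes) record, so one zip is never divided across tasks while different zips can be handled by different tasks. The helper names `extract_xml_entries` and `xml_entries_rdd`, the `.xml` suffix filter, and the UTF-8 decoding are assumptions made for illustration.

```python
import io
import zipfile


def extract_xml_entries(zip_bytes):
    """Yield (entry_name, xml_text) for each .xml entry of one in-memory zip archive."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for info in archive.infolist():
            # Skip directories and anything that is not an XML file (assumed naming).
            if info.is_dir() or not info.filename.lower().endswith(".xml"):
                continue
            # Assumption: the XML files are UTF-8 encoded.
            yield info.filename, archive.read(info).decode("utf-8")


def xml_entries_rdd(sc, zip_glob):
    """Return an RDD of (zip_path, entry_name, xml_text) tuples.

    `sc` is an existing SparkContext; `zip_glob` is a path or glob such as
    a directory of zip files (hypothetical, adjust to the real location).
    """
    # binaryFiles yields one (path, bytes) pair per file, so each zip stays whole.
    return sc.binaryFiles(zip_glob).flatMap(
        lambda pair: [
            (pair[0], name, text) for name, text in extract_xml_entries(pair[1])
        ]
    )
```

One caveat with this approach: each zip is loaded fully into the memory of the task that handles it, which may be a problem when the archives are very large.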