I am unable to figure out how Spark decides on the number of partitions while reading from AWS S3.
My Case:
I am using Spark 1.3 (sorry, but upgrading is not in my hands)
My S3 bucket contains CSV files of ~60-75 MB each, organized in batches, i.e. folder1, folder2, folder3, etc., each containing 100 CSV files
I'm getting 295-300 partitions while reading from these folders
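
For reference, this is roughly how I'm reading the data and checking the partition count (the bucket/folder names and app name below are placeholders, not my real paths):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Minimal sketch of my read path (bucket and folder names are placeholders).
val conf = new SparkConf().setAppName("S3PartitionCheck")
val sc = new SparkContext(conf)

// Read all CSV files in one batch folder from S3.
val rdd = sc.textFile("s3n://my-bucket/folder1/*.csv")

// This prints something in the 295-300 range instead of the 200 I expected.
println(s"Number of partitions: ${rdd.partitions.length}")
```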
I'm expecting the default number of partitions to always be 200, because if Spark treats S3 data as a block-based filesystem, it should split the input at either 64 MB or 128 MB, so each ~60-75 MB file should produce at most 2 splits at a 64 MB block size (100 files × 2 = 200).
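
Here is my back-of-the-envelope expectation, sketched out (the 64 MB split size is my assumption about how Spark/Hadoop would split S3 objects, not something I have verified):

```scala
// Rough expectation, assuming a 64 MB split size (my assumption, not verified).
val splitSizeMB = 64
val filesPerFolder = 100
val fileSizeMB = 75 // upper end of the ~60-75 MB range

// Each file would span at most ceil(75 / 64) = 2 splits,
// so I'd expect roughly 100 * 2 = 200 partitions per folder.
val expectedPartitions =
  filesPerFolder * math.ceil(fileSizeMB.toDouble / splitSizeMB).toInt
println(expectedPartitions) // 200
```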
Thanks in advance.