I am trying to understand RDD partitioning logic. I know an RDD is partitioned across nodes, but I want to understand how the number of partitions is decided.
I have a VM with 4 cores assigned to it. I created two RDDs, one from a file in HDFS and one with the parallelize operation.
The first RDD (from HDFS) got 2 partitions, but the second (from parallelize) got 4 partitions.
I checked the number of blocks allocated to the file: it is 1 block, as the file is very small, but when I create an RDD on that file it shows two partitions. Why is this? I read somewhere that partitioning also depends on the number of cores, which is 4 in my case, but that still does not explain this output.
Can someone help me understand this?
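
To make the question concrete, here is a plain-Python sketch (no Spark needed) of one possible explanation that would match the numbers I see. It rests on assumptions I have not verified against my Spark version: that parallelize defaults to a "default parallelism" equal to the number of cores in local mode, that reading a text file defaults to a minimum partition count of min(default parallelism, 2), and that the file is then split Hadoop-style using that minimum as a hint. The function names are hypothetical and only model the arithmetic; they are not Spark APIs.

```python
# Hypothetical model of the partition counts; these are NOT Spark functions.

def default_min_partitions(default_parallelism):
    # Assumption: file-based RDDs use min(default parallelism, 2) as a hint.
    return min(default_parallelism, 2)

def parallelize_partitions(default_parallelism, num_slices=None):
    # Assumption: parallelize uses the default parallelism unless told otherwise.
    return default_parallelism if num_slices is None else num_slices

def text_file_partitions(file_size, block_size, min_partitions, min_split_size=1):
    # Simplified Hadoop-style split computation for one non-empty, splittable file.
    goal_size = file_size // max(min_partitions, 1)
    split_size = max(min_split_size, min(goal_size, block_size))
    slop = 1.1  # the last split may be up to 10% larger than split_size
    remaining = file_size
    splits = 0
    while remaining / split_size > slop:
        splits += 1
        remaining -= split_size
    if remaining != 0:
        splits += 1
    return splits

cores = 4                          # my VM
block = 128 * 1024 * 1024          # assumed HDFS block size
small_file = 1000                  # assumed size in bytes, far below one block

hint = default_min_partitions(cores)
print(text_file_partitions(small_file, block, hint))  # 2
print(parallelize_partitions(cores))                  # 4
```

If this model is right, the single HDFS block is not what decides the count for a small file: the file is cut in two because of the minimum-partitions hint, while parallelize follows the core count directly. Is that actually how it works?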