Let's say I have the following data:
+-------+------+-----+---------------+--------+
|Account|nature|value|           time|repeated|
+-------+------+-----+---------------+--------+
|      a|     1|   50|10:05:37:293084|   false|
|      a|     1|   50|10:06:46:806510|   false|
|      a|     0|   50|11:19:42:951479|   false|
|      a|     1|   40|19:14:50:479055|   false|
|      a|     0|   50|16:56:17:251624|   false|
|      a|     1|   40|16:33:12:133861|   false|
|      a|     1|   20|17:33:01:385710|   false|
|      b|     0|   30|12:54:49:483725|   false|
|      b|     0|   40|19:23:25:845489|   false|
|      b|     1|   30|10:58:02:276576|   false|
|      b|     1|   40|12:18:27:161290|   false|
|      b|     0|   50|12:01:50:698592|   false|
|      b|     0|   50|08:45:53:894441|   false|
|      b|     0|   40|17:36:55:827330|   false|
|      b|     1|   50|17:18:41:728486|   false|
+-------+------+-----+---------------+--------+
I want each account to be processed by a single executor. For example, account a should be processed by one executor and account b by a different one, so that there is no parallelism within a single account.
I read about `repartition(partitionExprs: Column*)`, and `repartition(account)` will partition by account. So, for the example data, will it create 2 partitions and send them as tasks to different executors? How does repartition work, and how are the partitions sent to executors?
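A minimal sketch of the hash-partitioning idea, in plain Python so it runs without Spark. Assumption: it mimics the RDD `HashPartitioner` (Java `String.hashCode` followed by a non-negative modulo). `Dataset.repartition(cols)` follows the same "hash the key, take it modulo the partition count" scheme but uses a Murmur3 hash. By default it creates `spark.sql.shuffle.partitions` partitions (200 unless configured, and adaptive query execution may coalesce them), not one partition per distinct account, so real partition indices will differ. The helpers `java_string_hash`, `hash_partition` and `repartition_by_key` are hypothetical names for illustration only.

```python
from collections import defaultdict

def java_string_hash(s: str) -> int:
    """Mimic java.lang.String.hashCode for ASCII strings (32-bit signed overflow)."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    # Reinterpret the unsigned 32-bit value as a signed Java int.
    return h - (1 << 32) if h >= (1 << 31) else h

def hash_partition(key: str, num_partitions: int) -> int:
    """Partition index = non-negative (hash mod num_partitions), as in HashPartitioner."""
    # Python's % already returns a non-negative result for a positive divisor.
    return java_string_hash(key) % num_partitions

def repartition_by_key(rows, num_partitions):
    """Route every row to the partition chosen by hashing its first field (Account)."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[hash_partition(row[0], num_partitions)].append(row)
    return dict(partitions)

# (Account, nature, value, time) rows from the table above.
rows = [
    ("a", 1, 50, "10:05:37:293084"), ("a", 1, 50, "10:06:46:806510"),
    ("a", 0, 50, "11:19:42:951479"), ("a", 1, 40, "19:14:50:479055"),
    ("a", 0, 50, "16:56:17:251624"), ("a", 1, 40, "16:33:12:133861"),
    ("a", 1, 20, "17:33:01:385710"), ("b", 0, 30, "12:54:49:483725"),
    ("b", 0, 40, "19:23:25:845489"), ("b", 1, 30, "10:58:02:276576"),
    ("b", 1, 40, "12:18:27:161290"), ("b", 0, 50, "12:01:50:698592"),
    ("b", 0, 50, "08:45:53:894441"), ("b", 0, 40, "17:36:55:827330"),
    ("b", 1, 50, "17:18:41:728486"),
]

parts = repartition_by_key(rows, 200)  # 200 = default spark.sql.shuffle.partitions
for idx in sorted(parts):
    print(idx, sorted({r[0] for r in parts[idx]}), len(parts[idx]))
# 97 ['a'] 7
# 98 ['b'] 8
```

In this model only two of the 200 partitions are non-empty. Every row of an account lands in the same partition, so each account is handled by exactly one task. Two accounts can still share a partition: with 2 partitions, `hash_partition("a", 2)` and `hash_partition("c", 2)` are both 1 (for a hypothetical account c). Each non-empty partition becomes a task that the scheduler assigns to some executor core. Partitioning by key therefore guarantees "one account, one task", not "one account, one executor".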
I also looked into `groupByKey()` for pair RDDs. What is the difference between these two, and how are their partitions created and forwarded to executors?
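A similarly simplified, hypothetical model of `groupByKey()` on a pair RDD, again in plain Python. The shuffle step routes pairs by a hash of the key, as a key-based repartition does (by default `groupByKey()` uses a `HashPartitioner`). Afterwards, however, all values of each key are collected into one in-memory collection, whereas `repartition` only moves rows and leaves them ungrouped. The function names `hash_partition` and `group_by_key` are illustrative, not Spark's implementation.

```python
from collections import defaultdict

def hash_partition(key: str, num_partitions: int) -> int:
    """Java String.hashCode (ASCII, 32-bit signed) mod num_partitions, kept non-negative."""
    h = 0
    for ch in key:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    if h >= 1 << 31:
        h -= 1 << 32
    return h % num_partitions

def group_by_key(pairs, num_partitions):
    """Simplified model of groupByKey: shuffle by key hash, then group per partition."""
    # Step 1 (shuffle): same routing as a key-based repartition.
    partitions = defaultdict(list)
    for key, value in pairs:
        partitions[hash_partition(key, num_partitions)].append((key, value))
    # Step 2 (grouping): collect every value of a key into one in-memory list.
    grouped = {}
    for idx, part in partitions.items():
        per_key = defaultdict(list)
        for key, value in part:
            per_key[key].append(value)
        grouped[idx] = dict(per_key)
    return grouped

# (Account, value) pairs taken from the table above.
pairs = [("a", 50), ("a", 50), ("a", 50), ("a", 40), ("a", 50), ("a", 40), ("a", 20),
         ("b", 30), ("b", 40), ("b", 30), ("b", 40), ("b", 50), ("b", 50), ("b", 40), ("b", 50)]

print(group_by_key(pairs, 2))
# {1: {'a': [50, 50, 50, 40, 50, 40, 20]}, 0: {'b': [30, 40, 30, 40, 50, 50, 40, 50]}}
```

Because each key's values are materialized together, one very large account can exhaust an executor's memory with `groupByKey()`. After a `repartition`, the rows of a partition can instead be processed as a stream, for example with `mapPartitions`. In Spark the order of values inside a group is not guaranteed; in this model it is simply insertion order.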