I'm new to PySpark and am struggling with the following task. I have tried a few approaches, but none of them worked properly. The data looks like this:
id|numb_of_count|
1|3|
2|5|
3|6|
4|2|
5|0|
6|15|
7|8|
8|99|
I want to achieve the following result:
id|numb_of_count|banding|
1|3|3-5|
2|5|3-5|
3|6|6-10|
4|2|2|
5|0|0|
6|15|+11|
7|8|6-10|
8|99|+11|
How can this be achieved in the most efficient way, given that I have a large dataset?
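To make the banding rule concrete, here is a minimal sketch of one possible approach: a chain of built-in `when` column expressions, which Spark evaluates natively and which usually scales better than a Python UDF. The band edges are inferred from the expected output above. The helper names `band_label` and `add_banding` are hypothetical, and treating a count of 1 as its own band ("1") is an assumption, since the sample data does not cover it. The sketch also assumes `numb_of_count` is an integer column.

```python
# Band edges inferred from the expected output; the label for a count of 1
# is an assumption because the sample data contains no such row.
def band_label(n):
    """Pure-Python reference of the banding rule, handy for spot checks."""
    if n <= 2:
        return str(n)      # 0, 1 (assumed) and 2 keep their own value as label
    if n <= 5:
        return "3-5"
    if n <= 10:
        return "6-10"
    return "+11"


def add_banding(df, col="numb_of_count"):
    """Return df with a 'banding' column built from native column expressions."""
    # Imported inside the function so band_label can be used without Spark.
    from pyspark.sql import functions as F

    c = F.col(col)
    return df.withColumn(
        "banding",
        F.when(c <= 2, c.cast("string"))
        .when(c <= 5, "3-5")
        .when(c <= 10, "6-10")
        .when(c >= 11, "+11"),  # no otherwise(): null counts stay null
    )
```

With an existing DataFrame `df` holding the sample data, `add_banding(df).show()` should produce the `banding` column shown in the desired result. Because the conditions are checked in order, each `when` only needs an upper bound.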