How to create band column based on values pyspark

Question

I'm new to PySpark and am struggling with the following task. I have tried a few approaches, but none of them worked properly. The data is as follows:

id|numb_of_count|
1|3|
2|5|
3|6|
4|2|
5|0|
6|15|
7|8|
8|99|

I want to achieve the following result:

id|numb_of_count|banding|
1|3|3-5|
2|5|3-5| 
3|6|6-10|
4|2|2|
5|0|0|
6|15|+11|
7|8|6-10|
8|99|+11|

How can this be achieved in the most efficient way, given that I have a large dataset?

Seems like you want a [series of `if`/`else`](https://stackoverflow.com/a/39048475/5858851) statements. — pault, Jul 09 '18 at 14:15
You'll have to fill in the logic for the conditions yourself, but you need something like `df.withColumn('banding', when(col('numb_of_count') == 0, "0").when(condition).when(condition).otherwise("+11"))` — pault, Jul 09 '18 at 15:11

Rahul Chawla · Accepted Answer · 2018-07-09T15:25:33.703

In PySpark, when/otherwise is the equivalent of if/else. If df is your actual DataFrame, then:

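from pyspark.sql.functions import col, when

new_df = df.withColumn(
    'banding',
    # 0, 1 and 2 keep their own value; cast so every branch is a string
    when(col('numb_of_count') < 3, col('numb_of_count').cast('string'))
    .when(col('numb_of_count') <= 5, '3-5')
    .when(col('numb_of_count') <= 10, '6-10')
    .otherwise('+11'))  # everything above 10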

df.withColumn

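df.withColumn returns a new DataFrame with an added column (or a replaced one, if a column with that name already exists). The first argument is the name of the new column and the second is the column expression that computes its values. See the PySpark documentation for DataFrame.withColumn for more.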

when/otherwise

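when/otherwise is analogous to if/else: the conditions are checked in order, the first one that matches supplies the value, and otherwise covers every row that matched none of them. See the PySpark documentation for pyspark.sql.functions.when for more.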

This is an excellent answer to learn more about when/otherwise.
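To make the if/else analogy concrete, here is a plain-Python sketch of the same branching, applied to the counts from the question. The function name `band` is hypothetical and the snippet is only meant to show the order in which the conditions are tried; it is not a suggestion to apply it through a Python UDF, which would generally be slower than the built-in column expression.

```python
def band(numb_of_count):
    """Plain-Python mirror of the when/otherwise chain (illustration only)."""
    if numb_of_count < 3:
        # Small counts are reported as themselves, as text.
        return str(numb_of_count)
    elif numb_of_count <= 5:
        # Only reached when the first test failed, so the value is 3, 4 or 5.
        return "3-5"
    elif numb_of_count <= 10:
        return "6-10"
    else:
        return "+11"


counts = [3, 5, 6, 2, 0, 15, 8, 99]
bands = [band(c) for c in counts]
print(bands)
```

Each `elif` corresponds to one chained `.when(...)`, and the final `else` corresponds to `.otherwise(...)`.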
