I am using PySpark version 1.5.2. I have a PySpark DataFrame with a column "id", as shown below:
id
------------
000001_128
000123_1_3
006745_8
000000_9_7
I want to count the number of underscores ('_') in each row's "id" value and then apply a when operation: if the string contains exactly one underscore, append '_1' as a suffix; otherwise leave the value unchanged. The desired result is:
id | new_id
------------------------
000001_128 | 000001_128_1
000123_1_3 | 000123_1_3
006745_8 | 006745_8_1
000000_9_7 | 000000_9_7
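To make the rule concrete, here is a plain-Python sketch of the per-value logic I have in mind. It is not Spark code, and add_suffix is just a name I made up for illustration:

```python
def add_suffix(value):
    """Append '_1' when the string contains exactly one underscore.

    Hypothetical helper for illustration only; it works on a single
    Python string, not on a Spark column.
    """
    if value.count("_") == 1:
        return value + "_1"
    return value


# Sample ids taken from the table above.
ids = ["000001_128", "000123_1_3", "006745_8", "000000_9_7"]

for original in ids:
    print("%s | %s" % (original, add_suffix(original)))
```

This only covers the logic for one value at a time; what I am looking for is the column-level equivalent on the DataFrame.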
I am already using pyspark.sql.functions for other operations, so a solution built on those functions would fit best.
Any help is appreciated!