I have a Spark DataFrame which contains the content of a JSON file. I need to create a new column which is populated conditionally based on the content of another column.
Let's say I have a column containing some numbers, and my new column will be populated depending on the value of these numbers (e.g. if the first column has a number lower than 5, my new column will be populated with the string 'Lower than five'; if the value is greater than 5, the new column will be populated with 'Greater than five').
I know that I can do something like this with the when function:
from pyspark.sql import functions as F
file.withColumn('newcolumn',
    F.when(file.oldColumn < 5, 'Lower than five')
     .when(file.oldColumn > 5, 'Greater than five')).show()
But what if 'oldColumn' does not contain just integers, but strings from which I need to extract the integer?
For example, from 'PT5M' I need to extract the 5, and a string like 'PTM', which does not contain a number, should be treated as 0.
So far I have managed to extract the number from my first column using regexp_extract, but I am struggling to turn the null values into 0.
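To make the rule concrete, this is roughly the behaviour I am after, written as a plain-Python sketch rather than Spark code. The helper names extract_number and label are made up for illustration, and I am assuming the first run of digits in the string is the number I want.

```python
import re


def extract_number(value):
    """Return the first run of digits in value as an int, or 0 if there is none."""
    match = re.search(r"\d+", value or "")
    return int(match.group()) if match else 0


def label(value):
    """Apply the same two conditions as the when chain to the extracted number."""
    n = extract_number(value)
    if n < 5:
        return "Lower than five"
    if n > 5:
        return "Greater than five"
    # Exactly 5 matches neither condition, just like the when chain without otherwise.
    return None


for s in ["PT3M", "PT10M", "PTM"]:
    print(s, extract_number(s), label(s))
```

The part I cannot reproduce in Spark is the "or 0" fallback for strings such as 'PTM'.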
Example, where 1 is the original column and 2 is the new column:
+-------+-------------------+
|1      | 2                 |
+-------+-------------------+
|PT5M   | Lower than five   |
|PT10M  | Greater than five |
|PT11M  | Greater than five |
+-------+-------------------+
Thanks for your help!