I'm trying in vain to use a Pyspark substring function inside of an UDF. Below is my code snippet -
from pyspark.sql.functions import substring
def my_udf(my_str):
try:
my_sub_str = substring(my_str,1, 2)
except Exception:
pass
else:
return (my_sub_str)
apply_my_udf = udf(my_udf)
df = input_data.withColumn("sub_str", apply_my_udf(input_data.col0))
The sample data is-
ABC1234
DEF2345
GHI3456
But when I print the df, I don't get any value in the new column "sub_str" as shown below -
[Row(col0='ABC1234', sub_str=None), Row(col0='DEF2345', sub_str=None), Row(col0='GHI3456', sub_str=None)]
Can anyone please let me know what I'm doing wrong?