Let's first make a sample DataFrame:
from pyspark.sql.functions import col, udf
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

schema = StructType([
    StructField("id", StringType(), True),
    StructField("credit", IntegerType(), True),
    StructField("debit", IntegerType(), True),
    StructField("sum", IntegerType(), True),
])
df = spark.createDataFrame(
    [("user_10", 100, 10, 110), ("user_11", 200, 20, 220), ("user_12", 300, 30, 330)],
    schema,
)
df.show()
which results in:
+-------+------+-----+---+
| id|credit|debit|sum|
+-------+------+-----+---+
|user_10| 100| 10|110|
|user_11| 200| 20|220|
|user_12| 300| 30|330|
+-------+------+-----+---+
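(This assumes a spark SparkSession already exists, as it does in the pyspark shell or a notebook; in a standalone script you would create one first. The app name below is just an illustrative choice:)

from pyspark.sql import SparkSession

# Create (or reuse) a session; "udf-example" is an arbitrary app name.
spark = SparkSession.builder.appName("udf-example").getOrCreate()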
Now, let's define a UDF that adds 123 to any value passed to it:
def test(x):
    return 123 + x

test_udf = udf(test, IntegerType())
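As an aside, PySpark also accepts the decorator form of udf. A minimal sketch of the same UDF written that way (the name test_udf_alt is just illustrative), made null-safe because a plain 123 + x raises a TypeError when a NULL cell reaches the Python function as None:

from pyspark.sql.functions import udf
from pyspark.sql.types import IntegerType

@udf(returnType=IntegerType())
def test_udf_alt(x):
    # NULL cells arrive as None; returning None keeps them NULL in the result.
    return 123 + x if x is not None else None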
And let's see how to use the UDF:
df2 = df.withColumn('debit', test_udf(col('debit')))
df2.show()
which results in:
+-------+------+-----+---+
| id|credit|debit|sum|
+-------+------+-----+---+
|user_10| 100| 133|110|
|user_11| 200| 143|220|
|user_12| 300| 153|330|
+-------+------+-----+---+
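Worth knowing: for simple arithmetic like this, a built-in column expression gives the same result without the Python serialization overhead a UDF incurs. A minimal equivalent sketch (df2_native is an illustrative name):

from pyspark.sql.functions import col, lit

# Same result as test_udf, but evaluated entirely in the JVM (no Python round-trip).
df2_native = df.withColumn('debit', col('debit') + lit(123))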
Note that the "sum" column is now stale, so you probably need to recalculate it:
df2 = df2.withColumn('sum', col('debit') + col('credit'))
df2.show()
which results in:
+-------+------+-----+---+
| id|credit|debit|sum|
+-------+------+-----+---+
|user_10| 100| 133|233|
|user_11| 200| 143|343|
|user_12| 300| 153|453|
+-------+------+-----+---+
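Both steps can also be chained into a single expression; a sketch of the equivalent one-pass version:

df2 = (df
       .withColumn('debit', test_udf(col('debit')))
       .withColumn('sum', col('credit') + col('debit')))
df2.show()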