From my understanding, udf parameters are column names. Your example might be rewritten like this:
```python
from pyspark.sql.functions import udf, array
from pyspark.sql.types import IntegerType

def change_val_x(val_x):
    threshold = 10
    if val_x > threshold:
        return another_function(val_x)
    else:
        return val_x

def change_val_y(arr):
    threshold = 10
    # arr[0] -> val_x, arr[1] -> val_y
    if arr[0] > threshold:
        return another_function(arr[1])
    else:
        return arr[1]

change_val_x_udf = udf(change_val_x, IntegerType())
change_val_y_udf = udf(change_val_y, IntegerType())

# apply these functions to your dataframe
# (update val_y first, since it depends on the original val_x)
df = df.withColumn('val_y', change_val_y_udf(array('val_x', 'val_y')))\
       .withColumn('val_x', change_val_x_udf('val_x'))
```
To modify the val_x column, a simple udf is enough, but the new val_y depends on the values of both the val_x and val_y columns, so the solution is to pack both columns into an array and pass that single column to the udf. Note that this code is not tested...

See this question to apply a udf on multiple columns.
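Since the code above is untested, here is a quick plain-Python sanity check of the two functions' logic, runnable without Spark. The `another_function` below is a hypothetical stand-in (it simply doubles its input); substitute your real function.

```python
# Hypothetical stand-in for the real another_function -- doubles its input.
def another_function(v):
    return v * 2

def change_val_x(val_x):
    threshold = 10
    if val_x > threshold:
        return another_function(val_x)
    return val_x

def change_val_y(arr):
    threshold = 10
    # arr[0] -> val_x, arr[1] -> val_y
    if arr[0] > threshold:
        return another_function(arr[1])
    return arr[1]

print(change_val_x(5))        # below threshold: unchanged -> 5
print(change_val_x(12))       # above threshold: another_function(12) -> 24
print(change_val_y([12, 3]))  # val_x above threshold: another_function(3) -> 6
print(change_val_y([5, 3]))   # val_x below threshold: val_y unchanged -> 3
```

Once these return the values you expect, wrapping them with `udf(..., IntegerType())` and applying them as shown above only changes where they run, not what they compute.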