I have the following UDF, which is a regular scalar PySpark UDF:
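from pyspark.sql.functions import udf

@udf()  # return type defaults to string
def redact(colVal: str, offset: int = 0):
    # Null/empty value or zero offset: return a fixed 8-character mask
    if not colVal or not offset:
        return 'X'*8
    else:
        # Mask everything except the last `offset` characters
        charList=list(colVal)
        charList[:-offset]='X'*(len(colVal)-offset)
        return "".join(charList)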
I have read that vectorized UDFs give drastic performance improvements over scalar UDFs, so I tried to convert this to a pandas_udf. However, I am running into a lot of pandas-related issues, since I am less experienced with pandas.
Please help me convert this UDF to a vectorized pandas UDF.
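
For reference, here is a minimal sketch of what the converted version might look like. It assumes that both the value and the offset arrive as pandas Series, since a Series-to-Series pandas UDF receives every argument as a Series. The names `redact_series` and `_redact_one` are hypothetical helpers introduced only for this illustration. The Spark wrapping sits in a guarded import so the pandas logic can be checked without PySpark or pyarrow installed. Note that the masking still loops over rows in Python, so any speedup would come mainly from Arrow batching rather than from true vectorization.

```python
import pandas as pd


def redact_series(col_val: pd.Series, offset: pd.Series) -> pd.Series:
    """Series-in, Series-out port of the scalar redact logic (hypothetical name)."""

    def _redact_one(val, off):
        # Same rule as the scalar UDF: a null/empty value or a
        # zero/missing offset yields a fixed 8-character mask.
        if val is None or pd.isna(val) or not val or pd.isna(off) or not off:
            return "X" * 8
        off = int(off)
        chars = list(val)
        # Mask everything except the last `off` characters.
        chars[:-off] = "X" * (len(val) - off)
        return "".join(chars)

    masked = [_redact_one(v, o) for v, o in zip(col_val, offset)]
    # Keep the incoming index so the result lines up with the input batch.
    return pd.Series(masked, index=col_val.index)


try:
    # Assumption: PySpark 3.x with pyarrow available. The type hints on
    # redact_series let pandas_udf infer a Series-to-Series UDF.
    from pyspark.sql.functions import pandas_udf
    from pyspark.sql.types import StringType

    redact = pandas_udf(redact_series, returnType=StringType())
except ImportError:
    redact = None  # Spark or pyarrow missing; the pandas function still works
```

With a DataFrame `df` that has a string column (say `name`, a placeholder), it could then be called as `df.withColumn("masked", redact("name", lit(4)))`, with `lit` imported from `pyspark.sql.functions`. The offset has to be passed as a column or literal, because a pandas UDF does not accept plain Python arguments at call time.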