I am trying to convert a regular UDF into a single pandas UDF that takes the split position as a second argument, so that I don't have to create two separate pandas UDFs.
Convert this:
from pyspark.sql.functions import udf
@udf("string")
def splitEmailUDF(email: str, position: int) -> str:
    return email.split("@")[position]
into this, as one pandas UDF. What should the type of position be: a data type, or something else?
import pandas as pd
from pyspark.sql.functions import pandas_udf
@pandas_udf("string")
def splitEmailUDFVec(email: pd.Series, position: ???????) -> pd.Series:
    return email.str.split("@").str[position]
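For reference, outside of Spark the pandas expression itself accepts an ordinary Python int for the position. Below is a minimal sketch in plain pandas; the split_email name and the sample addresses are made up for illustration and are not part of my actual job. So the open question is only how to get position through the pandas UDF signature.

```python
import pandas as pd


def split_email(email: pd.Series, position: int) -> pd.Series:
    # Plain pandas, no Spark involved: position is an ordinary Python int here.
    # .str.split("@") yields a list per row; .str[position] picks one element.
    return email.str.split("@").str[position]


# Made-up sample data, purely for illustration.
emails = pd.Series(["alice@example.com", "bob@example.org"])

names = split_email(emails, 0)    # part before the "@"
domains = split_email(emails, 1)  # part after the "@"
```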
Of course, I can always create two pandas UDFs instead:
import pandas as pd
from pyspark.sql.functions import pandas_udf
@pandas_udf("string")
def splitFirstNameUDFVec(email: pd.Series) -> pd.Series:
    return email.str.split("@").str[0]
@pandas_udf("string")
def splitDomainUDFVec(email: pd.Series) -> pd.Series:
    return email.str.split("@").str[1]
Any help will be appreciated!