I am trying to extract words from a string column using PySpark's regexp_extract.
My DataFrame is below:
ID, Code
10, A1005*B1003
12, A1007*D1008*C1004
from pyspark.sql.functions import col, regexp_extract
result = df.withColumn('Code1', regexp_extract(col('Code'), r'\w+', 0))
Output:
ID, Code, Code1
10, A1005*B1003, A1005
12, A1007*D1008*C1004, A1007
This only returns the first code. I want to extract every code from the Code column, so that my DataFrame looks like this:
ID, Code, Code1, Code2, Code3
10, A1005*B1003, A1005, B1003, null
12, A1007*D1008*C1004, A1007, D1008, C1004
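To make the expected result concrete, here is a minimal plain-Python sketch of the per-row logic I am after: split the string on `*` and pad with nulls up to a fixed number of columns. The helper name `split_codes` and the `width` parameter are hypothetical, used only for illustration. In PySpark I assume the analogue would be something like `split(col('Code'), r'\*')` followed by `getItem(i)` for each new column, but I have not verified how that behaves when a row has fewer codes.

```python
from typing import List, Optional


def split_codes(code: str, width: int = 3) -> List[Optional[str]]:
    """Split a '*'-delimited string and pad with None up to `width` items.

    Hypothetical helper: `width` is the number of CodeN columns wanted.
    """
    parts = code.split("*")[:width]               # keep at most `width` codes
    return parts + [None] * (width - len(parts))  # pad short rows with None


# Sample rows taken from the DataFrame in the question: (ID, Code)
rows = [(10, "A1005*B1003"), (12, "A1007*D1008*C1004")]

# Each result row is (ID, Code, Code1, Code2, Code3)
result = [(id_, code, *split_codes(code)) for id_, code in rows]

for row in result:
    print(row)
```

Splitting on the delimiter seems simpler than a regex here, since every code is separated by a literal `*`.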