I have a PySpark DataFrame like this:
cust_id prod
1 A
1 B
1 C
2 D
2 E
2 F
Desired Output:
cust_id prod
1 A/B/C
2 D/E/F
Using Pandas, I can do it like this:
import numpy as np

# Collect each customer's products into a single array per group
T = df.groupby(['cust_id'])['prod'].apply(lambda x: np.hstack(x)).reset_index()

def func_x(ls):
    # Concatenate the items, inserting '/' between them (no trailing '/')
    n = len(ls)
    s = ''
    for i in range(n):
        if n - i == 1:
            s = s + ls[i]
        else:
            s = s + ls[i] + '/'
    return s

T['prod1'] = T['prod'].apply(lambda x: func_x(x))
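(Incidentally, func_x just joins the list with '/', so the Pandas step collapses to a one-liner; a sketch, assuming the same df as above:)

T = df.groupby('cust_id')['prod'].apply('/'.join).reset_index()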
What would be the equivalent of this code in PySpark?
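For reference, here is the approach I think should work, using groupBy with collect_list and concat_ws (a sketch, assuming the PySpark DataFrame is named sdf and has the columns shown above; note that collect_list does not guarantee the order of the collected values):

from pyspark.sql import functions as F

# Group by customer, gather the products into an array,
# then join the array elements with '/'
result = (
    sdf.groupBy('cust_id')
       .agg(F.concat_ws('/', F.collect_list('prod')).alias('prod'))
)
result.show()

Is this the right approach, or is there a more idiomatic way?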