How to get distinct words of a column based on group by of another column
I need to get the distinct words in colB for each colA value.
My dataframe:
colA colB
US California City
US San Jose ABC
UK London 123
US California ZZZ
UK Manchester
UK London
Required dataframe (df):
colA colB
US California
US City
US ABC
US ZZZ
US San
US Jose
UK London
UK 123
UK Manchester
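For reference, one way to produce this shape is to split each colB string on whitespace, explode the resulting lists into rows, and drop repeated (colA, word) pairs. This is only a minimal sketch, assuming pandas 0.25 or later (for `DataFrame.explode`) and whitespace-separated words; the name `raw` for the input dataframe is made up for the example, and the row order may differ from the listing above.

```python
import pandas as pd

# Hypothetical name `raw` for the input dataframe shown above.
raw = pd.DataFrame({
    "colA": ["US", "US", "UK", "US", "UK", "UK"],
    "colB": ["California City", "San Jose ABC", "London 123",
             "California ZZZ", "Manchester", "London"],
})

df = (
    raw.assign(colB=raw["colB"].str.split())   # each colB string -> list of words
       .explode("colB")                        # one row per word
       .drop_duplicates(["colA", "colB"])      # keep each (colA, word) pair once
       .reset_index(drop=True)
)
print(df)
```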
EDIT:
Thanks to @jezrael, I was able to get the desired dataframe.
I have another dataframe (df2):
ColC ColA ColB
C1 US California
C1 US ABC
C2 UK LONDON
For each value of ColC, I need the intersection of its ColB strings with the colB words in the previously obtained dataframe (df).
Required:
ColC n(df2_colBuniq) n(df_df2_intersec_colB)
C1 2 2
C2 1 1
I tried looping through each unique ColC value, but on the large dataframe I have, this takes quite some time. Any suggestions for a faster approach?
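To make the requirement concrete, here is a loop-free sketch of the computation on the sample data. It rests on assumptions that are not stated above: words are compared case-insensitively (otherwise LONDON would not match London and C2 could not get an intersection count of 1), and matching is on the word alone, ignoring ColA. The column names `key`, `hit`, `n_df2_colB_uniq` and `n_intersec_colB` are made up for the example, and named aggregation needs pandas 0.25 or later.

```python
import pandas as pd

# df: the previously obtained dataframe of distinct words per colA.
df = pd.DataFrame({
    "colA": ["US"] * 6 + ["UK"] * 3,
    "colB": ["California", "City", "ABC", "ZZZ", "San", "Jose",
             "London", "123", "Manchester"],
})
df2 = pd.DataFrame({
    "ColC": ["C1", "C1", "C2"],
    "ColA": ["US", "US", "UK"],
    "ColB": ["California", "ABC", "LONDON"],
})

# Assumption: case-insensitive matching, on the word only (ColA is ignored).
known = set(df["colB"].str.lower())

words = (
    df2.assign(key=df2["ColB"].str.lower())
       .drop_duplicates(["ColC", "key"])                         # unique words per ColC
       .assign(hit=lambda d: d["key"].isin(known).astype(int))   # 1 if the word is in df
)

out = (
    words.groupby("ColC")
         .agg(n_df2_colB_uniq=("key", "size"),   # unique ColB words per ColC
              n_intersec_colB=("hit", "sum"))    # how many of them also occur in df
         .reset_index()
)
print(out)
```

If the match should also respect the country, a merge on both the ColA/colA column and the lowercased word would be needed in place of `isin`.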