First select only columns for output and add drop_duplicates
, last add new column by range
:
df = df[['age','maritalstatus']].drop_duplicates()
df['no'] = range(len(df.index))
print (df)
age maritalstatus no
0 Young married 0
1 young married 1
2 young Single 2
3 old single 3
4 old married 4
5 teen married 5
7 adult single 6
If want convert all values to lowercase first:
df = df[['age','maritalstatus']].apply(lambda x: x.str.lower()).drop_duplicates()
df['no'] = range(len(df.index))
print (df)
age maritalstatus no
0 young married 0
2 young single 1
3 old single 2
4 old married 3
5 teen married 4
7 adult single 5
EDIT:
First convert to lowercase
:
df[['age','maritalstatus']] = df[['age','maritalstatus']].apply(lambda x: x.str.lower())
print (df)
user age maritalstatus product
0 A young married 111
1 B young married 222
2 C young single 111
3 D old single 222
4 E old married 111
5 F teen married 222
6 G teen married 555
7 H adult single 444
8 I adult single 333
And then use merge
for unique product
converted to list
:
df2 = pd.DataFrame([{'user':'X', 'age':'young', 'maritalstatus':'married'}])
print (df2)
age maritalstatus user
0 young married X
a = pd.merge(df, df2, on=['age','maritalstatus'])['product'].unique().tolist()
print (a)
[111, 222]
df2 = pd.DataFrame([{'user':'X', 'age':'adult', 'maritalstatus':'married'}])
print (df2)
age maritalstatus user
0 adult married X
a = pd.merge(df, df2, on=['age','maritalstatus'])['product'].unique().tolist()
print (a)
[]
But if need column use transform
:
df['prod'] = df.groupby(['age', 'maritalstatus'])['product'].transform('unique')
print (df)
user age maritalstatus product prod
0 A young married 111 [111, 222]
1 B young married 222 [111, 222]
2 C young single 111 [111]
3 D old single 222 [222]
4 E old married 111 [111]
5 F teen married 222 [222, 555]
6 G teen married 555 [222, 555]
7 H adult single 444 [444, 333]
8 I adult single 333 [444, 333]
EDIT1:
a = (pd.merge(df, df2, on=['age','maritalstatus'])
.groupby('user_y')['product']
.apply(lambda x: x.unique().tolist())
.to_dict())
print (a)
{'X': [111, 222]}
Detail:
print (pd.merge(df, df2, on=['age','maritalstatus']))
user_x age maritalstatus product user_y
0 A young married 111 X
1 B young married 222 X