The method is based on the comments by @noslenkwah and @skrubber.
If you need most common row for each name as described
df = pd.DataFrame({'Name': ['John', 'John', 'John', 'John', 'John', 'Sam', 'Sam', 'Sam', 'Sam'] , 'A': [1, 1, 1, 2, 2, 3, 3, 3, 3] ,
'B': [1, 1, 2, 2, 2, 3, 4, 3, 3] , 'C': [2, 1, 1, 1, 1, 3, 3, 3, 3]})
df_count = df.groupby(["A", "B", "C"])['Name'].value_counts().reset_index(name="Count")
l = list()
for name in set(df_count.Name):
freq = df_count[df_count.Name == name].Count.max()
l.append(df_count[(df_count.Name == name) & (df_count.Count == freq)].values.tolist()[0])
print(l)
[[3, 3, 3, 'Sam', 3], [2, 2, 1, 'John', 2]]
First the method group df
by ["A", "B", "C"]
. Then loop over names, and first calculate the most common rows' frequency within this name. Then filter out groups that have different frequncy and with different names in the query df_count[(df_count.Name == name) & (df_count.Count == freq)]
.
A more general problem: finding rows with highest frequency regardless names, you can try
df_count = df.groupby(["A", "B", "C"])['Name'].value_counts().reset_index(name="Count")
max_freq = df_count["Count"].max()
df_count[df_count["Count"] == max_freq]
A B C Name Count
3 2 2 1 John 2
4 3 3 3 Sam 2
Reference
Pandas: Selecting rows based on value counts of a particular column
Selecting with complex criteria from pandas.DataFrame