I've already tried several answers, but none of them solved my problem. I also looked at this answer, but it didn't work either. Here is my dataframe:
import numpy as np
import pandas as pd
np.random.seed(2)
col1 = np.random.choice([1,2,3], size=(50))
col2 = np.random.choice([1,2,3,4], size=(50))
col3 = np.random.choice(['a', 'b', 'c', 'd', 'e'], size=(50))
data = {'c1': col1, 'c2': col2, 'c3': col3}  # keys match the c1/c2/c3 names used below
df = pd.DataFrame(data)
I want to:
1) perform a groupby on the c1 and c2 columns, and
2) create a new column, c4, that holds the most frequent value of the c3 column within each group.
The final df should look like this:
c1 c2 c3 c4
0 1 1 b b
1 1 1 b b
2 1 2 a b
3 1 2 b b
4 1 2 b b
5 1 2 b b
6 1 2 c b
7 1 3 a a
8 1 3 c a
9 1 3 b a
10 1 3 c a
11 1 3 a a
12 1 3 b a
13 1 3 a a
14 1 3 a a
15 1 3 c a
16 1 4 a a
17 2 1 c c
18 2 1 c c
19 2 1 a c
20 2 1 c c
21 2 1 c c
22 2 1 b c
23 2 2 a a
24 2 2 c a
25 2 2 a a
26 2 3 a a
27 2 3 a a
28 2 4 c c
29 2 4 c c
30 3 1 b a
31 3 1 a a
32 3 1 a a
33 3 1 c a
34 3 1 b a
35 3 2 c c
36 3 2 c c
37 3 2 b c
38 3 2 a c
39 3 2 c c
40 3 3 b b
41 3 3 a b
42 3 3 b b
43 3 3 c b
44 3 3 a b
45 3 3 b b
46 3 3 b b
47 3 3 c b
48 3 4 b b
49 3 4 c c
For example, I tried this code without success:
df1 = df.groupby(['c1', 'c2'])['c3'].agg(lambda x:x.value_counts().index[0])
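For reference, here is a minimal sketch of the kind of result I'm after, assuming the columns are named c1, c2 and c3. The helper name most_frequent is just something I made up for illustration, and picking the first value of Series.mode() on ties is my own assumption. It uses transform instead of agg, so the result keeps one row per original row rather than one row per group:

```python
import numpy as np
import pandas as pd

np.random.seed(2)
df = pd.DataFrame({
    'c1': np.random.choice([1, 2, 3], size=50),
    'c2': np.random.choice([1, 2, 3, 4], size=50),
    'c3': np.random.choice(['a', 'b', 'c', 'd', 'e'], size=50),
})

def most_frequent(s):
    # Hypothetical helper: Series.mode() returns the most frequent value(s)
    # in sorted order; on ties, take the first one (an assumption).
    return s.mode().iloc[0]

# transform broadcasts the per-group result back to every row of the group,
# so the new column lines up with the original index.
df['c4'] = df.groupby(['c1', 'c2'])['c3'].transform(most_frequent)

# Sort by the grouping keys so rows of the same group sit together.
df = df.sort_values(['c1', 'c2']).reset_index(drop=True)
print(df)
```

With agg, the same lambda would return one value per (c1, c2) group, which would then have to be merged back onto df to get a per-row column.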