I have a dataframe with two columns, district and region. Below is a sample of the input dataframe:
   district  region
1  Aveiro    -
2  Aveiro    Entre Douro e Minho
3  Aveiro    Beira Litoral
4  Aveiro    Beira Litoral
5  Aveiro    Entre Douro e Minho
6  Aveiro    Beira Litoral
7  Braga     Trás-os-Montes
8  Braga     -
9  Braga     Trás-os-Montes
As you can see, there are no null values in the dataframe, but some records in the region column have the value "-". I want to replace every "-" in that column with the most frequent region within the same district, i.e. based on a groupby on the district column. The counts can be obtained like this:
df1['region'].groupby(df1['district']).value_counts()
district  region
Aveiro    Beira Litoral          3
          Entre Douro e Minho    2
          -                      1
Braga     Trás-os-Montes         2
          -                      1
As you can see, "Beira Litoral" is the most frequent value for Aveiro, so it should replace the "-" in the region column for that district. Similarly, "Trás-os-Montes" is the most frequent value for Braga and should replace the "-" there.
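To make the rule above concrete, here is a minimal sketch of how the most frequent region per district could be looked up while ignoring the "-" placeholder. The dataframe is rebuilt from the sample shown earlier. The names known and most_frequent are only illustrative, and ties are resolved by whichever value value_counts() happens to list first.

```python
import pandas as pd

# Sample data rebuilt from the question (index 1..9 as in the printout).
df1 = pd.DataFrame(
    {
        "district": ["Aveiro"] * 6 + ["Braga"] * 3,
        "region": [
            "-", "Entre Douro e Minho", "Beira Litoral", "Beira Litoral",
            "Entre Douro e Minho", "Beira Litoral",
            "Trás-os-Montes", "-", "Trás-os-Montes",
        ],
    },
    index=range(1, 10),
)

# Drop the "-" placeholder first so it can never be picked as the winner.
known = df1[df1["region"] != "-"]

# For each district, take the region label with the highest count.
most_frequent = known.groupby("district")["region"].agg(
    lambda s: s.value_counts().idxmax()
)
print(most_frequent)
```

The result is a Series indexed by district, which can then be used as a lookup for the rows that currently hold "-".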
The output dataframe should look like this:
   district  region
1  Aveiro    Beira Litoral
2  Aveiro    Entre Douro e Minho
3  Aveiro    Beira Litoral
4  Aveiro    Beira Litoral
5  Aveiro    Entre Douro e Minho
6  Aveiro    Beira Litoral
7  Braga     Trás-os-Montes
8  Braga     Trás-os-Montes
9  Braga     Trás-os-Montes
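For reference, one possible way to arrive at this output is sketched below. It is not necessarily the best approach. It assumes every district has at least one region other than "-" (otherwise mode() returns an empty result and .iloc[0] raises an IndexError), and the names placeholder and fill are illustrative only.

```python
import pandas as pd

# Sample data rebuilt from the question (index 1..9 as in the printout).
df1 = pd.DataFrame(
    {
        "district": ["Aveiro"] * 6 + ["Braga"] * 3,
        "region": [
            "-", "Entre Douro e Minho", "Beira Litoral", "Beira Litoral",
            "Entre Douro e Minho", "Beira Litoral",
            "Trás-os-Montes", "-", "Trás-os-Montes",
        ],
    },
    index=range(1, 10),
)

# Boolean mask of the rows that hold the "-" placeholder.
placeholder = df1["region"] == "-"

# Turn "-" into missing values so mode() skips them, then broadcast each
# district's most frequent region back onto every row of that district.
fill = (
    df1["region"]
    .mask(placeholder)
    .groupby(df1["district"])
    .transform(lambda s: s.mode().iloc[0])
)

# Keep the original value where it is known; use the fill value otherwise.
df1["region"] = df1["region"].where(~placeholder, fill)
print(df1)
```

Only the rows that held "-" are changed; all other rows keep their original region.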
If I had NaN instead of "-", I could have solved this with something like this: