2

I'm trying to relate the groupby filtered dataframe to the original dataframe. After doing the groupby I lose some columns that I had in the original dataframe. The idea is to relate it back to their respective STATE and CITY values. But when I try to relate back the dataframe grows back to normal size with 17 rows. I just want the interface of the original dataframe to the final dataframe which contains 9 rows.

Original dataframe:

    |  COD      |STATE| CITY  |   AZIM | SET|TEC|
0   |ALAAD_0001 |AL   |MAC    |0       |1   |4  |
1   |ALAAD_0001 |AL   |MAC    |120     |2   |4  |
2   |ALAAD_0001 |AL   |MAC    |120     |2   |4  |
3   |ALAAD_0001 |AL   |MAC    |240     |3   |4  |
4   |BAPID_0001 |BA   |SAL    |20      |1   |2  |
5   |BAPID_0001 |BA   |SAL    |20      |1   |2  |
7   |BAPID_0001 |BA   |SAL    |100     |2   |2  |
8   |BAPID_0001 |BA   |SAL    |210     |3   |2  |
9   |BAPID_0001 |BA   |SAL    |250     |3   |2  |
10  |BAPID_0001 |BA   |SAL    |250     |3   |2  |
11  |CEMBC_0003 |CE   |FOR    |90      |1   |4  |
12  |CEMBC_0003 |CE   |FOR    |80      |1   |4  |
13  |CEMBC_0003 |CE   |FOR    |160     |2   |4  |
14  |CEMBC_0003 |CE   |FOR    |160     |2   |4  |
15  |CEMBC_0003 |CE   |FOR    |170     |2   |4  |
16  |CEMBC_0003 |CE   |FOR    |280     |3   |4  |

After groupby:

    | COD       |TEC     |SET |AZIM|
0   |ALAAD_0001 |4       |1   |0   |
1   |ALAAD_0001 |4       |2   |120 | 
2   |ALAAD_0001 |4       |3   |240 | 
3   |BAPID_0001 |2       |1   |20  | 
4   |BAPID_0001 |2       |2   |100 | 
5   |BAPID_0001 |2       |3   |250 |
6   |CEMBC_0003 |4       |1   |90  | 
7   |CEMBC_0003 |4       |2   |160 | 
8   |CEMBC_0003 |4       |3   |280 | 
df_cut = (
    df.groupby(["COD","TEC","SET"])
    .AZIM
    .agg(lambda x: pd.Series.mode(x).max())
    .reset_index()
)

Expected output:

    |   COD     | TEC    | SET |AZIM  | STATE | CITY |
0   |ALAAD_0001 |4       |1    |0     | AL    |MAC   |
1   |ALAAD_0001 |4       |2    |120   | AL    |MAC   |
2   |ALAAD_0001 |4       |3    |240   | AL    |MAC   |
3   |BAPID_0001 |2       |1    |20    | BA    |SAL   |
4   |BAPID_0001 |2       |2    |100   | BA    |SAL   |
5   |BAPID_0001 |2       |3    |250   | BA    |SAL   |
6   |CEMBC_0003 |4       |1    |90    | CE    |FOR   |
7   |CEMBC_0003 |4       |2    |160   | CE    |FOR   |
8   |CEMBC_0003 |4       |3    |280   | CE    |FOR   |
Michael Delgado
  • 13,789
  • 3
  • 29
  • 54
Harry Jones
  • 147
  • 6
  • Welcome to StackOverflow! For tips creating and formatting your examples, please read the guide to [creating a minimal reproducible example in pandas](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) and the [how to ask](https://stackoverflow.com/help/how-to-ask) guide. – Michael Delgado Jun 03 '21 at 23:48
  • Also, are you just looking for df.drop_duplicates()? And including state and city in the original groupby alongside AZIM should work. Otherwise, using merge [in the way suggested by @Uts](https://stackoverflow.com/a/67829930/3888719) is totally the right way to go. – Michael Delgado Jun 04 '21 at 00:12
  • Man, thank you very much for correcting the formatting of my question. I don't know how to write a table on Stack overflow. Do you have some materials to guide me? – Harry Jones Jun 04 '21 at 02:23
  • For pandas dataframes just print them out and copy the result into a code block. No need for a table. – Michael Delgado Jun 04 '21 at 02:58

1 Answers1

0

We are using inner join to merge both dataframes, since original df has duplicates on merge keys so it was returning duplicate values. drop_duplicates() came in handy to solve that problem.

Code

df_cut.merge(df.drop_duplicates(), on=["COD","TEC","SET", "AZIM"])

Output

    COD        TEC  SET AZIM    STATE   CITY
0   ALAAD_0001  4   1   0       AL      MAC
1   ALAAD_0001  4   2   120     AL      MAC
2   ALAAD_0001  4   3   240     AL      MAC
3   BAPID_0001  2   1   20      BA      SAL
4   BAPID_0001  2   2   100     BA      SAL
5   BAPID_0001  2   3   250     BA      SAL
6   CEMBC_0003  4   1   90      CE      FOR
7   CEMBC_0003  4   2   160     CE      FOR
8   CEMBC_0003  4   3   280     CE      FOR
Utsav
  • 5,572
  • 2
  • 29
  • 43