I have a dataframe in which I group consecutive rows that share the same Company and collect the values of another column into a list for each group.

    Company                   Who            Dates
0   DE BORTOLI WINES          DIXONS CREEK   1/02/2020
1   DE BORTOLI WINES          DIXONS GREEK   1/02/2020
2   DE BORTOLI WINES          DIXONS CREEK   1/03/2020
3   DE BORTOLI WINES          BILBUL         1/05/2020
4   Ezard@Levantine Hill      Coldstream     1/06/2020
5   Ezard@Levantine Hill      Hotstream      1/10/2020
6   RATHBONE WINE GROUP       PORT MELBOURN  1/02/2020
7   YERING STATION            YARRA GLEN     1/05/2020
8   YERING STATION            YARRA GREEN    1/01/2020

By doing this:

sorted_ = df["Dates"].groupby(df["Company"].ne(df["Company"].shift()).cumsum()).apply(list)

I get a list of lists of the dates that belong to the same company.

Something like this

And if I do this:

sorted_ = df["Who"].groupby(df["Company"].ne(df["Company"].shift()).cumsum()).apply(list)

I get a list of lists of the Who values that belong to the same company, so something like:

[DIXONS CREEK, DIXONS GREEK, DIXONS CREEK, BILBUL]
[Coldstream, Hotstream]
[PORT MELBOURN]
[YARRA GLEN, YARRA GREEN]
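
To see why these lists carry no company name, it helps to look at the grouping key on its own. The following is a minimal sketch using made-up labels (`A`, `B`, `w1` to `w4`) rather than the real data; the names `changed`, `run_id`, and `grouped` are introduced only for illustration.

```python
import pandas as pd

# Tiny illustrative frame; the labels are placeholders, not the real data.
df = pd.DataFrame({
    "Company": ["A", "A", "B", "A"],
    "Who": ["w1", "w2", "w3", "w4"],
})

# True wherever Company differs from the previous row.
# The first row compares against NaN, so it is always True.
changed = df["Company"].ne(df["Company"].shift())

# The cumulative sum turns every change into a new run number.
run_id = changed.cumsum()

# Grouping by the run number collects one list per consecutive run.
grouped = df["Who"].groupby(run_id).apply(list)

print(run_id.tolist())   # [1, 1, 2, 3]
print(grouped.tolist())  # [['w1', 'w2'], ['w3'], ['w4']]
```

The key is just a run counter, so the result is indexed by 1, 2, 3 instead of by company, and the second run of `A` stays separate from the first because the rows are not adjacent.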

The problem is that, in a very large dataset, I don't know which Company each list belongs to. How can I see which Company each group corresponds to?

Ideal result:

  Company               Result
  DE BORTOLI WINES      [DIXONS CREEK, DIXONS GREEK, DIXONS CREEK, BILBUL]
  Ezard@Levantine Hill  [Coldstream, Hotstream]
  RATHBONE WINE GROUP   [PORT MELBOURN]
  YERING STATION        [YARRA GLEN, YARRA GREEN]
DSC
  • `df.groupby('Company')['Who'].agg(list)` – Erfan Jul 28 '20 at 17:57
  • Basically the technique you are looking for is `Group by`, would be good for your learning to dive into that. – Erfan Jul 28 '20 at 17:58
  • @Erfan works like a charm, is there any way to do this for two columns? so I can include the dates as well? – DSC Jul 28 '20 at 17:59
  • @Erfan I figured it out, thanks, you can put this as an answer and I will accept it :) – DSC Jul 28 '20 at 18:44
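
The one-liner from the comments can be extended to cover both columns. The thread does not show how this was ultimately done, so the following is only a minimal sketch of one way, using a shortened copy of the sample data. The `sort=False` argument and the `reset_index()` call are choices made here, not part of the original comment.

```python
import pandas as pd

# Shortened copy of the sample data from the question.
df = pd.DataFrame({
    "Company": ["DE BORTOLI WINES", "DE BORTOLI WINES",
                "YERING STATION", "YERING STATION"],
    "Who": ["DIXONS CREEK", "BILBUL", "YARRA GLEN", "YARRA GREEN"],
    "Dates": ["1/02/2020", "1/05/2020", "1/05/2020", "1/01/2020"],
})

# Group by the company name itself and collect each selected column
# into a list. sort=False keeps companies in order of first appearance.
result = df.groupby("Company", sort=False)[["Who", "Dates"]].agg(list)

# Turn the Company index back into an ordinary column.
result = result.reset_index()

print(result)
```

Grouping directly on `Company` merges all rows of a company, including rows that are not adjacent, which differs from the `shift()`/`cumsum()` key used in the question.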
