
I have a multiindex dataframe like this:

                            Distance
Company Driver Document_id          
Salt    Fred   1               592.0
               2               550.0
        John   3               961.0
               4               346.0
Bricks  James  10              244.0
               20              303.0
               30              811.0
        Fred   40              449.0
        James  501             265.0
Sand    Donald 15              378.0
               800             359.0
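
For anyone who wants to reproduce this, the frame above can be rebuilt roughly as follows. This is only a sketch: the values are copied from the printout, and the `rows` list is a helper name introduced here, not part of my real code.

```python
import pandas as pd

# One tuple per row of the printed frame: (Company, Driver, Document_id, Distance).
rows = [
    ("Salt", "Fred", 1, 592.0),
    ("Salt", "Fred", 2, 550.0),
    ("Salt", "John", 3, 961.0),
    ("Salt", "John", 4, 346.0),
    ("Bricks", "James", 10, 244.0),
    ("Bricks", "James", 20, 303.0),
    ("Bricks", "James", 30, 811.0),
    ("Bricks", "Fred", 40, 449.0),
    ("Bricks", "James", 501, 265.0),
    ("Sand", "Donald", 15, 378.0),
    ("Sand", "Donald", 800, 359.0),
]

# Build a flat frame first, then promote three columns to a MultiIndex.
df = (
    pd.DataFrame(rows, columns=["Company", "Driver", "Document_id", "Distance"])
    .set_index(["Company", "Driver", "Document_id"])
)
print(df)
```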

How can I slice that df to see only the drivers who have worked for more than one company? The result should look like this:

                            Distance
Company Driver Document_id    
Salt    Fred   1               592.0
               2               550.0
Bricks  Fred   40              449.0

UPD: My original dataframe is 400k rows long, so I can't just slice it by index. I'm looking for a general solution to problems like this.

cs95

1 Answer


To get the number of unique companies each driver has worked for, use `groupby` and `nunique`:

v = (df.index.get_level_values(0)
       .to_series()
       .groupby(df.index.get_level_values(1))
       .nunique())   
# Alternative involving resetting the index, may not be as efficient.
# v = df.reset_index().groupby('Driver').Company.nunique()
v

Driver
Donald    1
Fred      2
James     1
John      1
Name: Company, dtype: int64

Now you can run a query; `query` can refer to index levels such as `Driver` by name:

names = v[v.gt(1)].index.tolist()
df.query("Driver in @names")

                            Distance
Company Driver Document_id          
Salt    Fred   1               592.0
               2               550.0
Bricks  Fred   40              449.0
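
If you would rather not build an intermediate list of names, a boolean mask is another option. The following is a minimal sketch of that alternative, not the approach above: it uses a trimmed-down subset of the sample data so it runs on its own, and the names `levels`, `n_companies` and `result` are made up for illustration.

```python
import pandas as pd

# Trimmed-down version of the sample frame (a subset of the question's rows).
df = pd.DataFrame(
    {"Distance": [592.0, 550.0, 961.0, 449.0, 378.0]},
    index=pd.MultiIndex.from_tuples(
        [
            ("Salt", "Fred", 1),
            ("Salt", "Fred", 2),
            ("Salt", "John", 3),
            ("Bricks", "Fred", 40),
            ("Sand", "Donald", 15),
        ],
        names=["Company", "Driver", "Document_id"],
    ),
)

# Copy the index levels into ordinary columns (one row per row of df).
levels = df.index.to_frame(index=False)

# For every row, count the distinct companies its driver appears under.
n_companies = levels.groupby("Driver")["Company"].transform("nunique")

# Keep rows whose driver has more than one company.
# .to_numpy() makes the mask positional, so no index alignment is attempted.
result = df[(n_companies > 1).to_numpy()]
print(result)
```

Whether this beats `query` on a 400k-row frame is something to measure rather than assume; both do a single `groupby` over the `Driver` level.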
cs95
@AndreyGoloborodko \m/ and `query` is a very powerful API that I am trying to promote. If you would like to learn more, I invite you to read [this post](https://stackoverflow.com/questions/53779986/dynamic-expression-evaluation-in-pandas-using-pd-eval) I wrote recently about `query` and `eval`. – cs95 Dec 16 '18 at 17:04