0

I'm looking to use the where function over a dataframe using a multiindex.

My dataframe looks like this :

                                                      mw
country category     date                               
DE      Wind Onshore 2019-01-01 00:00:00+00:00  22036.50
                     2019-01-01 01:00:00+00:00  22748.25
                     2019-01-01 02:00:00+00:00  23870.25
                     2019-01-01 03:00:00+00:00  25921.50
FR      Wind Onshore 2019-01-01 00:00:00+00:00   1637.00
                     2019-01-01 01:00:00+00:00   1567.00
                     2019-01-01 02:00:00+00:00   1556.00
                     2019-01-01 03:00:00+00:00   1595.00

I'm looking for the value under a minimum (let say 90% of the maximum for this exemple) per countries (DE, FR). How to do this ?

I tried this :

maxValue = data.max(level=[index.country])
data = data.where(data < maxValue*0.1)*

It does not work since maxValue has to values and data (in the where function) is unique. (I'm not sure to be clear)

Edit

To reproduce the dataframe:

  • Row data:
  country      category                      date        mw
0      DE  Wind Onshore 2019-01-01 00:00:00+00:00  22036.50
1      DE  Wind Onshore 2019-01-01 01:00:00+00:00  22748.25
2      DE  Wind Onshore 2019-01-01 02:00:00+00:00  23870.25
3      DE  Wind Onshore 2019-01-01 03:00:00+00:00  25921.50
4      FR  Wind Onshore 2019-01-01 00:00:00+00:00   1637.00
5      FR  Wind Onshore 2019-01-01 01:00:00+00:00   1567.00
6      FR  Wind Onshore 2019-01-01 02:00:00+00:00   1556.00
7      FR  Wind Onshore 2019-01-01 03:00:00+00:00   1595.00
  • the codeline
pd.read_clipboard(sep='\s\s+').set_index(['country', 'category', 'date'])
Thomas
  • 85
  • 7
  • 1
    MultiIndex dfs cannot be pasted into pandas.read_clipboard... please paste an example as shown here https://stackoverflow.com/q/20109391/6692898 – RichieV Jul 30 '20 at 18:04

1 Answers1

0

First to get the max value. Try:

data = data.assign(max_value = data.groupby('country').transform('max'))

Now you have a row-by-row max_value. You can just:

data_filtered = data.loc[data.mw < data.max_value * 0.1]
RichieV
  • 5,103
  • 2
  • 11
  • 24
  • Not sure it's working. `data['mw'] < max_value` this line got the ValueError `cannot join with no overlapping index names` – Thomas Jul 30 '20 at 18:44
  • just edited, try now... if you post an example I can load easily I will debug the code myself – RichieV Jul 30 '20 at 19:01
  • That's absolutely perfect ! Any idea why the first one did not work ? Indexes looked similar to me. – Thomas Jul 30 '20 at 19:10