0

I have a Pandas dataframe listing threshold values for various systems:

      METRIC  SYSTEM_NAME  YELLOW      RED
16    pagins          NaN   500.0   1000.0
17  preadsec          NaN  5000.0  10000.0
18   swapins          NaN   250.0    500.0
19  cpupcent          foo   30.0     90.0
20    pagins          bar   456.0    123.0

I want to merge this df into another containing samples for these metrics. The thresholds are contained within the configuration as a set of default values with per-system overrides, as for rows 16 and 20 above.

I'd like this to be reflected in the resulting join - if there is an override, it should take precedence over the default - however, I can only see this working by performing 2 merge ops - one for the default, another for the overrides - then a third to produce the final table.

I believe in SQL i could do this using an OR clause, but I can't find a way to do so in Pandas.

Does such a thing exist?

Edit: For clarity, the other DF has this structure:

              SYSTEM_NAME    METRIC        CVAL
19886                 foo  cpupcent   89.281734
19887                 bar   swapins   41.799927
19888                 bar    pagins   123.92355
19889                quux  preadsec   28.837423
19890                quux    pagins   232.30303

So in the hypothetical merge, the resulting output would look like this:

              SYSTEM_NAME    METRIC        CVAL   YELLOW      RED
19886                 foo  cpupcent   89.281734     30.0     90.0
19887                 bar   swapins   41.799927    250.0    500.0
19888                 bar    pagins   123.92355    456.0    123.0
19889                quux  preadsec   28.837423   5000.0  10000.0 
19890                quux    pagins   232.30303    500.0   1000.0

Here, bar's pagins has been overriden, while quux has taken the default value for YELLOW and RED. Since the override is only for pagins, bar's swapins is the default.

lgg
  • 83
  • 7

1 Answers1

0

From continued mucking around, I found this. It's somewhat convoluted and messy, but it solves the problem in a reasonable time frame.

It assumes that the values are weighted by an additional column, with lowest value taking preference.

# Ground work, prepare the index
tmp_df = df.reset_index()
# Now, perform the merge. Use the common value, then tidy up the duplicates
tmp_df = tmp_df.merge(t_df, 'left', on='METRIC')\
         .drop('SYSTEM_NAME_y', axis=1)
         .rename(index=str, columns='SYSTEM_NAME_x':'SYSTEM_NAME'})
         .drop_duplicates(subset=['END_DATE','METRIC','SYSTEM_NAME'], keep='last')
# And restore the index
tmp_df = tmp_df.set_index(df.index.name)
lgg
  • 83
  • 7