I have 2 DFs that have different sizes, something like:
DF1:
Metric Market BSegment MSegment Category Num Den Period
KPI1 GA PS Comm BF 20 100 202201
KPI1 CE PR Cons BF 70 128 202201
KPI2 NWE PR Cons On 12 389 202203
.
.
.
KPI20 ISE PS Cons On 80 288 202204
KPI30 GA PS Comm BF 32 85 202204
DF2:
Metric Market BSegment MSegment Category StartPeriod EndPeriod Goal
KPI1 * PS Comm BF 202201 202205 5
KPI1 * PR Cons BF 202201 202205 10
KPI2 NWE PR * On 202112 202203 60
.
.
.
KPI2 ISE PS * On 202112 202203 40
KPI3 GA PS Comm BF 202201 202206 22
KPI3 CE PS Comm BF 202201 202206 26
KPI3 NWE PS Comm BF 202201 202206 20
I am trying to get a new column in DF1 where I bring the values of the 'Goal' column in DF2 when the other columns are matchig. The main problem is that in DF2, in order to not repeat the same line multiple times, when a goal is applied, for example, to all markets instead of a line per market it states '*'.
Also there could be metrics in DF1 that are not appearing in DF2 as there are no goals.
The logic would be something like:
(df1['Metric'] == df2['Metric']) &
(df1['Market'] == df2['Market'] | df2['Market'] == '*') &
(df1['BSegment'] == df2['BSegment'] | df2['BSegment'] == '*') &
(df1['MSegment'] == df2['MSegment'] | df2['MSegment'] == '*') &
(df1['Category'] == df2['Category'] | df2['Category'] == '*') &
(df2['StartPeriod'] <= df1['Period'] <= df2['EndPeriod'] )
I've tried with np.where and np.select as:
df1['Numerator.Goal'] = (np.where((df1['Metric'] == df2['Metric']) &
(df1['Market'] == df2['Market'] | df2['Market'] == '*') &
(df1['BSegment'] == df2['BSegment'] | df2['BSegment'] == '*') &
(df1['MSegment'] == df2['MSegment'] | df2['MSegment'] == '*') &
(df1['Category'] == df2['Category'] | df2['Category'] == '*') &
(df2['StartPeriod'] <= df1['Period'] <= df2['EndPeriod'] ), df2['Goal'], np.nan))
--------
df1['Numerator.Goal'] = (np.select((df1['Metric'] == df2['Metric']) &
(df1['Market'] == df2['Market'] | df2['Market'] == '*') &
(df1['BSegment'] == df2['BSegment'] | df2['BSegment'] == '*') &
(df1['MSegment'] == df2['MSegment'] | df2['MSegment'] == '*') &
(df1['Category'] == df2['Category'] | df2['Category'] == '*') &
(df2['StartPeriod'] <= df1['Period'] <= df2['EndPeriod'] ), df2['Goal']))
but it ends up with a "ValueError: Can only compare identically-labeled Series objects"
I've think also on doing a .merge() but I havn't seen how to deal with the '*' symbol so I have not been successful at it.
By any chance, would you know how to accomplish this? Any help is appreciated!