I have a pandas DataFrame named df_merged_population_current_iteration, whose data you can download here as a CSV file: https://easyupload.io/bdqso4
I want to create a new DataFrame called pareto_df that contains all Pareto-optimal solutions from df_merged_population_current_iteration with regard to minimizing the two objectives "Costs" and "Peak Load". It should not store duplicates: if several solutions have identical values for both "Costs" and "Peak Load", only one of them should be kept. Additionally, a solution should only be included in pareto_df if its "Thermal Discomfort" value is smaller than 2.
For this purpose, I came up with the following code:
import pandas as pd
df_merged_population_current_iteration = pd.read_csv("C:/Users/wi9632/Desktop/sample_input.csv", sep=";")
# create a new DataFrame to store the Pareto-optimal solutions
pareto_df = pd.DataFrame(columns=df_merged_population_current_iteration.columns)
for i, row in df_merged_population_current_iteration.iterrows():
    is_dominated = False
    is_duplicate = False
    for j, other_row in df_merged_population_current_iteration.iterrows():
        if i == j:
            continue
        # Check if the other solution dominates the current solution
        if (other_row['Costs'] < row['Costs'] and other_row['Peak Load'] < row['Peak Load']) or \
           (other_row['Costs'] <= row['Costs'] and other_row['Peak Load'] < row['Peak Load']) or \
           (other_row['Costs'] < row['Costs'] and other_row['Peak Load'] <= row['Peak Load']):
            # The other solution dominates the current solution
            is_dominated = True
            break
        # Check if the other solution is a duplicate
        if (other_row['Costs'] == row['Costs'] and other_row['Peak Load'] == row['Peak Load']):
            is_duplicate = True
            break
    if not is_dominated and not is_duplicate and row['Thermal Discomfort'] < 2:
        # The current solution is Pareto-optimal, not a duplicate, and meets the discomfort threshold
        row_df = pd.DataFrame([row])
        pareto_df = pd.concat([pareto_df, row_df], ignore_index=True)
print(pareto_df)
In most cases the code works fine. However, there are cases in which no Pareto-optimal solution is added to pareto_df, although Pareto-optimal solutions that fulfill the criteria exist. This can be seen with the data I posted above: the solutions with the "id of the run" 7 and 8 are Pareto-optimal (and fulfill the thermal discomfort constraint), yet the current code adds neither of them to the new DataFrame. It should add exactly one of them (not both, as that would be a duplicate). I have already tried a lot and had a closer look at the code, but I could not find the mistake.
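To narrow this down, here is a minimal sketch that seems to reproduce the behaviour. The values are made up for illustration and are not taken from the uploaded CSV, and the function name pareto_as_posted is only a hypothetical wrapper around the same loop logic as above:

```python
import pandas as pd

# Made-up values for illustration only; NOT taken from the uploaded CSV.
# Runs 7 and 8 share identical objective values; run 9 is worse in both.
df = pd.DataFrame({
    "id of the run": [7, 8, 9],
    "Costs": [10.0, 10.0, 12.0],
    "Peak Load": [5.0, 5.0, 6.0],
    "Thermal Discomfort": [1.0, 1.0, 1.0],
})


def pareto_as_posted(df):
    """Hypothetical wrapper: same selection logic as the loop in the question."""
    kept = []
    for i, row in df.iterrows():
        is_dominated = False
        is_duplicate = False
        for j, other in df.iterrows():
            if i == j:
                continue
            # Dominance check, conditions copied from the original loop
            if (other["Costs"] < row["Costs"] and other["Peak Load"] < row["Peak Load"]) or \
               (other["Costs"] <= row["Costs"] and other["Peak Load"] < row["Peak Load"]) or \
               (other["Costs"] < row["Costs"] and other["Peak Load"] <= row["Peak Load"]):
                is_dominated = True
                break
            # Duplicate check, copied from the original loop
            if other["Costs"] == row["Costs"] and other["Peak Load"] == row["Peak Load"]:
                is_duplicate = True
                break
        if not is_dominated and not is_duplicate and row["Thermal Discomfort"] < 2:
            kept.append(row)
    return pd.DataFrame(kept, columns=df.columns)


result = pareto_as_posted(df)
print(result)
```

With these values, each of the two identical rows appears to be flagged as a duplicate of the other, so neither is kept and the result is empty. I suspect the duplicate check is involved, but I am not sure how to restructure it properly.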
Here is the actual output with the uploaded data:
Empty DataFrame
Columns: [Unnamed: 0, id of the run, Costs, Peak Load, Thermal Discomfort, Combined Score]
Index: []
And here is the desired output (one pareto-optimal solution):
Do you see what the mistake might be and how I have to adjust the code so that it actually finds all Pareto-optimal solutions without adding duplicates?
Reminder: Does anyone have an idea why the code does not find all Pareto-optimal solutions? I would highly appreciate any comments.
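One possible direction, as a sketch under assumptions and not a verified fix for the uploaded data: decide dominance first (so that rows with identical objective values never exclude each other), then apply the discomfort threshold, and only at the end drop duplicates on the two objectives. The names pareto_front, objectives and discomfort_limit are hypothetical, the sample values are made up, and the sketch assumes that dominance is checked against all rows, including those that fail the discomfort threshold, as in the loop above:

```python
import numpy as np
import pandas as pd


def pareto_front(df, objectives=("Costs", "Peak Load"), discomfort_limit=2):
    """Hypothetical helper: non-dominated rows (minimization), filtered and de-duplicated."""
    cols = list(objectives)
    values = df[cols].to_numpy(dtype=float)
    keep = []
    for point in values:
        # Another row dominates `point` if it is no worse in every objective
        # and strictly better in at least one. Identical rows (and the row
        # itself) are never strictly better, so they do not dominate.
        no_worse = (values <= point).all(axis=1)
        strictly_better = (values < point).any(axis=1)
        keep.append(not (no_worse & strictly_better).any())
    front = df.loc[np.array(keep, dtype=bool)]
    # Assumption: the discomfort threshold is applied after the dominance check
    front = front[front["Thermal Discomfort"] < discomfort_limit]
    # Keep only the first of several rows with identical objective values
    return front.drop_duplicates(subset=cols).reset_index(drop=True)


# Made-up values for illustration only; NOT taken from the uploaded CSV.
sample = pd.DataFrame({
    "id of the run": [7, 8, 9],
    "Costs": [10.0, 10.0, 12.0],
    "Peak Load": [5.0, 5.0, 6.0],
    "Thermal Discomfort": [1.0, 1.0, 1.0],
})
pareto_df = pareto_front(sample)
print(pareto_df)
```

On the made-up sample this keeps a single row (run 7) out of the two identical non-dominated rows. Whether the threshold should instead be applied before the dominance check depends on whether infeasible solutions are allowed to dominate feasible ones.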