I am trying to add values taken from one dataframe column by iterating over unique values (contract numbers). For smaller numbers of iteration, the script works perfectly. However, iterating over 1000 unique values, it creates duplicate values in the resulting dataframe, which in turn slows the processing speed and taking an unnecessarily long time for a processing. How should I make it more efficient?
https://i.stack.imgur.com/AxcqY.jpg - original dataframe
https://i.stack.imgur.com/Qtu3x.jpg - unnecessary duplicate dataframes in new dataframe
https://imgur.com/3i5gMoJ- unnecessary duplicate dataframes in new dataframe
import pandas as pd
import numpy as np
from datetime import datetime
df = pd.DataFrame([["AB1111",'2018-08-15 00:00:00','164','123','123'],
["AB1111",'2018-08-15 00:03:00','564','453','126'],
["AB1111",'2018-08-15 00:10:00','364','1231','1223'],
["AB1111",'2018-08-15 00:01:00','564','575','1523'],
["CD1111",'2018-08-16 00:12:00','514','341','1213'],
["CD1111",'2018-08-15 00:02:00','564','1234','123'],
["CD1111",'2018-08-16 00:05:00','564','341','124'],
["CD1111",'2018-08-16 00:03:00','64','341','123'],
["EF1111",'2018-08-15 00:00:00','534','341','121'],
["EF1111",'2018-08-17 00:01:00','564','341','163'],
["EF1111",'2018-08-15 00:09:00','524','341','129']],
columns = ['contract', 'datetime',
'real_cons','solar_gen','battery_charge'])
# converting datetime column datatype to "datetime"
df['datetime'] = pd.to_datetime(df['datetime'])
#aggregation dataframe (new dataframe)
df_agg1 = pd.DataFrame()
for contract in df['contract'].unique()[:1500]:
print(contract)
df_contract = df.copy()[df['contract']==contract] # selecting each full dataframe from the main DF
df_contract.set_index('datetime', inplace=True) # set "datetime" column as an index
df_contract.sort_index(inplace=True) # sort index
df_contract = df_contract.loc['2018-8-15'] # select timeframe
# creating GB61074_cons column, which will be added to df_agg, from df_contract 'real_cons' column
df_contract[f'{contract}_con'] = df_contract['real_cons']
if df_agg1.empty:
df_agg1 = df_contract[[f'{contract}_con']] # first column
else:
df_agg1 = df_agg1.join(df_contract[f'{contract}_con']) # subsequent columns
df_agg1
How to create the new dataframe without creating these unnecessary duplicates? What is causing them to be created?