I am running the following Python code in Databricks on a DataFrame of monthly employment-by-industry data.
The code appends an empty DataFrame for the forecast period, calculates the YoY % change in employment for each industry, and then shifts each series forward by 1 to 12 months, adding each shifted series to the DataFrame as a new column:
# Forecast df parameters: index, column names
index = pd.date_range('2022-11-30', periods=13, freq='M')
columns = summary_table_empl.columns.to_list()
# Append history + empty df
df_forecast = pd.DataFrame(index=index, columns=columns)
df_test_empl=pd.concat([summary_table_empl, df_forecast])
# New df, calculate yoy percent change for every industry (col)
df_CA_empl_test_yoy= ((df_test_empl - df_test_empl.shift(12))/df_test_empl.shift(12))*100
# Extend each variable as a series from 1 to 12 months
for col in df_CA_empl_test_yoy.columns:
    for i in range(1,13):
        df_CA_empl_test_yoy["%s_%s"%(col,i)] = df_CA_empl_test_yoy[col].shift(i)
df_CA_empl_test_yoy.head(15)
The code works, but I am getting this warning:
PerformanceWarning: DataFrame is highly fragmented. This is usually the result of calling `frame.insert` many times, which has poor performance. Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
df_CA_empl_test_yoy["%s_%s"%(col,i)] = df_CA_empl_test_yoy[col].shift(i)
How do I modify my code to get rid of this warning message?
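One possible direction follows the warning's own hint: collect the shifted series in a list first, then attach them with a single pd.concat(axis=1) instead of assigning columns one at a time. The sketch below is a minimal illustration, not a confirmed fix for the real data. It assumes a small made-up stand-in for summary_table_empl (the industry names "retail" and "mining" and their values are hypothetical), and it extends the index with reindex rather than concatenating an empty frame.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for summary_table_empl: 24 month-end rows, two industries.
hist_index = pd.date_range("2020-11-30", periods=24, freq=pd.offsets.MonthEnd())
summary_table_empl = pd.DataFrame(
    {"retail": np.arange(100.0, 124.0), "mining": np.arange(50.0, 74.0)},
    index=hist_index,
)

# Extend the history with 13 empty (NaN) forecast months.
fc_index = pd.date_range("2022-11-30", periods=13, freq=pd.offsets.MonthEnd())
df_test_empl = summary_table_empl.reindex(hist_index.union(fc_index))

# YoY percent change, same formula as in the question.
df_CA_empl_test_yoy = (df_test_empl - df_test_empl.shift(12)) / df_test_empl.shift(12) * 100

# Build every shifted series up front, without touching the frame...
shifted = [
    df_CA_empl_test_yoy[col].shift(i).rename(f"{col}_{i}")
    for col in df_CA_empl_test_yoy.columns
    for i in range(1, 13)
]

# ...then add all new columns in one concat along the column axis.
df_CA_empl_test_yoy = pd.concat([df_CA_empl_test_yoy, *shifted], axis=1)
```

Because the frame is only assembled once, pandas has no reason to insert columns repeatedly, which is what the warning describes. The column names follow the same "col_i" pattern as the original loop.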