How to get rid of performance warning of fragmented df in python caused by too many joins?

Question

I have the following Python code I am running in Databricks on a df of monthly employment by industry data.

The following code appends an empty df for the forecast period to the history, calculates the YoY % change in employment by industry, then shifts each series ahead by 1 to 12 months and adds each shifted series to the df as a new column:

# Forecast df parameters: index, column names
index = pd.date_range('2022-11-30', periods=13, freq='M')
columns = summary_table_empl.columns.to_list()                    

# Append history + empty df 
df_forecast = pd.DataFrame(index=index, columns=columns)
df_test_empl=pd.concat([summary_table_empl, df_forecast])

# New df, calculate yoy percent change for every industry (col)
df_CA_empl_test_yoy= ((df_test_empl - df_test_empl.shift(12))/df_test_empl.shift(12))*100

# Extend each variable as a series from 1 to 12 months
for col in df_CA_empl_test_yoy.columns:
    for i in range(1, 13):
        df_CA_empl_test_yoy["%s_%s" % (col, i)] = df_CA_empl_test_yoy[col].shift(i)

df_CA_empl_test_yoy.head(15)

The code works, however I am getting this warning:

PerformanceWarning: DataFrame is highly fragmented. This is usually the result of calling `frame.insert` many times, which has poor performance. Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  df_CA_empl_test_yoy["%s_%s"%(col,i)] = df_CA_empl_test_yoy[col].shift(i)

How do I modify my code to get rid of this warning message?

`import warnings; warnings.simplefilter(action='ignore', category=pd.errors.PerformanceWarning)` — ali bakhtiari, Jan 06 '23 at 19:07
Does this answer your question? [Python Pandas – How to supress PerformanceWarning?](https://stackoverflow.com/questions/51521526/python-pandas-how-to-supress-performancewarning) — ali bakhtiari, Jan 06 '23 at 19:18
Can the for statement be modified? If not, then your response answers my question. — jack homareau, Jan 06 '23 at 20:02

score 1 · Accepted Answer · answered Jan 06 '23 at 20:33

You can suppress the warning with this piece of code:

import warnings
warnings.simplefilter(action='ignore', category=pd.errors.PerformanceWarning)
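
If you would rather not silence the warning for the whole notebook, the filter can be scoped to the loop with the standard library's `warnings.catch_warnings` context manager. This is only a minimal sketch: the small frame `df` and its column names are hypothetical stand-ins for df_CA_empl_test_yoy.

```python
import warnings

import numpy as np
import pandas as pd

# Hypothetical stand-in for df_CA_empl_test_yoy (12 rows, 2 columns)
df = pd.DataFrame(np.arange(24.0).reshape(12, 2), columns=["a", "b"])

# The filter only applies inside the with-block; the previous
# warning settings are restored when the block exits.
with warnings.catch_warnings():
    warnings.simplefilter(action="ignore", category=pd.errors.PerformanceWarning)
    for col in list(df.columns):  # snapshot of the original column names
        for i in range(1, 13):
            df["%s_%s" % (col, i)] = df[col].shift(i)
```

Outside the `with` block, any PerformanceWarning raised by other code is still shown.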

The warning is raised because you are creating new columns with a for loop using df_CA_empl_test_yoy["%s_%s"%(col,i)], which internally calls the .insert() method every time the new column is assigned to your dataframe.

I don't see a way you can do this without a for loop. So turning the warning off is the way to go.
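
For reference, the warning text itself points at an alternative that keeps a loop but avoids the repeated insertions: build all the shifted series first, then join them in a single pd.concat(axis=1) call. This is a sketch under assumptions, not tested on the original data: the frame `yoy` and its column names are hypothetical stand-ins for df_CA_empl_test_yoy.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df_CA_empl_test_yoy (24 rows, 2 columns)
yoy = pd.DataFrame(np.arange(48.0).reshape(24, 2), columns=["mining", "retail"])

# Build every shifted series first, keyed by its new column name.
# Nothing is inserted into `yoy` here, so it does not get fragmented.
shifted = {
    "%s_%s" % (col, i): yoy[col].shift(i)
    for col in yoy.columns
    for i in range(1, 13)
}

# Join the original columns and all lagged columns in one step
yoy_lagged = pd.concat([yoy, pd.DataFrame(shifted)], axis=1)
```

The new columns come out in the same order as in the original loop (mining_1 to mining_12, then retail_1 to retail_12), and `yoy` itself is left unchanged.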
