
I have the following Python code I am running in Databricks on a df of monthly employment by industry data.

The following code appends an empty df to the history, calculates the YoY % change in employment by industry, shifts each series ahead by 1 to 12 months, and adds each shifted series to the df as a new column:

import pandas as pd

# Forecast df parameters: index, column names
# (freq='M' is month-end; pandas >= 2.2 deprecates it in favor of 'ME')
index = pd.date_range('2022-11-30', periods=13, freq='M')
columns = summary_table_empl.columns.to_list()

# Append history + empty df
df_forecast = pd.DataFrame(index=index, columns=columns)
df_test_empl = pd.concat([summary_table_empl, df_forecast])

# New df, calculate yoy percent change for every commodity (col)
df_CA_empl_test_yoy = ((df_test_empl - df_test_empl.shift(12)) / df_test_empl.shift(12)) * 100

# Add lagged copies of each series, shifted by 1 to 12 months
for col in df_CA_empl_test_yoy.columns:
    for i in range(1, 13):
        df_CA_empl_test_yoy["%s_%s"%(col,i)] = df_CA_empl_test_yoy[col].shift(i)

df_CA_empl_test_yoy.head(15)
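As a side note on the YoY step: pandas has a built-in method, DataFrame.pct_change, that computes the same ratio. The minimal sketch below uses made-up data (the construction/retail columns and random values are hypothetical, not the asker's data) and checks that pct_change(periods=12) matches the manual formula. fill_method=None is passed on purpose: older pandas versions forward-fill missing values by default, which would fill the empty forecast rows from the last observed month instead of leaving them NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical monthly employment for two made-up industries.
index = pd.date_range("2020-01-01", periods=36, freq="MS")
rng = np.random.default_rng(0)
empl = pd.DataFrame(
    {
        "construction": rng.uniform(90, 110, size=36),
        "retail": rng.uniform(190, 210, size=36),
    },
    index=index,
)

# The formula from the question: (x_t - x_{t-12}) / x_{t-12} * 100
yoy_manual = ((empl - empl.shift(12)) / empl.shift(12)) * 100

# Built-in equivalent; fill_method=None disables forward-filling of NaNs.
yoy_builtin = empl.pct_change(periods=12, fill_method=None) * 100

# Same values (NaNs in the first 12 rows compare as equal here).
np.testing.assert_allclose(yoy_manual.to_numpy(), yoy_builtin.to_numpy(), atol=1e-9)
print(yoy_builtin.iloc[12:15].round(2))
```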

The code works; however, I am getting this warning:

PerformanceWarning: DataFrame is highly fragmented. This is usually the result of calling frame.insert many times, which has poor performance. Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use newframe = frame.copy()
  df_CA_empl_test_yoy["%s_%s"%(col,i)] = df_CA_empl_test_yoy[col].shift(i)

How do I modify my code to get rid of this warning message?

jack homareau

1 Answer


You can suppress the warning with this piece of code:

import warnings
import pandas as pd

warnings.simplefilter(action='ignore', category=pd.errors.PerformanceWarning)
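The simplefilter call above silences PerformanceWarning for everything that runs afterwards in the same Python session. If you only want to hide it for this one loop, a narrower option is to wrap the loop in warnings.catch_warnings(), which restores the previous filters when the with-block exits. This is a minimal sketch; the helper name add_lags_quietly and the random yoy frame are hypothetical.

```python
import warnings

import numpy as np
import pandas as pd


def add_lags_quietly(df: pd.DataFrame, max_lag: int = 12) -> pd.DataFrame:
    """Hypothetical helper: add lag columns one at a time (as in the question),
    silencing pandas' PerformanceWarning only inside this function."""
    out = df.copy()  # avoid mutating the caller's frame
    with warnings.catch_warnings():
        # This filter is undone automatically when the with-block exits.
        warnings.simplefilter(action="ignore", category=pd.errors.PerformanceWarning)
        for col in df.columns:
            for i in range(1, max_lag + 1):
                out[f"{col}_{i}"] = out[col].shift(i)
    # copy() returns a de-fragmented frame, as the warning message suggests.
    return out.copy()


# Hypothetical data: 10 made-up industry columns, so the loop does 120 inserts.
yoy = pd.DataFrame(
    np.random.default_rng(2).normal(size=(24, 10)),
    columns=[f"industry_{k}" for k in range(10)],
)
result = add_lags_quietly(yoy)
print(result.shape)  # (24, 130)
```

Compared with a module-level filter, this keeps the warning visible for any other code in the notebook that fragments a DataFrame.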

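The warning is raised because the for loop creates each new column with df_CA_empl_test_yoy["%s_%s"%(col,i)] = ..., and every time a new column is assigned this way, pandas internally goes through an insert step (the frame.insert mentioned in the warning) that stores the column as a separate internal block. After many such assignments the DataFrame becomes "highly fragmented".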

I don't see a way to do this without a for loop, so turning the warning off is the way to go.
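The loop itself can stay; what the warning objects to is adding the columns to the DataFrame one at a time. The minimal sketch below follows the alternative the warning message itself proposes (pd.concat(axis=1)): it keeps the same nested loop, collects the shifted series in a dict, and joins them in a single step. The yoy frame and its industry_k columns are hypothetical stand-ins for df_CA_empl_test_yoy.

```python
import numpy as np
import pandas as pd

# Hypothetical YoY frame with 10 made-up industry columns.
index = pd.date_range("2020-01-01", periods=36, freq="MS")
rng = np.random.default_rng(1)
yoy = pd.DataFrame(
    rng.normal(0, 3, size=(36, 10)),
    index=index,
    columns=[f"industry_{k}" for k in range(10)],
)

# Same nested loop as the question, but the shifted series go into a dict
# instead of being inserted into the frame one column at a time.
lags = {
    f"{col}_{i}": yoy[col].shift(i)
    for col in yoy.columns
    for i in range(1, 13)
}

# One concat along the columns adds all 120 lag columns at once.
yoy_with_lags = pd.concat([yoy, pd.DataFrame(lags)], axis=1)
print(yoy_with_lags.shape)  # (36, 130)
```

Applied to the question, lags would be built from df_CA_empl_test_yoy, and the result of the concat would replace it; the column names and their order match what the "%s_%s" loop produces.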

ali bakhtiari