
What I have: the cumulative number of deaths in a pandas DataFrame.

What I need: a new column with the difference between consecutive rows (the number of deaths per day). So I did the following:

df_plot.head()

               date     deaths
1153383     2021-05-04  134
1153384     2021-05-03  120
1153385     2021-05-02  120
1153386     2021-04-30  119
1153387     2021-04-29  114

df_plot.set_index('date', inplace=True)
df_plot.sort_index(ascending=False)

df_plot_2 = df_plot['deaths'].shift() - df_plot['deaths']
df_plot['deaths_by_day'] = df_plot_2

But I get the message below. How can I create the new column the correct way?

<ipython-input-46-df83920c83da>:1: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_plot['deaths_by_day'] = df_plot_2

Thanks.

walves

1 Answer


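In your current example, the warning most likely means that df_plot was itself created as a slice of another DataFrame (for example with a boolean filter or a column selection). Calling .copy() on df_plot, ideally at the point where you create it, gives you an independent (deep) copy, so adding a column to it no longer raises SettingWithCopyWarning:

df_plot = df_plot.copy()
df_plot['deaths_by_day'] = df_plot['deaths'].shift() - df_plot['deaths']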
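To make the placement of .copy() concrete, here is a minimal self-contained sketch. It assumes df_plot was filtered out of a larger frame; `full` and its `state` column are hypothetical stand-ins for that original data, not names from the question. The copy is taken on the subset itself, at the moment it is created, and the assignment is then checked for SettingWithCopyWarning.

```python
import warnings

import pandas as pd

# Hypothetical larger frame that df_plot is assumed to be sliced from
full = pd.DataFrame({
    "state": ["A", "A", "B"],
    "date": ["2021-05-04", "2021-05-03", "2021-05-04"],
    "deaths": [134, 120, 50],
})

# Filter, select columns, then take an explicit copy so df_plot owns its data
df_plot = full[full["state"] == "A"][["date", "deaths"]].copy()

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Same arithmetic as in the question: previous row minus current row
    df_plot["deaths_by_day"] = df_plot["deaths"].shift() - df_plot["deaths"]

# Keep only SettingWithCopyWarning entries (matched by class name)
copy_warnings = [w for w in caught if "SettingWithCopy" in w.category.__name__]
print(len(copy_warnings))  # 0
```

Removing the trailing .copy() from the line that builds df_plot is the quickest way to see the warning return on pandas versions that still emit it.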

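To learn more about SettingWithCopyWarning, see the "Returning a view versus a copy" section of the pandas documentation linked in the warning message.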

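The .diff() method can also be used to compute the difference between consecutive rows. Note that sort_index() returns a new DataFrame rather than sorting in place, so its result has to be assigned back: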

df.set_index('date', inplace=True)
df = df.sort_index(ascending=False)
df['death_diff'] = df.deaths.diff().abs()
df

Output

            deaths  death_diff
date        
2021-05-04  134      NaN
2021-05-03  120      14.0
2021-05-02  120      0.0
2021-04-30  119      1.0
2021-04-29  114      5.0
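A variant sketch of the .diff() approach, using the five rows from the question and assuming each daily count should be attributed to the day on which the cumulative total increased. Sorting oldest-first lets a plain .diff() subtract the previous row's total, so 14 lands on 2021-05-04 instead of 2021-05-03 and no .abs() is needed. Keeping the sign also means a downward revision of the cumulative total would appear as a negative value instead of being hidden.

```python
import pandas as pd

# Cumulative totals copied from the question (2021-05-01 is absent there too)
df = pd.DataFrame({
    "date": pd.to_datetime(
        ["2021-05-04", "2021-05-03", "2021-05-02", "2021-04-30", "2021-04-29"]
    ),
    "deaths": [134, 120, 120, 119, 114],
})

# Oldest first, so diff() computes each row's total minus the previous row's total
df = df.set_index("date").sort_index()
df["deaths_by_day"] = df["deaths"].diff()

# Display newest first again, as in the question
print(df.sort_index(ascending=False))
```

Because 2021-05-01 is missing from the sample, the value on 2021-05-02 covers two days; reindexing to a daily frequency before calling .diff() would make such gaps explicit.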
Utsav