How to smooth a pandas / matplotlib lineplot?

Question

I have the following df, weekly spend in a number of shops:

             shop1    shop2    shop3    shop4    shop5    shop6    shop7  \
date_week                                                                  
2          4328.85  5058.17  3028.68  2513.28  4204.10  1898.26  2209.75   
3          5472.00  5085.59  3874.51  1951.60  2984.71  1416.40  1199.42   
4          4665.53  4264.05  2781.70  2958.25  4593.46  2365.88  2079.73   
5          5769.36  3460.79  3072.47  1866.19  3803.12  2166.84  1716.71   
6          6267.00  4033.58  4053.70  2215.04  3991.31  2382.02  1974.92   
7          5436.83  4402.83  3225.98  1761.87  4202.22  2430.71  3091.33   
8          4850.43  4900.68  3176.00  3280.95  3483.53  4115.09  2594.01   
9          6782.88  3800.03  3865.65  2221.43  4116.28  2638.28  2321.55   
10         6248.18  4096.60  5186.52  3224.96  3614.24  2541.00  2708.36   
11         4505.18  2889.33  2937.74  2418.34  5565.57  1570.55  1371.54   
12         3115.26  1216.82  1759.49  2559.81  1403.61  1550.77   478.34   
13         4561.82   827.16  4661.51  3197.90  1515.63  1688.57   247.25   

             shop8    shop9  
date_week                    
2          3578.81  3134.39  
3          4625.10  2676.20  
4          3417.16  3870.00  
5          3980.78  3439.60  
6          3899.42  4192.41  
7          4190.60  3989.00  
8          4786.40  3484.51  
9          6433.02  3474.66  
10         4414.19  3809.20  
11         3590.10  3414.50  
12         4297.57  2094.00  
13         3963.27   871.25

If I plot these in a line plot or "spaghetti plot" It works fine.

The goal is the look at trend in weekly sales over the last three months in 9 stores.

But looks a bit messy:

newgraph.plot()

I had a look at similar questions such as this one which uses df.interpolate() but it looks like I need to have missing values in there first. this answer seems to require a time series.

Is there another method to smoothen out the lines?

It doesn't matter if the values are not exactly accurate anymore, some interpolation is fine. All I am interested in is the trend over the last number of weeks. I have also tried logy=True in the plot() method to calm the lines a bit, but it didn't help.

My df, for pd.DataFrame.fromt_dict():

{'shop1': {2: 4328.849999999999,
  3: 5472.0,
  4: 4665.530000000001,
  5: 5769.36,
  6: 6267.0,
  7: 5436.83,
  8: 4850.43,
  9: 6782.879999999999,
  10: 6248.18,
  11: 4505.18,
  12: 3115.26,
  13: 4561.82},
 'shop2': {2: 5058.169999999993,
  3: 5085.589999999996,
  4: 4264.049999999997,
  5: 3460.7899999999977,
  6: 4033.579999999998,
  7: 4402.829999999999,
  8: 4900.679999999997,
  9: 3800.0299999999997,
  10: 4096.5999999999985,
  11: 2889.3300000000004,
  12: 1216.8200000000002,
  13: 827.16},
 'shop3': {2: 3028.679999999997,
  3: 3874.5099999999984,
  4: 2781.6999999999994,
  5: 3072.4699999999984,
  6: 4053.6999999999966,
  7: 3225.9799999999987,
  8: 3175.9999999999973,
  9: 3865.6499999999974,
  10: 5186.519999999996,
  11: 2937.74,
  12: 1759.49,
  13: 4661.509999999998},
 'shop4': {2: 2513.2799999999997,
  3: 1951.6000000000001,
  4: 2958.25,
  5: 1866.1900000000003,
  6: 2215.04,
  7: 1761.8700000000001,
  8: 3280.9499999999994,
  9: 2221.43,
  10: 3224.9600000000005,
  11: 2418.3399999999997,
  12: 2559.8099999999995,
  13: 3197.9},
 'shop5': {2: 4204.0999999999985,
  3: 2984.71,
  4: 4593.459999999999,
  5: 3803.12,
  6: 3991.31,
  7: 4202.219999999999,
  8: 3483.529999999999,
  9: 4116.279999999999,
  10: 3614.24,
  11: 5565.569999999997,
  12: 1403.6100000000001,
  13: 1515.63},
 'shop6': {2: 1898.260000000001,
  3: 1416.4000000000005,
  4: 2365.8799999999997,
  5: 2166.84,
  6: 2382.019999999999,
  7: 2430.71,
  8: 4115.0899999999965,
  9: 2638.2800000000007,
  10: 2541.0,
  11: 1570.5500000000004,
  12: 1550.7700000000002,
  13: 1688.5700000000004},
 'shop7': {2: 2209.75,
  3: 1199.42,
  4: 2079.7300000000005,
  5: 1716.7100000000005,
  6: 1974.9200000000005,
  7: 3091.329999999999,
  8: 2594.0099999999993,
  9: 2321.5499999999997,
  10: 2708.3599999999983,
  11: 1371.5400000000004,
  12: 478.34,
  13: 247.25000000000003},
 'shop8': {2: 3578.8100000000004,
  3: 4625.1,
  4: 3417.1599999999994,
  5: 3980.7799999999997,
  6: 3899.4200000000005,
  7: 4190.600000000001,
  8: 4786.4,
  9: 6433.019999999998,
  10: 4414.1900000000005,
  11: 3590.1,
  12: 4297.57,
  13: 3963.27},
 'shop9': {2: 3134.3900000000003,
  3: 2676.2,
  4: 3870.0,
  5: 3439.6,
  6: 4192.41,
  7: 3989.0,
  8: 3484.51,
  9: 3474.66,
  10: 3809.2,
  11: 3414.5,
  12: 2094.0,
  13: 871.25}}

What are you trying to show with your plot? Defining you goals will make "It doesn't matter if the values are not exactly accurate anymore, some interpolation is fine" a lot clearer — FlyingTeller, Apr 27 '20 at 12:48
Smoothing data can always introduce artifacts which could suggest results you don't have. Especially if the lines are as close together as in your plot. You could try a different plot type. E.g., stacked histogram? Think about what you want to see in this plot. Is it one shop thats sticking out? Then draw a thicker line for this one and decrease the alpha values for the other lines. I think this plot, as it is, would overwhelm the reader (?). — cmosig, Apr 27 '20 at 12:53
The goal is the look at trend in weekly sales over the last three months in 9 stores . @FlyingTeller the reason I said "It doesn't matter if the values are not exactly accurate anymore, some interpolation is fine" is due to anticipating cmosig's comment saying this smoothing might suggest results I don't have. I just want to visualize the trend of all 9 shops at the same time, therefore exact values are not vital. — SCool, Apr 27 '20 at 13:26
@cmosig I'm actually interested in all of these shops and how they are trending towards the end of the graph (as we approach Covid lockdown). Which ones have increased in weekly sales and which ones are suffering? Exact dollar amount is not important, just the trend. Shop #3 is increasing, #7 is negatively affected etc etc. If you have other ideas on how to viualize this in an easier to digest format I'd appreciate it. — SCool, Apr 27 '20 at 13:34
@SCool how about displaying the difference between absolute shop sales in the respective week and average shop sales over the entire period on the y-axis? And then draw an axhline at 0. Maybe this will improve things. [example with bar plot] (http://www.qrb-bw.de/pdf_pool/pythondemo/cdpydoc/posnegbar.htm) — cmosig, Apr 27 '20 at 13:40
@cmosig Pretty good, I'll give that a go as a last resort. It would be great for looking at single shops. I'd just like to be able to compare them all at once because they are all in the same sector. For example, for the fast food sector I'd like to be able to compare multiple fast food places at the same time rather than creating and comparing 10 different charts. McDonalds, KFC, Burger King, Subway et.c all on the same chart. So I can see who's surviving Covid vs who's suffering at one glance. — SCool, Apr 27 '20 at 14:10

Arne · Accepted Answer · 2020-04-27T15:07:04.897

1

You could show the trend by plotting a regression line for the last few weeks, perhaps separately from the actual data, as the plot is already so crowded. I would use seaborn, because it has the convenient regplot() function:

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()

df.plot(figsize=[12, 10], style='--')
plt.xlim(2, 18)

last4 = df[len(df)-4:]

plt.gca().set_prop_cycle(None)
for shop in df.columns:
    sns.regplot(last4.index + 4, shop, data=last4, ci=None, scatter=False)
plt.ylabel(None)
plt.xticks(list(df.index)+[14, 17], labels=list(df.index)+[10, 13]);

edited Apr 27 '20 at 15:07

answered Apr 27 '20 at 14:56

Arne

9,990
2
18
28

That's much more pleasing to the eye, thank you. We are using the exact same data, but I'm getting the error: `AttributeError: 'Int64Index' object has no attribute 'mean'`. It seems to be caused by the `regplot` loop here: `for shop in df.columns: sns.regplot(last4.index + 4, shop, data=last4, ci=None, scatter=False)`., I think `last4.index` might be the problem. I tried changing it to a list but it didn't work very well. Any ideas? – SCool Apr 27 '20 at 15:53
@SCool Hmm, maybe a version issue? I'm using Python 3.7.7, pandas 1.0.3, matplotlib 3.1.3, and seaborn 0.10.0. – Arne Apr 27 '20 at 16:58
You could also try to pass an explicit list: `sns.regplot([14, 15, 16, 17], shop, data=last4, ci=None, scatter=False)` – Arne Apr 27 '20 at 17:01
1

I just updated seaborn and matplotlib. Works now. Thanks – SCool Apr 27 '20 at 22:10

How to smooth a pandas / matplotlib lineplot?

1 Answers1