
I have the following data frame:

Index   Date        AA   BB   CC     DD    EE   FF
0       2019-01-15  0.0  -1.0  0.0   0.0   0.0  2.0
1       2019-01-17  0.0  -1.0  -1.0  -1.0  0.0  2.0
2       2019-01-22  1.0  -1.0  1.0   -1.0  0.0  2.0
3       2019-01-24  0.0  0.0   0.0   0.0   0.0  2.0
4       2019-01-29  1.0  0.0   -1.0  0.0   -1.0 2.0
5       2019-01-31  0.0  -1.0  0.0   0.0   0.0  2.0
6       2019-02-05  1.0  1.0   1.0   0.0   1.0  2.0
7       2019-02-12  2.0  1.0   1.0   0.0   2.0  2.0

which I'm plotting with:

import matplotlib.pyplot as plt
import seaborn as sns

# Reshape to long format: one row per (Date, column) pair
dfs = dfs.melt('Date', var_name='cols', value_name='vals')
ax = sns.lineplot(x="Date", y='vals', hue='cols',
                  style='cols', markers=True, dashes=False, data=dfs)
ax.set_xticklabels(dfs['Date'].dt.strftime('%d-%m-%Y'))
plt.xticks(rotation=-90)
plt.tight_layout()
plt.show()

The resulting plot is ugly. I want the markers to sit exactly at the values in the data frame, but the lines between them to be smoothed. I'm aware of scipy's spline interpolation (e.g. here), but converting every column that way seems like too much hassle. There is also pandas' resample followed by interpolate (e.g. here), which is very close to what I want, but it requires turning the Date column into the index, which I don't want to do.

I would appreciate it if you could tell me the most Pythonic way to do this.


P.S. A complete version of my code can be seen here.

Foad S. Farimani
  • I see you are using `seaborn`. It helps if you add the tag in your question to get answers. – Erfan Apr 17 '19 at 22:59
  • @Erfan Well, it is not really a necessity. Any solution without seaborn is also OK for me. – Foad S. Farimani Apr 17 '19 at 23:02
  • How do you expect the smoothing to happen, specifically, if you don't want to add any points in-between? – JohanL Apr 17 '19 at 23:32
  • @JohanL I actually don't mind points to be added in between. I want the markers only to be shown for the actual data. So with matplotlib I usually plot twice once with the scatter and then plot the smoothed data. – Foad S. Farimani Apr 17 '19 at 23:40
  • And why not doing that now? – JohanL Apr 17 '19 at 23:44
  • [I'm actually trying](https://i.imgur.com/v2WK5xF.png) but it is not what I want. – Foad S. Farimani Apr 17 '19 at 23:47
  • I'm not sure what the problem is; you can always do `df_tmp = dfs.set_index()....` and work on `df_tmp` without touching the original data. – Quang Hoang Apr 18 '19 at 00:05
  • @Foad: That is to be expected from raw interpolation: you end up with extreme values. Try some other interpolation method, e.g. a cubic spline, which may not give the same huge dip. – JohanL Apr 18 '19 at 00:15
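
The temporary-copy idea from the comments can be sketched roughly as follows. This is only an illustration, not code from the thread: the helper name `upsample`, its parameters, and the three-row `sample` frame are made up for the example. The default `method="time"` is plain linear interpolation, chosen so the sketch needs nothing beyond pandas; smoother choices such as `"cubic"` or `"pchip"` should work the same way but require SciPy.

```python
import pandas as pd


def upsample(df, timecolumn="Date", step=pd.Timedelta(hours=12), method="time"):
    """Return an interpolated copy of df; df itself is left untouched."""
    # assign/set_index both return new frames, so the caller's Date column
    # stays an ordinary column in the original data frame.
    tmp = df.assign(**{timecolumn: pd.to_datetime(df[timecolumn])}).set_index(timecolumn)
    # Upsample to a regular grid and fill the gaps between the real points.
    dense = tmp.resample(step).interpolate(method=method)
    # Hand Date back as a column so the result has the same shape as the input.
    return dense.reset_index()


# Hypothetical three-row excerpt of the AA column from the question.
sample = pd.DataFrame({
    "Date": ["2019-01-15", "2019-01-17", "2019-01-22"],
    "AA": [0.0, 0.0, 1.0],
})
dense = upsample(sample)
```

Because the original dates all fall on the half-day grid, the real data points survive unchanged in `dense`, and `sample` still has `Date` as a regular column.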

1 Answer


I think you need to write a custom plotting function that iterates over all columns and plots the interpolated data on a given axes instance. Look at the following code:

import pandas as pd
import numpy as np
from itertools import cycle
from matplotlib import pyplot as plt

# Copy the table from the question to the clipboard, then load it:
data = pd.read_clipboard()
data.drop(['Index'], axis=1, inplace=True)

def add_smooth_plots(df, ax, timecolumn='Date', interpolation_method='cubic', colors='rgbky'):
    # Note: interpolation_method='cubic' needs SciPy to be installed.
    ind = pd.to_datetime(df.loc[:, timecolumn])
    tick_labels = ind.dt.strftime("%Y-%m-%d")
    color = cycle(colors)
    for col in df.columns:
        if col != timecolumn:
            c = next(color)
            s = pd.Series(df.loc[:, col].values, index=ind)
            # Upsample to a half-day grid and interpolate between the real points
            intp = s.resample('0.5D').interpolate(method=interpolation_method)
            vals = intp.values
            # Positions on the upsampled grid that correspond to the original dates
            ticks = np.flatnonzero(intp.index.isin(ind))
            ax.plot(np.arange(len(vals)), vals, label=col, color=c)
            ax.set_xticks(ticks)
            ax.set_xticklabels(tick_labels.values, rotation=45)
            ax.legend(title='Columns')
    return ax

fig = plt.figure()
ax = fig.add_subplot(111)

add_smooth_plots(data, ax)

plt.show()
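
The question also asks for markers at the actual data points, which the function above does not draw. A possible variant, given here only as a sketch, plots each column twice: once as the interpolated line and once as markers only. The name `plot_smooth_with_markers` and the five-row `sample` frame are made up for the example, and the default `method="cubic"` assumes SciPy is installed, as in the code above.

```python
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd


def plot_smooth_with_markers(df, ax, timecolumn="Date", method="cubic"):
    """Draw a smoothed line per column plus markers at the real data points."""
    dates = pd.to_datetime(df[timecolumn])
    for col in df.columns.drop(timecolumn):
        s = pd.Series(df[col].to_numpy(), index=dates)
        # Half-day grid between the real points; 'cubic' is delegated to SciPy.
        smooth = s.resample(pd.Timedelta(hours=12)).interpolate(method=method)
        (line,) = ax.plot(smooth.index, smooth.to_numpy(), label=col)
        # Second pass: markers only, at the original dates, in the same color.
        ax.plot(s.index, s.to_numpy(), linestyle="none", marker="o",
                color=line.get_color())
    # Put the ticks on the original dates rather than on the dense grid.
    ax.set_xticks(mdates.date2num(dates.to_numpy()))
    ax.set_xticklabels(dates.dt.strftime("%Y-%m-%d").tolist(), rotation=45)
    ax.legend(title="Columns")
    return ax


# First five rows of columns AA and BB from the question.
sample = pd.DataFrame({
    "Date": ["2019-01-15", "2019-01-17", "2019-01-22", "2019-01-24", "2019-01-29"],
    "AA": [0.0, 0.0, 1.0, 0.0, 1.0],
    "BB": [-1.0, -1.0, -1.0, 0.0, 0.0],
})
fig, ax = plt.subplots()
plot_smooth_with_markers(sample, ax)
# Call plt.show() to display the figure.
```

Plotting against the real dates keeps the uneven spacing between observations visible. If the cubic curve overshoots between points, passing `method="pchip"` (also SciPy-backed) may give a tamer line.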


bubble