1

I want to split the following data into two parts: observed from 2018-09 to 2019-11, and predicted from 2019-12 to the end of the date column, then plot them with solid and dashed lines respectively, using matplotlib, plotly, seaborn, etc.

       date       price    pct
0   2018-09      10.599  0.020
1   2018-10      10.808  0.020
2   2018-11      10.418 -0.036
3   2018-12      10.166 -0.024
4   2019-01       9.995 -0.017
5   2019-02      10.663  0.067
6   2019-03      10.559 -0.010
7   2019-04      10.055 -0.048
8   2019-05      10.691  0.063
9   2019-06      10.766  0.007
10  2019-07      10.667 -0.009
11  2019-08      10.504 -0.015
12  2019-09      10.284 -0.021
13  2019-10      10.047 -0.023
14  2019-11       9.717 -0.033
15  2019-12       9.908 -0.029
16  2020-01       9.570 -0.045
17  2020-02       9.754 -0.023
18  2020-03       9.779 -0.025
19  2020-04       9.777 -0.031
20  2020-05       9.932 -0.020

I have tried the code below. First, I get an error; second, I haven't plotted pct yet. Could someone help? Thank you.

df = df.set_index('date')
plt.plot('date', 'price', data=df.loc['2018-09':'2019-11'], marker='o', color='green', linewidth=2)
plt.plot('date', 'price', data=df.loc['2019-12':], marker='o', color='green', linewidth=2, linestyle = '--')

It generates ValueError: x and y must have same first dimension, but have shapes (1,) and (15,)
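
A likely explanation for readers hitting the same error: after `set_index('date')`, `date` is the index rather than a column, so `plt.plot('date', 'price', data=...)` cannot look `'date'` up in `data` and treats it as a single literal x value, while `'price'` resolves to 15 values. A minimal sketch of one way around it, assuming only a few rows of the data above and passing the index explicitly (the names `observed` and `predicted` are illustrative):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere; remove to get a window
import matplotlib.pyplot as plt
import pandas as pd

# a few rows from the data above, enough to show both segments
df = pd.DataFrame({
    "date": ["2019-09", "2019-10", "2019-11", "2019-12", "2020-01", "2020-02"],
    "price": [10.284, 10.047, 9.717, 9.908, 9.570, 9.754],
})
df["date"] = pd.to_datetime(df["date"])
df = df.set_index("date")

# slice on the DatetimeIndex, then pass the index itself as x
observed = df.loc[:"2019-11"]
predicted = df.loc["2019-12":]

fig, ax = plt.subplots()
ax.plot(observed.index, observed["price"], marker="o", color="green", linewidth=2)
ax.plot(predicted.index, predicted["price"], marker="o", color="green",
        linewidth=2, linestyle="--")
fig.autofmt_xdate()
```

Call `plt.show()` (with an interactive backend) or `fig.savefig(...)` to view the result.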

EDIT: this code successfully draws the plot for price, but I need to draw pct on the same plot.

df['date'] = pd.to_datetime(df['date'])

# https://stackoverflow.com/questions/46230864/split-dataframe-on-the-basis-of-date
split_date ='2019-12-01'
plt.figure(figsize=(10, 5))
plt.plot('date', 'rent_price', data = df.loc[df['date'] <= split_date], marker='o', color='red', linewidth=2)
plt.plot('date', 'rent_price', data = df.loc[df['date'] >= split_date], marker='o', color='green', linewidth=2, linestyle = '--')
Heikura
ah bon

2 Answers

3

I think what you're describing would be best illustrated using plotly like this:

Complete code:

# imports
from plotly.subplots import make_subplots
import plotly.graph_objects as go
import pandas as pd

# your data
df = pd.DataFrame({'date': {0: '2018-09',
                          1: '2018-10',
                          2: '2018-11',
                          3: '2018-12',
                          4: '2019-01',
                          5: '2019-02',
                          6: '2019-03',
                          7: '2019-04',
                          8: '2019-05',
                          9: '2019-06',
                          10: '2019-07',
                          11: '2019-08',
                          12: '2019-09',
                          13: '2019-10',
                          14: '2019-11',
                          15: '2019-12',
                          16: '2020-01',
                          17: '2020-02',
                          18: '2020-03',
                          19: '2020-04',
                          20: '2020-05'},
                         'price': {0: 10.599,
                          1: 10.808,
                          2: 10.418,
                          3: 10.166,
                          4: 9.995,
                          5: 10.663,
                          6: 10.559000000000001,
                          7: 10.055,
                          8: 10.690999999999999,
                          9: 10.765999999999998,
                          10: 10.667,
                          11: 10.504000000000001,
                          12: 10.284,
                          13: 10.047,
                          14: 9.717,
                          15: 9.908,
                          16: 9.57,
                          17: 9.754,
                          18: 9.779,
                          19: 9.777000000000001,
                          20: 9.932},
                         'pct': {0: 0.02,
                          1: 0.02,
                          2: -0.036000000000000004,
                          3: -0.024,
                          4: -0.017,
                          5: 0.067,
                          6: -0.01,
                          7: -0.048,
                          8: 0.063,
                          9: 0.006999999999999999,
                          10: -0.009000000000000001,
                          11: -0.015,
                          12: -0.021,
                          13: -0.023,
                          14: -0.033,
                          15: -0.028999999999999998,
                          16: -0.045,
                          17: -0.023,
                          18: -0.025,
                          19: -0.031,
                          20: -0.02}})

# make timestamp to make plotting easier
df['timestamp']=pd.to_datetime(df['date'])
df=df.set_index('timestamp')

# split data: observed through 2019-11, predicted from 2019-12 onward
df_actual = df.loc[:'2019-11']
df_predict = df.loc['2019-12':]


# plotly setup
fig = make_subplots(rows=2,
                    cols=1,
                    subplot_titles=('Price',  'Pct')) 

# Price, actual
fig.add_trace(go.Scatter(x=df_actual.index, y=df_actual['price'],
                         name = "price, actual",
                         mode='lines',
                         line=dict(color='steelblue', width=2)
                        )
              ,row=1, col=1)

# Price, prediction
fig.add_trace(go.Scatter(x=df_predict.index, y=df_predict['price'],
                         name = "price, prediction",
                         mode='lines',
                         line=dict(color='firebrick', width=2, dash='dash')
                        ),
                         row=1, col=1)

# pct actual
fig.add_trace(go.Scatter(x=df_actual.index, y=df_actual['pct'],
                         mode='lines',
                         name = "pct, actual",
                         line=dict(color='steelblue', width=2)
                        )
              ,row=2, col=1)

# pct prediction
fig.add_trace(go.Scatter(x=df_predict.index, y=df_predict['pct'],
                         name="pct, prediction",
                         mode='lines',
                         line=dict(color='firebrick', width=2, dash='dash')
                        ),
                         row=2, col=1)

fig.show()
vestland
  • Sorry, it generates `ModuleNotFoundError: No module named 'plotly.subplots'`. – ah bon Dec 05 '19 at 16:24
  • By the way, can we display values on the lines with `plotly`? – ah bon Dec 05 '19 at 16:26
  • @ahbon Only checking in on my phone right now. I'll get back to you tomorrow. – vestland Dec 05 '19 at 16:28
  • Sure, thank you. I'll update my question by converting `pct` to a percentage format; I also need to display the values of `pct` and `price` on the plots. I've tried with `matplotlib`, but it doesn't work very well for displaying the values of `pct`. – ah bon Dec 05 '19 at 16:32
  • @ahbon Plotly can display almost anything. But you'll have to install it first =) – vestland Dec 05 '19 at 16:38
  • @ahbon Instead of an edit of the question, I'd suggest writing another one focusing specifically on the formatting of percentages. – vestland Dec 05 '19 at 16:40
1

You could try using subplots to plot the two parts separately if the dimensions are different. There are documentation and tutorials for subplot on the matplotlib website.

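import matplotlib.pyplot as plt

# after set_index, 'date' is the index, so pass it as x explicitly
df = df.set_index('date')
observed = df.loc['2018-09':'2019-11']
predicted = df.loc['2019-12':]

plt.subplot(211)
# 'price' matches the column in the posted data (the question's edit uses 'rent_price')
plt.plot(observed.index, observed['price'], marker='o', color='green', linewidth=2)
plt.xlabel('Observed')
plt.subplot(212)
plt.plot(predicted.index, predicted['price'], marker='o', color='green', linewidth=2, linestyle='--')
plt.xlabel('Predicted')
plt.show()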
Matts
  • I think we need to use `fig, ax1 = plt.subplots()` followed by `ax2 = ax1.twinx()`. Reference: https://stackoverflow.com/questions/52126702/multiple-y-axis-with-matplotlib-with-twinx – ah bon Dec 05 '19 at 05:37
  • The answer linked there calls the plot method of the pandas DataFrame, `df3.plot()`, so if you're doing it that way you would need to use `df.plot()` and pass `ax=ax1` or `ax=ax2`. I wasn't sure how you wanted it displayed, as the dates are different but also used on the x axis. – Matts Dec 05 '19 at 09:49
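
The `twinx()` idea from the comments can be sketched as follows. This is only an illustration under assumptions: it uses six rows of the question's data, puts `price` on the left axis and `pct` on a second y-axis, and draws the observed part solid and the predicted part dashed. The colours and the names `observed` and `predicted` are arbitrary.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere; remove to get a window
import matplotlib.pyplot as plt
import pandas as pd

# six rows from the question's data
df = pd.DataFrame({
    "date": pd.to_datetime(["2019-09", "2019-10", "2019-11",
                            "2019-12", "2020-01", "2020-02"]),
    "price": [10.284, 10.047, 9.717, 9.908, 9.570, 9.754],
    "pct": [-0.021, -0.023, -0.033, -0.029, -0.045, -0.023],
}).set_index("date")

observed = df.loc[:"2019-11"]
predicted = df.loc["2019-12":]

fig, ax1 = plt.subplots(figsize=(10, 5))
ax2 = ax1.twinx()  # second y-axis sharing the same x-axis

# solid for observed, dashed for predicted, on both axes
for part, style in ((observed, "-"), (predicted, "--")):
    ax1.plot(part.index, part["price"], color="green", marker="o", linestyle=style)
    ax2.plot(part.index, part["pct"], color="red", marker="o", linestyle=style)

ax1.set_ylabel("price")
ax2.set_ylabel("pct")
fig.autofmt_xdate()
```

Because the two slices do not share a point, there is a gap between the solid and dashed segments; starting `predicted` at `"2019-11"` would join them, at the cost of drawing that month in both styles.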