0

I have the following code:

import pandas as pd
from pandas import datetime
from pandas import DataFrame as df
import matplotlib
from pandas_datareader import data as web
import matplotlib.pyplot as plt
import datetime

start = datetime.date(2015,1,1)
end = datetime.date.today()
start1 = datetime.date(2019,1,1)

data = web.DataReader("^GSPC", 'yahoo',start, end)
data1 = web.DataReader("^GSPC", 'yahoo', start1, end)


data.set_index('month',append=True,inplace=True)
data1.set_index('month',append=True,inplace=True)
data1['pct_day']= data1['Adj Close'].pct_change()

df = data.groupby(['month', 'day']).mean()
df2['cumsum_pct_day']=df2['pct_day'].cumsum(axis = 0)

ax = df.plot(y='cumsum_pct_day', grid = True)
df2.plot(y='cumsum_pct_day', grid= True)

This code generates the following chart.

ax = df.plot(y='cumsum_pct_day', grid = True, legend =False)
df2.plot(y='cumsum_pct_day', grid= True, legend = False, ax=ax)

I then added ax=ax to the last line to overlay the second plot on the first. The resulting plot looks like this: [image]

Notice that part of the numbering on the x-axis is missing, and the orange line (the second plot, which I am trying to overlay) is not correct: we are in month 11 now, so it should extend almost to the end of the plot. Both plots look correct when I plot them separately. What can I do to fix this?


df data:

month  day  High         Low          Open         Close        Volume        Adj Close    year         pct_day    cumsum_pct_day
1      2    2429.246663  2398.623372  2406.529948  2421.346680  3.269703e+09  2421.346680  2017.333333   0.003077   0.003077
       3    2490.463298  2462.286621  2480.446696  2472.926676  3.710683e+09  2472.926676  2018.000000  -0.003290  -0.000213
       4    2394.595032  2361.170074  2373.360046  2384.834991  3.994610e+09  2384.834991  2017.500000   0.007196   0.006983
       5    2272.832458  2252.469971  2266.932495  2262.359955  3.626045e+09  2262.359955  2016.500000  -0.002501   0.004482
       6    2108.020020  2078.516683  2101.666626  2089.949992  4.045553e+09  2089.949992  2016.000000  -0.006164  -0.001682
...    ...  ...          ...          ...          ...          ...           ...          ...          ...        ...
12     27   2482.853353  2447.666585  2463.610026  2480.110026  2.761923e+09  2480.110026  2017.000000   0.003867   0.139297
       28   2384.252502  2362.222473  2378.217529  2369.924988  2.685205e+09  2369.924988  2016.500000  -0.002486   0.136811
       29   2342.730062  2326.236735  2333.063314  2333.743408  2.440620e+09  2333.743408  2016.000000   0.001718   0.138529
       30   2165.460083  2147.795044  2164.475098  2151.095093  2.519165e+09  2151.095093  2015.500000  -0.005927   0.132602
       31   2219.120036  2194.793335  2213.880046  2203.229980  2.901423e+09  2203.229980  2015.666667  -0.000460   0.132142
363 rows × 9 columns

df2 data:

month  day  High         Low          Open         Close        Volume      Adj Close    year   pct_day    cumsum_pct_day
1      2    2519.489990  2467.469971  2476.959961  2510.030029  3733160000  2510.030029  2019    0.001268   0.001268
       3    2493.139893  2443.959961  2491.919922  2447.889893  3822860000  2447.889893  2019   -0.024757  -0.023488
       4    2538.070068  2474.330078  2474.330078  2531.939941  4213410000  2531.939941  2019    0.034336   0.010847
       7    2566.159912  2524.560059  2535.610107  2549.689941  4104710000  2549.689941  2019    0.007010   0.017858
       8    2579.820068  2547.560059  2568.110107  2574.409912  4083030000  2574.409912  2019    0.009695   0.027553
...    ...  ...          ...          ...          ...          ...         ...          ...    ...        ...
11     13   3098.060059  3078.800049  3084.179932  3094.040039  3509280000  3094.040039  2019    0.000712   0.218003
       14   3098.199951  3083.260010  3090.750000  3096.629883  3276070000  3096.629883  2019    0.000837   0.218840
       15   3120.459961  3104.600098  3107.919922  3120.459961  3335650000  3120.459961  2019    0.007695   0.226536
       18   3121.479980  3112.060059  3117.909912  3120.199951  804057034   3120.199951  2019   -0.000083   0.226452
12     31   2509.239990  2482.820068  2498.939941  2506.850098  3442870000  2506.850098  2018    NaN        NaN
224 rows × 9 columns
Slartibartfast
  • There is no way to test your code on our side, as it is incomplete. However, you may want to look at this example [here](https://stackoverflow.com/questions/42973223/how-share-x-axis-of-two-subplots-after-they-are-created) and share your x-axis, i.e. change `ax=ax` in your last plot call to `sharex=ax` – pyPN Nov 15 '19 at 20:27
  • It is the full code! – Slartibartfast Nov 15 '19 at 20:31
  • Well, actually, one of your graphs has values only up to (10, 17) and no further data. So on the overlaid graph it is unreasonable to expect both of them to run to the end, because there isn't enough data. – learner Nov 18 '19 at 18:00
  • @learner The data from df2 goes all the way to the 11th month, though. https://i.stack.imgur.com/GgRz5.png – Slartibartfast Nov 18 '19 at 18:14

2 Answers

1

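The problem is that the two datasets are missing different dates (non-trading days such as weekends and holidays), so their (month, day) rows do not line up when both are drawn on the same axes. You can solve it by filling in the missing dates near the start of the code, before grouping. This is done with: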

# create index with all dates
full_dates = pd.date_range(start, end)
# fill missing dates with NaN values
data = data.reindex(full_dates)

Check the documentation of DataFrame.reindex() if you want to fill the gaps with other values. Now that both datasets have exactly the same dates, you can plot them together.
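
As a small illustration of the reindexing step, here is a sketch on two made-up price series that are missing different dates. The frames `a` and `b` and their values are hypothetical and stand in for the Yahoo data; only `pd.date_range` and `reindex` are taken from the snippet above. The `method="ffill"` line shows one way to fill with something other than NaN.

```python
import pandas as pd

# Two toy frames with different missing dates (hypothetical values).
a = pd.DataFrame({"Adj Close": [10.0, 11.0, 12.0]},
                 index=pd.to_datetime(["2019-01-01", "2019-01-02", "2019-01-04"]))
b = pd.DataFrame({"Adj Close": [20.0, 21.0]},
                 index=pd.to_datetime(["2019-01-02", "2019-01-05"]))

# One shared calendar index covering every date in the range.
full_dates = pd.date_range("2019-01-01", "2019-01-05")

# Missing dates become NaN rows, so both frames end up with the same index.
a_full = a.reindex(full_dates)
b_full = b.reindex(full_dates)

# Alternative: carry the last known value forward instead of leaving NaN.
a_ffill = a.reindex(full_dates, method="ffill")

print(a_full.index.equals(b_full.index))
print(a_ffill["Adj Close"].tolist())
```

After this step the two frames have one row per calendar day, so any later grouping by (month, day) produces rows in the same positions for both.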

As a bonus, I added a few lines at the end of your code to have more meaningful ticks on the horizontal axis.

Full code:

import datetime
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from pandas_datareader import data as web

start = datetime.date(2015,1,1)
end = datetime.date.today()
start1 = datetime.date(2019,1,1)

data = web.DataReader("^GSPC", 'yahoo',start, end)
data1 = web.DataReader("^GSPC", 'yahoo', start1, end)

data.index = pd.to_datetime(data.index, format ='%Y-%m-%d')
data1.index = pd.to_datetime(data1.index, format ='%Y-%m-%d')

# 3 new lines here to fill dates
full_dates = pd.date_range(start, end)
data = data.reindex(full_dates)
data1 = data1.reindex(full_dates)

# DatetimeIndex.week was removed in pandas 2.0; isocalendar() needs pandas >= 1.1
data['year'] = data.index.year
data['month'] = data.index.month
data['week'] = data.index.isocalendar().week
data['day'] = data.index.day
data1['year'] = data1.index.year
data1['month'] = data1.index.month
data1['week'] = data1.index.isocalendar().week
data1['day'] = data1.index.day


data.set_index('month',append=True,inplace=True)
data1.set_index('month',append=True,inplace=True)
data.set_index('week',append=True,inplace=True)
data1.set_index('week',append=True,inplace=True)
data.set_index('day',append=True,inplace=True)
data1.set_index('day',append=True,inplace=True)

data['pct_day']= data['Adj Close'].pct_change()
data1['pct_day']= data1['Adj Close'].pct_change()

df = data.groupby(['month', 'day']).mean()
df2 = data1.groupby(['month', 'day']).mean()

df['cumsum_pct_day']=df['pct_day'].cumsum(axis = 0)
df2['cumsum_pct_day']=df2['pct_day'].cumsum(axis = 0)

ax = df.plot(y='cumsum_pct_day', grid = True, label='df')
df2.plot(y='cumsum_pct_day', grid= True, ax=ax, label='df2')

# bonus: more meaningful ticks
# note: this assumes a 365-day year; if the grouped index also has a Feb 29 row,
# ticks after February may sit one row early

firsts = pd.date_range(start='1/1/2019', periods=12, freq='M') # month-end dates (freq='M'), used only for their month lengths
n_days =  list(firsts.days_in_month) # number of days in each month
n_days = [0] + n_days[:-1]
ticks = np.cumsum(n_days) # row position of the 1st day of each month
ticks_dates = full_dates[ticks]
ticklabels = [date.strftime('%m-%d') for date in ticks_dates]

ax.set_xticks(ticks)
ax.set_xticklabels(ticklabels, rotation=45)

plt.show()
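
The tick positions in the bonus section are cumulative day counts: the first day of each month sits at the row whose position equals the total number of days in all preceding months. Below is a minimal sketch of that arithmetic using only the standard library. The helper name `month_start_positions` is hypothetical, and choosing a leap year (2016) is an assumption that the grouped (month, day) index contains a Feb 29 row; pass a non-leap year if it does not.

```python
import calendar
from itertools import accumulate

def month_start_positions(year):
    """Return the 0-based day-of-year position of the 1st of each month."""
    days = [calendar.monthrange(year, month)[1] for month in range(1, 13)]
    # The position of a month is the total number of days in the months before it.
    return [0] + list(accumulate(days[:-1]))

ticks = month_start_positions(2016)  # leap year: February has 29 days
labels = [f"{month:02d}-01" for month in range(1, 13)]

print(ticks)
print(labels)
```

The resulting lists could be passed to ax.set_xticks(ticks) and ax.set_xticklabels(labels, rotation=45) in the same way as in the code above.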


presenter
0
import pandas as pd
import matplotlib.pyplot as plt

Use subplots to fix this issue: create a single axes object (ax) with plt.subplots() and pass it to both plot calls.

fig, ax = plt.subplots()

df.plot(y='cumsum_pct_day', grid = True, legend =False, ax=ax)
df2.plot(y='cumsum_pct_day', grid= True, legend = False, ax=ax)
supreeth2812
  • Thanks, but the chart did not change. I understand what you have done, but the chart was the same as when I did `ax = df.plot(y='cumsum_pct_day', grid = True, legend =False)` – Slartibartfast Nov 19 '19 at 10:55