I have plotted a graph with two y-axes and would now like to add a separate trendline for each of the two plotted series.

This is my code:

import matplotlib.pyplot as plt 
import pandas as pd
import numpy as np
%matplotlib inline

amp_costs=pd.read_csv('/Users/Ampicillin_Costs.csv', index_col=None, usecols=[0,1,2])
amp_costs.columns=['PERIOD', 'ITEMS', 'COST PER ITEM']

ax = amp_costs.plot(x='PERIOD', y='COST PER ITEM', color='Blue', style='.', markersize=10)
amp_costs.plot(x='PERIOD', y='ITEMS', secondary_y=True,
               color='Red', style='.', markersize=10, ax=ax)

Any guidance on how to add these two trendlines to the graph would be much appreciated!

Louise Stevens
  • It looks like you did just that in your code. What is the problem? What kind of output did you get? Please make a minimal reproducible example for us. It makes it so much easier to answer questions that way. http://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples – Ted Petrou Dec 20 '16 at 16:03
  • What kind of trendline? Linear? Rolling Average? What isn't working in your code? what have you tried? – Demetri Pananos Dec 20 '16 at 16:06
  • @DemetriP Linear...This code is working fine however I am unsure where to start in order to add 2 trendlines - 1 for each set of plotted data. – Louise Stevens Dec 20 '16 at 16:08
  • You'll need to add another column in your dataframe to represent the trend line. Look at statsmodels.ols or sklearn.linear_model.LinearRegression to do this. – Demetri Pananos Dec 20 '16 at 16:10

1 Answer

Here is a quick example of how to use sklearn.linear_model.LinearRegression to fit a linear trend line for each series and plot each one against its own y-axis.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
plt.style.use('ggplot')
%matplotlib inline

# Synthetic data: ten periods with a noisy linear trend in each series
period = np.arange(10)
items = -2*period + 1 + np.random.randint(-2, 2, len(period))
cost = 35000*period + 15000 + np.random.randint(-25000, 25000, len(period))
data = np.vstack((period, items, cost)).T
df = pd.DataFrame(data, columns=['P', 'ITEMS', 'COST']).set_index('P')


# scikit-learn expects a 2-D feature matrix, so reshape period into one column
X = period.reshape(-1, 1)
lmcost = LinearRegression().fit(X, cost)
lmitems = LinearRegression().fit(X, items)

# Store the fitted values as new columns so they can be plotted like the data
df['ITEMS_LM'] = lmitems.predict(X)
df['COST_LM'] = lmcost.predict(X)

fig,ax = plt.subplots()


df.ITEMS.plot(ax = ax, color = 'b')
df.ITEMS_LM.plot(ax = ax,color= 'b', linestyle= 'dashed')
df.COST.plot(ax = ax, secondary_y=True, color ='g')
df.COST_LM.plot(ax = ax, secondary_y=True, color = 'g', linestyle='dashed')
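
For a plain straight-line fit, np.polyfit with deg=1 gives the same least-squares line without scikit-learn. Below is a minimal sketch, not a definitive implementation. It assumes the column names from the question (PERIOD, ITEMS, COST PER ITEM) and a numeric PERIOD; if PERIOD holds dates or strings, it would need converting to numbers first. The data is synthetic, the helper name add_linear_trend is hypothetical, and the second axis is created with matplotlib's twinx() rather than secondary_y=True.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def add_linear_trend(ax, x, y, **line_kwargs):
    """Fit y = slope * x + intercept by least squares and draw the line on ax."""
    slope, intercept = np.polyfit(x, y, deg=1)
    ax.plot(x, slope * x + intercept, **line_kwargs)
    return slope, intercept


# Synthetic stand-in for the CSV in the question (assumption: numeric PERIOD).
rng = np.random.default_rng(0)
period = np.arange(10)
amp_costs = pd.DataFrame({
    "PERIOD": period,
    "ITEMS": -2 * period + 1 + rng.integers(-2, 2, len(period)),
    "COST PER ITEM": 35000 * period + 15000
                     + rng.integers(-25000, 25000, len(period)),
})

x = amp_costs["PERIOD"].to_numpy(dtype=float)
cost = amp_costs["COST PER ITEM"].to_numpy(dtype=float)
items = amp_costs["ITEMS"].to_numpy(dtype=float)

fig, ax_cost = plt.subplots()
ax_items = ax_cost.twinx()  # second y-axis sharing the same x-axis

# Raw data as dots, one series per y-axis.
ax_cost.plot(x, cost, ".", color="blue", markersize=10)
ax_items.plot(x, items, ".", color="red", markersize=10)

# One dashed trendline per axis, fitted independently.
cost_fit = add_linear_trend(ax_cost, x, cost, color="blue", linestyle="dashed")
items_fit = add_linear_trend(ax_items, x, items, color="red", linestyle="dashed")

ax_cost.set_xlabel("PERIOD")
ax_cost.set_ylabel("COST PER ITEM")
ax_items.set_ylabel("ITEMS")
```

Each trendline is drawn on the same axes object as its data, so it picks up that series' y-scale automatically. In a notebook the figure displays inline; in a script, call plt.show() or fig.savefig(...).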

Demetri Pananos
  • 'amp_costs=pd.read_csv('/Users/Ampicillin_Costs.csv', index_col=None, usecols=[1,2]) amp_costs.columns=['ITEMS', 'COST PER ITEM'] z = np.polyfit(x=amp_costs.loc[:, amp_costs['COST PER ITEM']], y=amp_costs.loc[:, amp_costs['ITEMS']], deg=1) p = np.poly1d(z) amp_costs['trendline'] = p(amp_costs.loc[:, amp_costs['COST PER ITEM']]) x=amp_costs['COST PER ITEM'] y=amp_costs['ITEMS'] ax = amp_costs.plot.scatter(x, y) amp_costs.set_index(x, inplace=True) amp_costs.trendline.sort_index(ascending=False).plot(ax=ax) plt.gca().invert_xaxis() plt.show()' – Louise Stevens Dec 20 '16 at 17:04