So I have some data like:
    import pandas
    import scipy.stats

    df = pandas.DataFrame({"X1": [1, 2, 5] * 4, "X2": [1, 2, 10] * 4,
                           "Y": [2, 4, 6] * 4, "group": ["A", "B"] * 6})
And I want to create a table of linear regression slope coefficients, for each group, and for each relevant combination of variables, something along the lines of:
    group  x   y  coef
    A      X1  Y  0.97
    A      X2  Y  0.85
    B      X1  Y  0.73
    B      X2  Y  0.81
I'm trying to do it something like this:
    def OLS_slope_coef(df, xcol=0, ycol=1):
        x = df.ix[:, xcol]
        y = df.ix[:, ycol]
        slope, intercept, r, p, stderr = scipy.stats.linregress(x, y)
        return slope

    s_df = pandas.DataFrame()
    for x in ['X1', 'X2']:
        for y in ['Y']:
            s_df.ix[(x, y), 'coef'] = df.groupby('group').apply(OLS_slope_coef, x, y)
But it gives a ValueError: Incompatible indexer with Series.
Is there some way to do something like this? I don't care if the group, x, and y variables are indexes or dataframe columns (I'm going to .reset_index() anyway).
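
For reference, here is a minimal sketch of the kind of result I'm after. It is only one possible shape, not necessarily the right way to do it. It assumes a pandas version where .ix is no longer available, so it selects columns by name, and it collects plain rows in a list instead of assigning into an empty DataFrame. The names ols_slope and rows are hypothetical helpers made up for this illustration.

```python
import pandas as pd
from scipy import stats

df = pd.DataFrame({"X1": [1, 2, 5] * 4, "X2": [1, 2, 10] * 4,
                   "Y": [2, 4, 6] * 4, "group": ["A", "B"] * 6})


def ols_slope(sub, xcol, ycol):
    # Hypothetical helper: least-squares slope of ycol regressed on xcol
    # for the rows of a single group.
    return stats.linregress(sub[xcol], sub[ycol]).slope


# Build one row per (group, x, y) combination, then turn the list into a table.
rows = []
for xcol in ["X1", "X2"]:
    for ycol in ["Y"]:
        for group, sub in df.groupby("group"):
            rows.append({"group": group, "x": xcol, "y": ycol,
                         "coef": ols_slope(sub, xcol, ycol)})

s_df = (pd.DataFrame(rows)
        .sort_values(["group", "x", "y"])
        .reset_index(drop=True))
print(s_df)
```

With this layout group, x, and y end up as ordinary columns, which is what I would get after .reset_index() anyway.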