From what I have found searching the web, the following is how I would fit a polynomial regression of degree 2 to my data. The output below is copied from an example on the web; I don't have access at the moment to the actual commands I ran on my own data, but they mimicked this:

Call:
lm(sample1$Population ~ poly(sample1$Year, 2, raw=TRUE))

Residuals:
    Min      1Q  Median      3Q     Max 
-46.888 -18.834  -3.159   2.040  86.748 

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)    
(Intercept)       5263.159     17.655 298.110  < 2e-16 ***
sample1$Year        29.318      3.696   7.933 4.64e-05 ***
I(sample1$Year^2)  -10.589      1.323  -8.002 4.36e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Residual standard error: 38.76 on 8 degrees of freedom
Multiple R-squared: 0.9407,     Adjusted R-squared: 0.9259 
F-statistic: 63.48 on 2 and 8 DF,  p-value: 1.235e-05 

My dataset is a collection of groups, each with 70+ rows of monthly measurements of several variables. I need to run the regression on each group separately and find the groups with statistically significant values for the second derivative. I'd like to end up with a data set that contains one row per group_id and one column for each of the data points that make up the summary displayed above.
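One possible way to get that one-row-per-group table in base R is sketched below. It is only an illustration under assumptions: the data frame `dat`, its simulated values, the helper `fit_one`, and all output column names are hypothetical, and `data1` is assumed to be the response while `obs_month` is the predictor. The pieces taken from `summary()` (`coefficients`, `sigma`, `df`, `r.squared`, `adj.r.squared`, `fstatistic`) are standard components of a `summary.lm` object.

```r
# Hypothetical data in the shape described above:
# one row per (group_id, obs_month) with a measured variable data1.
set.seed(1)
dat <- data.frame(
  group_id  = rep(1:3, each = 72),
  obs_month = rep(1:72, times = 3)
)
dat$data1 <- 50 + 2 * dat$obs_month - 0.03 * dat$obs_month^2 +
  rnorm(nrow(dat), sd = 5)

# Fit a raw degree-2 polynomial to one group and flatten summary() into one row.
fit_one <- function(d) {
  fit <- lm(data1 ~ poly(obs_month, 2, raw = TRUE), data = d)
  s   <- summary(fit)
  co  <- s$coefficients  # 3 x 4 matrix: Estimate, Std. Error, t value, Pr(>|t|)
  rownames(co) <- c("intercept", "linear", "quadratic")
  colnames(co) <- c("est", "se", "t", "p")
  # as.vector() reads the matrix column by column; outer() builds names in the same order
  coef_row <- setNames(
    as.vector(co),
    as.vector(outer(rownames(co), colnames(co), paste, sep = "_"))
  )
  f <- s$fstatistic      # named vector: value, numdf, dendf
  row <- c(
    group_id = d$group_id[1],
    coef_row,
    sigma    = s$sigma,          # residual standard error
    resid_df = s$df[2],          # residual degrees of freedom
    r2       = s$r.squared,
    adj_r2   = s$adj.r.squared,
    f_value  = f[["value"]],
    f_numdf  = f[["numdf"]],
    f_dendf  = f[["dendf"]],
    # overall model p-value, computed from the F statistic
    model_p  = pf(f[["value"]], f[["numdf"]], f[["dendf"]], lower.tail = FALSE)
  )
  as.data.frame(as.list(row))
}

# Split by group, fit each piece, and stack the one-row results.
results <- do.call(rbind, lapply(split(dat, dat$group_id), fit_one))
```

Groups whose quadratic term is significant at a chosen level could then be picked out with something like `subset(results, quadratic_p < 0.05)`; the 0.05 cutoff is an arbitrary choice here, not something the question specifies.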

Scott Wood
  • Look at `plyr` or `data.table` or a combination of `split` and `lapply`. A [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) will make an answer more forthcoming! – mnel Aug 16 '12 at 00:02
  • The part about "statistically significant ... second derivatives" looks rather suspicious. Why would we think that the significance of second derivatives was being assessed? – IRTFM Aug 16 '12 at 01:36
  • That's a good point, I could easily be making a logical mistake. I want to identify groups for which the fitted function is a reasonably good fit. The idea is that I want to identify a "turning point" in the group, and ignore groups that can't reasonably be described as having a turning point, which I am interpreting for now as not being able to get statistical significance when fitting a second degree polynomial. – Scott Wood Aug 16 '12 at 02:04
  • And what specifically is a "data point ... in the summary displayed above"? – IRTFM Aug 16 '12 at 06:14
  • I'm not sure I understand you correctly, but it might be better to fit linear models with and without the quadratic term and, e.g., compare their AIC values. – Roland Aug 16 '12 at 06:18
  • The "data point(s)...in the summary displayed above" would be the Intercept estimate, std error, t value, and Pr(>|t|), the sample1$Year estimate, std error...., the residual standard error and its df, the Multiple R-squared and Adjusted R-squared, the F-statistic and its dfs, and the p-value. – Scott Wood Aug 17 '12 at 13:53
  • The data looks like this (columns: group_id, obs_month, data1, data2): 1 1 10 13; 1 2 14 7; ...; 2 1 345 76; 2 2 309 234; ... – Scott Wood Aug 21 '12 at 13:28
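
The comparison suggested in the comments (a linear fit with and without the quadratic term) and the "turning point" idea can be sketched for a single group as follows. This is a rough illustration, not a recommended test procedure: the data frame `g` and its simulated values are hypothetical, and treating a vertex inside the observed month range as a "turning point" is an assumption about what the question means.

```r
# Hypothetical single group: 72 monthly observations that rise and then fall.
set.seed(42)
g <- data.frame(obs_month = 1:72)
g$data1 <- 100 + 4 * g$obs_month - 0.05 * g$obs_month^2 + rnorm(72, sd = 5)

fit_lin  <- lm(data1 ~ obs_month, data = g)
fit_quad <- lm(data1 ~ obs_month + I(obs_month^2), data = g)

# A lower AIC favours that model; anova() gives an F-test for the added quadratic term.
aic   <- AIC(fit_lin, fit_quad)
ftest <- anova(fit_lin, fit_quad)

# The vertex of a + b*x + c*x^2 lies at x = -b / (2 * c).
b  <- coef(fit_quad)[["obs_month"]]
c2 <- coef(fit_quad)[["I(obs_month^2)"]]
turning_month <- -b / (2 * c2)

# Only count it as a turning point if it falls inside the observed range.
has_turn <- turning_month > min(g$obs_month) && turning_month < max(g$obs_month)
```

Running the same steps inside a per-group function (for example via `split` and `lapply`) would give one AIC difference and one candidate turning point per `group_id`.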
