R-sq values, linear regression of several trends within one dataset

Question

I am running into a sticky spot trying to solve for variance accounted for by trend several times within a single data set.....

My data is structured like this

x <- read.table(text = "
STA YEAR    VALUE
a   1968    457
a   1970    565
a   1972    489
a   1974    500
a   1976    700
a   1978    650
a   1980    659
b   1968    457
b   1970    565
b   1972    350
b   1974    544
b   1976    678
b   1978    650
b   1980    690
c   1968    457
c   1970    565
c   1972    500
c   1974    600
c   1976    678
c   1978    670
c   1980    750 " , header = T)

and I am trying to return something like this

STA  R-sq
a    n1
b    n2
c    n3

where n# is the corresponding r-squared value of the locations data in the original set....

I have tried

fit <- lm(VALUE ~ YEAR + STA, data = x)

to give the model of yearly trend of VALUE for each individual station over the years data is available for VALUE, within the master data set....

Any help would be greatly appreciated.... I am really stumped on this one and I know it is just a familiarity with R problem.

Here's what you want to do, I think: http://stackoverflow.com/a/1214432/1036500 — Ben, Feb 05 '13 at 22:27

score 2 · Answer 1 · edited May 23 '17 at 11:59

2

To get r-squared for VALUE ~ YEAR for each group of STA, you can take this previous answer, modify it slightly and plug-in your values:

# assuming x is your data frame (make sure you don't have Hmisc loaded, it will interfere)
models_x <- dlply(x, "STA", function(df) 
     summary(lm(VALUE ~ YEAR, data = df)))

# extract the r.squared values
rsqds <- ldply(1:length(models_x), function(x) models_x[[x]]$r.squared)
# give names to rows and col
rownames(rsqds) <- unique(x$STA)
colnames(rsqds) <- "rsq"
# have a look
rsqds
        rsq
a 0.6286064
b 0.5450413
c 0.8806604

EDIT: following mnel's suggestion here are more efficient ways to get the r-squared values into a nice table (no need to add row and col names):

# starting with models_x from above
rsqds <- data.frame(rsq =sapply(models_x, '[[', 'r.squared'))

# starting with just the original data in x, this is great:
rsqds  <- ddply(x, "STA", summarize, rsq = summary(lm(VALUE ~ YEAR))$r.squared)

  STA       rsq
1   a 0.6286064
2   b 0.5450413
3   c 0.8806604

edited May 23 '17 at 11:59

Community

1
1

answered Feb 05 '13 at 22:56

Ben

41,615
18
132
227

1

Or more simply `rsqds <- data.frame(rsq =sapply(models_x, '[[', 'r.squared'))`, or a one-liner `models_x <- ddply(x, "STA", summarize, rsq = summary(lm(VALUE ~ YEAR))$r.squared)` – mnel Feb 05 '13 at 23:07
ah yes, that's much better, thanks for the tips! I'll update my answer to include those – Ben Feb 05 '13 at 23:12
Take care not to have the `Hmisc` library loaded while doing this, it will interfere. I've updated my answer to indicate this (can you Please re-check for the benefit of future readers) – Ben Feb 06 '13 at 02:35
related previous Q&A here: http://stackoverflow.com/questions/7523427/ddply-with-lm-function – Ben Feb 15 '13 at 03:53

score 1 · Answer 2 · answered Nov 27 '15 at 14:30

    #first load the data.table package 
        library(data.table)
    #transform your dataframe to a datatable (I'm using your example)
        x<- as.data.table(x)
    #calculate all the metrics needed (r^2, F-distribution and so on) 
        x[,list(r2=summary(lm(VALUE~YEAR))$r.squared ,
        f=summary(lm(VALUE~YEAR))$fstatistic[1] ),by=STA]
           STA        r2         f
        1:   a 0.6286064  8.462807
        2:   b 0.5450413  5.990009
        3:   c 0.8806604 36.897258

A similar question with more answers here: http://stackoverflow.com/questions/1169539/linear-regression-and-group-by-in-r/33754058#33754058 — FraNut, Nov 30 '15 at 09:05

score 0 · Answer 3 · answered Feb 05 '13 at 21:14

there's only one r-squared value, not three.. please edit your question

# store the output 
y <- summary( lm( VALUE ~ YEAR + STA , data = x ) )
# access the attributes of `y`
attributes( y )
y$r.squared
y$adj.r.squared
y$coefficients
y$coefficients[,1]

# or are you looking to run three separate
# lm() functions on 'a' 'b' and 'c' ..where this would be the first? 
y <- summary( lm( VALUE ~ YEAR , data = x[ x$STA %in% 'a' , ] ) )
# access the attributes of `y`
attributes( y )
y$r.squared
y$adj.r.squared
y$coefficients
y$coefficients[,1]

R-sq values, linear regression of several trends within one dataset

3 Answers3