lm(): saving residuals with group_by in R (confused SPSS user)

Question

This is a complete re-edit of my original question.

Let's assume I'm working on RT data gathered in a repeated-measures experiment. As part of my usual routine, I always transform RT to natural logarithms and then compute a z-score for each RT within each participant, adjusting for trial number. This is typically done with a simple regression in SPSS syntax:

split file by subject.

REGRESSION
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS R ANOVA
  /CRITERIA=PIN(.05) POUT(.10)
  /NOORIGIN 
  /DEPENDENT rtLN
  /METHOD=ENTER trial
  /SAVE ZRESID.

split file off.

To reproduce the same procedure in R, first generate the data:

# load libraries
library(dplyr); library(magrittr)

# generate data
ob <- c(1,1,1,1,1,1,2,2,2,2,2,2,3,3,3,3,3,3)
ob <- factor(ob)
trial <- c(1,2,3,4,5,6,1,2,3,4,5,6,1,2,3,4,5,6)
rt <- c(300,305,290,315,320,320,350,355,330,365,370,370,560,565,570,575,560,570)
cond <- c("first","first","first","snd","snd","snd","first","first","first","snd","snd","snd","first","first","first","snd","snd","snd")

# The following variable is what I would get after using the SPSS code
ZreSPSS <- c(0.4207,0.44871,-1.7779,0.47787,0.47958,-0.04897,0.45954,0.45487,-1.7962,0.43034,0.41075,0.0407,-0.6037,0.0113,0.61928,1.22038,-1.32533,0.07806)

sym <- data.frame(ob, trial, rt, cond, ZreSPSS)

I could apply a formula (a blend of Mark's and Daniel's solutions) to compute residuals from an lm(log(rt)~trial) regression, but for some reason group_by is not working here:

sym %<>% 
  group_by (ob) %>% 
    mutate(z=residuals(lm(log(rt)~trial)),
    obM=mean(rt), obSd=sd(rt), zRev=z*obSd+obM)

Resulting values clearly show that grouping hasn't kicked in. Any idea why it didn't work out?
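
For reference, a minimal sketch of one way to get per-subject standardized residuals that line up with SPSS's /SAVE ZRESID. It assumes ZRESID is the raw residual divided by the model's residual standard error, which is what the ZreSPSS values above are consistent with. `z_resid` is a hypothetical helper name, not part of dplyr. The verbs are called with the `dplyr::` prefix because a commonly reported cause of ignored grouping is `plyr::mutate` masking dplyr's version; whether that is what happened here is not confirmed.

```r
library(dplyr)

# Same subject, trial, and RT values as in the question
sym <- data.frame(
  ob    = factor(rep(1:3, each = 6)),
  trial = rep(1:6, times = 3),
  rt    = c(300, 305, 290, 315, 320, 320,
            350, 355, 330, 365, 370, 370,
            560, 565, 570, 575, 560, 570)
)

# Hypothetical helper: fit y ~ x for ONE group and return
# residual / residual standard error (assumed to match ZRESID)
z_resid <- function(y, x) {
  fit <- lm(y ~ x)
  unname(residuals(fit) / summary(fit)$sigma)
}

# The helper receives only the current group's rows,
# so one regression is fitted per subject
res <- sym %>%
  dplyr::group_by(ob) %>%
  dplyr::mutate(z = z_resid(log(rt), trial)) %>%
  dplyr::ungroup()
```

The `z` column in `res` can then be compared directly against `ZreSPSS`.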

Can you post an example data set? Examples of the work you have done so far would go a long ways towards telling people what help you need. See [here](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) for more on how to ask good R questions. — Mark Peterson, Oct 14 '16 at 12:06
Mark, I've rewritten my original question to follow up on your answers — blazej, Oct 15 '16 at 10:56

Daniel Winkler · Answer 1 · 2016-10-14T13:06:12.333

mylm <- lm(x~y)
rstandard(mylm)

This returns the standardized residuals of the fitted model. To bind these to a variable you can do:

zresid <- rstandard(mylm)

EXAMPLE:

a <- rnorm(10, mean = 10)  # 10 random draws with mean 10
b <- rnorm(10, mean = 10)
mylm <- lm(a ~ b)
mylm.zresid <- rstandard(mylm)
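
Note that `rstandard()` also corrects each residual for its leverage, so it is not the same quantity as the raw residual divided by the residual standard error (which appears to be what SPSS saves as ZRESID). A small sketch of the relationship on made-up data; the names `plain_z` and `leverage_z` are illustrative only:

```r
set.seed(1)
b <- rnorm(10, mean = 10)
a <- rnorm(10, mean = 10)
mylm <- lm(a ~ b)

s <- summary(mylm)$sigma   # residual standard error
h <- hatvalues(mylm)       # leverage of each observation

# Residual scaled by the residual standard error only
plain_z <- residuals(mylm) / s

# Residual additionally scaled by sqrt(1 - leverage)
leverage_z <- residuals(mylm) / (s * sqrt(1 - h))

# rstandard() corresponds to the leverage-adjusted version
same_as_rstandard <- all.equal(leverage_z, rstandard(mylm))
```

With only a handful of trials per subject the two versions can differ noticeably, so it is worth checking which one the target output uses.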

See also:

summary(mylm)

and

mylm$coefficients   
mylm$fitted.values  
mylm$xlevels
mylm$residuals      
mylm$assign         
mylm$call
mylm$effects        
mylm$qr             
mylm$terms
mylm$rank           
mylm$df.residual    
mylm$model

edited Oct 14 '16 at 13:06

answered Oct 14 '16 at 12:58

Daniel Winkler


Sorry for the double comment - how about the 2nd part? How can I transform those residuals back to their original scale, considering that they were computed separately for each participant? – blazej Oct 14 '16 at 13:21
While your solution works in general, I can't get around the fact that I need to compute residuals for each subject separately. As per your example: add `c<-rnorm(1:2,10)` and then compute `mylm <- lm(a~b) mylm.zresid<-rstandard(mylm)` for each level of c – blazej Oct 14 '16 at 14:08

score 1 · Accepted Answer · answered Oct 14 '16 at 15:00

Using dplyr and magrittr, you should be able to calculate z-scores within each individual with this code (it breaks the data into the groups you specify, then calculates within each group).

experiment %<>%
  group_by(subject) %>%
  mutate(rtLN = log(rt)
         # scale() returns a one-column matrix; as.numeric() keeps ZRE1 a plain vector
         , ZRE1 = as.numeric(scale(rtLN)))

You should then be able to use that in your model. However, one thing that may help your shift to R thinking is that you can likely build your model directly, instead of having to make all of these columns ahead of time. For example, using lme4 to treat subject as a random effect:

library(lme4)  # provides lmer()

withRandVar <-
  lmer(log(rt) ~ cond + (1|as.factor(subject))
       , data = experiment)

Then, the residuals should already be on the correct scale. Further, if you use the z-scores, you probably should be plotting on that scale. I am not actually sure what running with the z-scores as the response gains you -- it seems like you would lose information about the degree of difference between the groups.

That is, if the groups are tight, but the difference between them varies by subject, a z-score may always show them as a similar number of z-scores away. Imagine, for example, that you have two subjects, one scores (1,1,1) on condition A and (3,3,3) on condition B, and a second subject that scores (1,1,1) and (5,5,5) -- both will give z-scores of (-.9,-.9,-.9) vs (.9,.9,.9) -- losing the information that the difference between A and B is larger in subject 2.
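
The two-subject example above can be checked directly. This is just a sketch of that arithmetic, with `subj1` and `subj2` as illustrative names holding each subject's condition A and condition B scores:

```r
# Scores for condition A (first three) and condition B (last three)
subj1 <- c(1, 1, 1, 3, 3, 3)
subj2 <- c(1, 1, 1, 5, 5, 5)

# Within-subject z-scores
z1 <- as.numeric(scale(subj1))
z2 <- as.numeric(scale(subj2))

round(z1, 2)  # -0.91 -0.91 -0.91  0.91  0.91  0.91
round(z2, 2)  # -0.91 -0.91 -0.91  0.91  0.91  0.91

# The raw A-B gap differs (2 vs 4), but the z-scores are identical
identical_z <- isTRUE(all.equal(z1, z2))
```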

If, however, you really want to convert back, you can probably use this to store the subject means and sds, then multiply the residuals by subjSD and add subjMean.

experiment %<>%
  group_by(subject) %>%
  mutate(rtLN = log(rt)
         # as.numeric() drops the matrix attributes that scale() adds
         , ZRE1 = as.numeric(scale(rtLN))
         , subjMean = mean(rtLN)
         , subjSD = sd(rtLN))
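
A sketch of the back-conversion itself, on a small made-up data frame (the `experiment` values and the `rtBack` column name are assumptions for illustration). Multiplying by `subjSD` and adding `subjMean` returns to the log scale, and `exp()` then returns to milliseconds:

```r
library(dplyr)

# Made-up data: two subjects, four trials each
experiment <- data.frame(
  subject = rep(c("s1", "s2"), each = 4),
  rt      = c(300, 305, 290, 315, 350, 355, 330, 365)
)

experiment <- experiment %>%
  group_by(subject) %>%
  mutate(rtLN     = log(rt),
         ZRE1     = as.numeric(scale(rtLN)),
         subjMean = mean(rtLN),
         subjSD   = sd(rtLN),
         # undo the z-scoring, then undo the log transform
         rtBack   = exp(ZRE1 * subjSD + subjMean)) %>%
  ungroup()
```

For plain z-scores this round trip simply recovers the original `rt`; the same rescaling applied to an adjusted score would instead give the adjusted value on the millisecond scale.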

answered Oct 14 '16 at 15:00

Mark Peterson


Mark, thank you. I'll test your solution this evening. As a side note, the reason I really want to convert back to msec is this: I compute z-scores only because I want to adjust for the fact that conservative trials tend to slow down response latency. I might run similar procedures adjusting for response extremity, which almost always speeds up latencies as responses are more extreme. While analysis on those Z values yields stronger effects, mean msec presented on plots are more readable (I guess). – blazej Oct 14 '16 at 15:17
*consecutive not conservative, sorry. I'm typing from a mobile – blazej Oct 14 '16 at 15:18
I was trying to use your code but can't really make it work for my example. I replaced 'mutate(rtLN = log(rt) , ZRE1 = scale(rtLN))' with something like 'mutate(rtLN = log(rt)) %>% mutate(ZRE1 = rstandard(lm(rtLN~trial)))', as what I really need in the z-scores is the residual from a regression, but R won't accept this syntax. Any chance you could update your answer to match that scenario? – blazej Oct 15 '16 at 09:41
My last try was `group_by(sym, ob) %>% mutate(z=residuals(lm(log(rt)~trial))) residuals(lm(log(rt)~trial))` this works for computing z scores from a lm() function but fails to take split_by into consideration – blazej Oct 15 '16 at 09:58
Correction `group_by(sym, ob) %>% mutate(z=residuals(lm(log(rt)~trial)))` works for computing z scores from a lm() function but fails to take split_by into consideration (can I edit / delete dupe comments?) – blazej Oct 15 '16 at 09:59
`mutate` and the other `dplyr` verbs are designed to work on data.frames. They can be forced to work with more complicated data types(like models), but that is usually tricky. Instead, I strongly suggest that you generate the data you want, then pass it separately to the model function. – Mark Peterson Oct 15 '16 at 12:25
Thing is, what I'm trying to do involves both splitting (group_by) and saving a model outcome (lm()). Could you please suggest what kind of function to look for that might work here? – blazej Oct 15 '16 at 13:33
Do you want a separate model for each group? Then you want `do`, but your question implied you wanted one model, just with the scores standardized within group. This may require a different question. The general guidelines are to accept answers that solve *this* problem, then ask a different question. That way, others can see the original question and answers – Mark Peterson Oct 15 '16 at 13:36
I might have asked the wrong question or wasn't explicit about the given SPSS code. – blazej Oct 15 '16 at 13:42
Indeed, I want to calculate `residuals(lm(log(rt)~trial))` separately for each participant (the `subject` variable) – blazej Oct 15 '16 at 13:43
