personID<-c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26)
genger<-c('male', 'male', 'male', 'male', 'male', 'male', 'male', 'male', 'male', 'male', 'male', 'female', 'female', 'female', 'female', 'female', 'female', 'female', 'female', 'female', 'female', 'female', 'female', 'female', 'female', 'female')
height<-c(181, 161, 198, 195, 177, 175, 197, 195, 198, 193, 161, 167, 132, 181, 165, 151, 163, 180, 169, 181, 177, 135, 143, 107, 161, 142)
weight<-c(165, 73, 90, 89, 80, 159, 179, 177, 180, 175, 73, 76, 60, 165, 150, 69, 148, 164, 154, 165, 161, 61, 130, 97, 146, 65)
data<-data.frame(personID, genger, height, weight)
data

I am an R beginner.

I would like to run a separate regression for each gender (male and female).

The regression formula is weight = slope * height + intercept.
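To make the formula concrete, here is a minimal sketch (base R only, with the sample data rebuilt inline so it runs on its own) that fits the model separately for each gender and reads off the intercept and slope with coef(). The object names `models` and `coefs` are illustrative and not from the original post.

```r
# Sample data from the question (the "genger" column name is kept as posted)
data <- data.frame(
  personID = 1:26,
  genger   = rep(c("male", "female"), times = c(11, 15)),
  height   = c(181, 161, 198, 195, 177, 175, 197, 195, 198, 193, 161,
               167, 132, 181, 165, 151, 163, 180, 169, 181, 177, 135, 143, 107, 161, 142),
  weight   = c(165, 73, 90, 89, 80, 159, 179, 177, 180, 175, 73,
               76, 60, 165, 150, 69, 148, 164, 154, 165, 161, 61, 130, 97, 146, 65)
)

# Fit weight ~ height separately for each gender (one lm object per group)
models <- lapply(split(data, data$genger), function(x) lm(weight ~ height, data = x))

# coef() returns "(Intercept)" and "height" (the slope); one row per gender after t()
coefs <- t(sapply(models, coef))
colnames(coefs) <- c("intercept", "slope")
print(coefs)
```

lm() adds the intercept automatically, so the R formula weight ~ height corresponds to weight = slope * height + intercept.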

I searched online, but I didn't understand the articles I found.

My desired output looks like this:

person_id  gender  height  weight  predict_value  error
1          male    181     165     xxx            xx
2          male    161     73      ...            ...
3          male    198     90
4          male    195     89
5          male    177     80
6          male    175     159
7          male    197     179
8          male    195     177
9          male    198     180
10         male    193     175
11         male    161     73
12         female  167     76
13         female  132     60
14         female  181     165
15         female  165     150
16         female  151     69

How can I run the regression separately by gender and add prediction and error columns?

Any help would be appreciated.

Bruce Jung


Here's one way. You can split your data, fit a regression to each piece, and use predict() to get the fitted values with their confidence intervals; then use unsplit() to put the rows back in their original order. For example, with your test data, splitting on the "genger" (sic) column:

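unsplit(lapply(split(data, data$genger), function(x) {
    m <- lm(weight ~ height, x)                    # fit weight = slope*height + intercept within one gender
    cbind(x, predict(m, interval = "confidence"))  # append fit, lwr and upr columns
}), data$genger)                                   # restore the original row order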

This returns

   personID genger height weight       fit        lwr       upr
1         1   male    181    165 124.17126  94.106766 154.23576
2         2   male    161     73  87.11321  29.280886 144.94554
3         3   male    198     90 155.67061 115.126629 196.21458
4         4   male    195     89 150.11190 113.707198 186.51660
5         5   male    177     80 116.75965  83.508504 150.01080
# etc...
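The question also asks for an error column, which the output above does not include. Below is a minimal sketch of the same split/unsplit approach that adds predict_value and error columns instead of the confidence bounds. It assumes "error" means the residual (observed weight minus predicted weight); the object name `with_pred` is hypothetical.

```r
# Sample data from the question (the "genger" column name is kept as posted)
data <- data.frame(
  personID = 1:26,
  genger   = rep(c("male", "female"), times = c(11, 15)),
  height   = c(181, 161, 198, 195, 177, 175, 197, 195, 198, 193, 161,
               167, 132, 181, 165, 151, 163, 180, 169, 181, 177, 135, 143, 107, 161, 142),
  weight   = c(165, 73, 90, 89, 80, 159, 179, 177, 180, 175, 73,
               76, 60, 165, 150, 69, 148, 164, 154, 165, 161, 61, 130, 97, 146, 65)
)

with_pred <- unsplit(lapply(split(data, data$genger), function(x) {
  m <- lm(weight ~ height, data = x)          # one model per gender
  x$predict_value <- as.numeric(fitted(m))    # predicted weight from this gender's line
  x$error <- as.numeric(residuals(m))         # observed weight minus predicted weight
  x
}), data$genger)                              # rows come back in the original order

print(head(with_pred))
```

fitted() and residuals() give the same predictions as predict(m) on the training rows, so the predict_value column should match the fit column shown above.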
MrFlick