
I am trying to fit a second-degree polynomial between two columns of my data frame, R and V. The rows are split into two groups, Group 1 and Group 2 (the state column), and I want to fit R ~ V for each group using

fit_all <- summary(lm(R ~ poly(V,2,raw=TRUE), data = df, subset = state))

but I get this warning message:

In summary.lm(lm(R ~ poly(V, 2, raw = TRUE), data = df_rep, subset = state)) : essentially perfect fit: summary may be unreliable

I looked up this warning, and it might be related to NA values. Since I have no NA values in either my real data or df, I am stuck at this point.
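The "essentially perfect fit" message is a warning from summary.lm() about residuals that are essentially zero relative to the fitted values; it is not an NA warning as such. Still, the NA hypothesis is quick to rule out. A small base-R sketch on the example data below (rebuilt here so the snippet runs on its own) that counts missing values and rows per group:

```r
# Rebuild the example data from the question (only the columns used in the fit)
set.seed(1)
V <- rep(seq(100, 2100, 100), times = 4)
R <- sort(replicate(4, sample(5000:6000, 21)))
state <- rep(rep(c("Group 1", "Group 2"), 2), each = 21)
df <- data.frame(V, R, state)

# Missing values per column: all zeros means there are no NAs
colSums(is.na(df))

# A single TRUE/FALSE for the whole data frame
anyNA(df)

# How many rows each group contributes to its fit
table(df$state)
```

If a group turns out to have only a handful of rows, that is worth knowing before reading any per-group summary.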

Finally, I want to extract the coefficients of the fit for each group (Group 1 and Group 2).

Here is my reproducible example:

set.seed(1)
No <- rep(seq(1,4,1),each=21)
AC <- rep(rep(c(78,110),each=1),times=length(No)/2)
state <- rep(rep(c("Group 1","Group 2"),2),each=21)
V <- rep(seq(100,2100,100),times=4)
R = sort(replicate(4, sample(5000:6000,21)))

df <- data.frame(No,AC,V,R,state)

head(df)

   No  AC   V    R   state
 1  1  78 100 5004 Group 1
 2  1 110 200 5014 Group 1
 3  1  78 300 5030 Group 1
 4  1 110 400 5039 Group 1
 5  1  78 500 5057 Group 1
 6  1 110 600 5068 Group 1
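For reference, lm() fits a single model per call, and its subset argument expects a logical (or row-index) vector, so passing the state column itself does not split the fit by group. A minimal sketch of fitting one group with a logical subset, assuming the same example data (rebuilt here so the block is self-contained):

```r
set.seed(1)
V <- rep(seq(100, 2100, 100), times = 4)
R <- sort(replicate(4, sample(5000:6000, 21)))
state <- rep(rep(c("Group 1", "Group 2"), 2), each = 21)
df <- data.frame(V, R, state)

# subset takes a logical condition, evaluated inside df
fit_g1 <- lm(R ~ poly(V, 2, raw = TRUE), data = df, subset = state == "Group 1")

# Intercept, linear and quadratic coefficients for Group 1 only
coef(fit_g1)

# Number of rows actually used in this fit
nobs(fit_g1)
```

Repeating this with state == "Group 2" gives the second fit; the answer below does the per-group fitting in one step.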
Alexander

1 Answer


Here is an example that uses the dplyr and broom packages.

library(dplyr)
library(broom)

set.seed(1)
No <- rep(seq(1,4,1),each=21)
AC <- rep(rep(c(78,110),each=1),times=length(No)/2)
state <- rep(rep(c("Group 1","Group 2"),2),each=21)
V <- rep(seq(100,2100,100),times=4)
R = sort(replicate(4, sample(5000:6000,21)))

df <- data.frame(No,AC,V,R,state)


df2 = df %>% 
  group_by(state) %>% # group by the variable state
  do(data.frame(model = tidy(lm(R~poly(V,2,raw=TRUE), data=.)))) %>% # for each group, run the polynomial fit and save its output as a data frame
  ungroup() # drop the initial grouping
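do() still works, but current dplyr documents it as superseded. An alternative sketch of the same per-group fit with group_modify(), assuming a dplyr version recent enough to provide it and broom installed (this is not the answer's original code):

```r
library(dplyr)
library(broom)

set.seed(1)
V <- rep(seq(100, 2100, 100), times = 4)
R <- sort(replicate(4, sample(5000:6000, 21)))
state <- rep(rep(c("Group 1", "Group 2"), 2), each = 21)
df <- data.frame(V, R, state)

coefs <- df %>%
  group_by(state) %>%
  # .x is the slice of rows for the current group; tidy() returns one row per term
  group_modify(~ tidy(lm(R ~ poly(V, 2, raw = TRUE), data = .x))) %>%
  ungroup()

coefs
```

Here the columns are named term, estimate, std.error, statistic and p.value without the model. prefix, so a filter/select would use term and estimate instead of model.term and model.estimate.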

Now you have a dataset (df2) whose columns hold the linear model's coefficient table (term, estimate, standard error, test statistic and p-value) for each group. You can handle df2 like any other dataset. For example:

df2 %>% filter(state=="Group 1") # get info only for Group 1
df2 %>% filter(state=="Group 1") %>% select(model.term, model.estimate) # get only the terms and coefficients for Group 1
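If only the coefficients are needed, base R can also do this without dplyr or broom. A sketch using split() and sapply() (a swapped-in technique, not part of the answer above) that returns a matrix with one column of coefficients per group:

```r
set.seed(1)
V <- rep(seq(100, 2100, 100), times = 4)
R <- sort(replicate(4, sample(5000:6000, 21)))
state <- rep(rep(c("Group 1", "Group 2"), 2), each = 21)
df <- data.frame(V, R, state)

# One data frame per group, named by the group label
by_group <- split(df, df$state)

# Fit the same quadratic in each group and keep only the coefficient vector
coef_mat <- sapply(by_group, function(d) coef(lm(R ~ poly(V, 2, raw = TRUE), data = d)))

# Rows: intercept, V, V^2; columns: Group 1, Group 2
coef_mat
```

coef_mat[, "Group 1"] then gives the three coefficients for Group 1.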
AntoniosK
  • thanks a lot. Your solution is perfect! – Alexander Aug 07 '15 at 11:31
  • You're welcome. If you're interested in how this process behaves with different outputs, try replacing do(data.frame(model = tidy(lm(R~poly(V,2,raw=TRUE), data=.)))) with do(model = tidy(lm(R~poly(V,2,raw=TRUE), data=.))) or do(model = summary(lm(R~poly(V,2,raw=TRUE), data=.))), then check your results (df2) and how you can access the various pieces of information in df2... – AntoniosK Aug 07 '15 at 11:41
  • OK, I see. Thanks for the suggestions. – Alexander Aug 07 '15 at 13:10
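The comment above suggests storing the whole summary object per group with do(model = summary(...)). A base-R analogue (an illustration, not the commenter's code) keeps each group's summary.lm object in a named list, from which any component can be pulled:

```r
set.seed(1)
V <- rep(seq(100, 2100, 100), times = 4)
R <- sort(replicate(4, sample(5000:6000, 21)))
state <- rep(rep(c("Group 1", "Group 2"), 2), each = 21)
df <- data.frame(V, R, state)

# Full summary.lm object for each group, in a list named by group
fits <- lapply(split(df, df$state),
               function(d) summary(lm(R ~ poly(V, 2, raw = TRUE), data = d)))

# Coefficient table (Estimate, Std. Error, t value, Pr(>|t|)) for Group 1
coef(fits[["Group 1"]])

# Other components are available by name, e.g. R-squared per group
sapply(fits, function(s) s$r.squared)
```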