I have a data frame that I want to split into a list by hour.

wkd = data.frame(hour = c(0,0,1,1,2,2), 
distance = c(5.69,0.56,6.90,1.81,9.88,1.56), 
time = c(23,3,17,7,32,7),
fare = c(18.35,5.39,18.46,12.90,28.08,5.81))

  hour distance time  fare
1    0     5.69   23 18.35
2    0     0.56    3  5.39
3    1     6.90   17 18.46
4    1     1.81    7 12.90
5    2     9.88   32 28.08
6    2     1.56    7  5.81

Once the list is created, I want to fit lm(fare ~ time + distance) to each element of it.

I tried to use apply on the data frame with no success:

a = apply(wkd,2,as.list)

The question "How to create a loop for a linear model in R" looks relevant to what I want once I have the data frame in list format by hour.

In the end, I want the coefficients of each lm() fit of fare ~ distance + time collected in a single data frame, one linear equation per hour (24 in total).

The final output I want should look like this:

  hour distance time intercept
1    0     2.25 0.36      2.35
2    1     3.25 0.41      3.45
3    2     4.56 0.22      5.22
Jacky

2 Answers

If I understand your question correctly, you want to fit a separate linear model to the data for each hour.

If so, you can use split() to create the list and then sapply() to fit the model to each element:

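# Split the data frame into a list with one element per hour
wkd_by_hour = split(wkd, f = wkd$hour)
# Fit the model to each element and keep only the coefficients
res = sapply(wkd_by_hour, function(x) lm(fare ~ distance + time, data = x)$coefficients)

# Transpose so that each row holds the coefficients for one hour
t(res)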
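
To get from the transposed matrix to the layout requested in the question (an hour column plus distance, time and intercept), something along these lines might work. It is only a sketch: the names by_hour, coefs and out are made up for illustration, and vapply() is swapped in for sapply() so that each fit must return exactly three coefficients. Note that the sample data has only two rows per hour, so lm() cannot estimate all three coefficients and reports NA for the last term; with the full data this should not happen.

```r
# Sample data from the question
wkd <- data.frame(hour = c(0, 0, 1, 1, 2, 2),
                  distance = c(5.69, 0.56, 6.90, 1.81, 9.88, 1.56),
                  time = c(23, 3, 17, 7, 32, 7),
                  fare = c(18.35, 5.39, 18.46, 12.90, 28.08, 5.81))

# One data frame per hour
by_hour <- split(wkd, wkd$hour)

# 3 x n_hours matrix of coefficients; rows are named after the model terms
coefs <- vapply(by_hour,
                function(d) coef(lm(fare ~ distance + time, data = d)),
                numeric(3))

# One row per hour, with the column names used in the question
out <- data.frame(hour = as.numeric(colnames(coefs)),
                  distance = unname(coefs["distance", ]),
                  time = unname(coefs["time", ]),
                  intercept = unname(coefs["(Intercept)", ]))
out
```

The hour labels come back from split() as character names, which is why they are converted with as.numeric() before being stored.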
Fino

One tidyverse possibility could be:

library(dplyr)  # group_by(), do(), mutate(), select()
library(tidyr)  # spread()
library(broom)  # tidy()

wkd %>% 
 group_by(hour) %>%
 do(model = lm(fare ~ time + distance, data = .)$coefficients) %>%
 tidy(model) %>%
 mutate(names = ifelse(names == "(Intercept)", "intercept", names)) %>%
 spread(names, x) %>%
 select(hour, intercept, everything())

   hour intercept distance  time
  <dbl>     <dbl>    <dbl> <dbl>
1     0     3.45        NA 0.648
2     1     9.01        NA 0.556
3     2    -0.426       NA 0.891
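
The NA in the distance column comes from the sample data having only two rows per hour, so the second predictor cannot be estimated. For reference, a similar result can probably be obtained with dplyr alone. This is a rough sketch, assuming a dplyr version that provides group_modify(); the names coefs and b are hypothetical.

```r
library(dplyr)

# Sample data from the question
wkd <- data.frame(hour = c(0, 0, 1, 1, 2, 2),
                  distance = c(5.69, 0.56, 6.90, 1.81, 9.88, 1.56),
                  time = c(23, 3, 17, 7, 32, 7),
                  fare = c(18.35, 5.39, 18.46, 12.90, 28.08, 5.81))

coefs <- wkd %>%
  group_by(hour) %>%
  # .x is the subset of rows for the current hour
  group_modify(~ {
    b <- coef(lm(fare ~ time + distance, data = .x))
    # One-row data frame per group; the hour key is added back automatically
    data.frame(intercept = b[["(Intercept)"]],
               distance = b[["distance"]],
               time = b[["time"]])
  }) %>%
  ungroup()

coefs
```

Naming the columns explicitly avoids the renaming and reshaping steps, at the cost of hard-coding the model terms.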
tmfmnk