
I'm new to R so I only have a basic understanding of the language, but I have a question about linear regressions. Say I have a table that looks like this:

Number  V4  V92
12      4   .1
14      6   .5
16      8   .25
13      5   .05
12      7   .2
13      5   .4

I want to create a new table by first grouping by Number (the real data has many data points per Number) and then fitting a linear regression of V4 and V92 within each group. How do I get all of this into one new table? I know how to do it one Number at a time, but what's a quick way to make a new table whose columns are the number, the quantity, and the linear regression?

Thanks!!

EamonS
  • It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. – MrFlick Sep 07 '21 at 17:04
  • Check out this dplyr solution: https://stackoverflow.com/questions/26765426/linear-model-and-dplyr-a-better-solution – icj Sep 07 '21 at 17:07

2 Answers


You can nest the data frame and then map over the Number groups like this:

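library(tidyverse)

tribble(
  ~Number, ~V4, ~V92,
  12, 4, 0.1,
  12, 5, 0.1,
  14, 6, 0.5,
  14, 7, 0.3
) %>%
  # one row per Number; that group's rows go into the list-column `data`
  # (tidyr >= 1.0 wants the new column named: nest(data = -Number))
  nest(data = -Number) %>%
  mutate(
    # fit a separate lm() to each group's data frame
    model = map(data, ~ lm(V4 ~ V92, data = .x))
  )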
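The nested result above stores whole lm objects in the model column. To get the flat table the question asks for (one row per Number), one option is to pull the coefficients out with purrr's map_dbl(). This is a minimal sketch, not part of the original answer: the column names intercept and slope are my own choice, and it loads tibble, dplyr, tidyr and purrr individually instead of the whole tidyverse. Swap the formula to V92 ~ V4 if V92 is meant to be the response.

```r
library(tibble)
library(dplyr)
library(tidyr)
library(purrr)

models <- tribble(
  ~Number, ~V4, ~V92,
  12, 4, 0.1,
  12, 5, 0.1,
  14, 6, 0.5,
  14, 7, 0.3
) %>%
  nest(data = -Number) %>%
  mutate(
    model = map(data, ~ lm(V4 ~ V92, data = .x)),
    # coef() returns c("(Intercept)", "V92"); a constant V92 within a group
    # makes the slope inestimable, so lm() reports it as NA
    intercept = map_dbl(model, ~ coef(.x)[[1]]),
    slope     = map_dbl(model, ~ coef(.x)[[2]])
  ) %>%
  select(Number, intercept, slope)

models
```

In this toy data, group 12 has the same V92 value twice, so its slope comes back as NA; group 14 gets a proper fit.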

You can check out this tutorial: https://r4ds.had.co.nz/many-models.html#nested-data

danlooo

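Here is a base R way: split the data frame by Number, sapply lm over each sub-data frame, and extract the coefficients. Because every coefficient vector has length 2, sapply simplifies the result to a matrix with one column per Number.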

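sp1 <- split(df1, df1$Number)   # list of sub-data frames, one per Number
lm_list1 <- sapply(sp1, function(X){
  fit <- lm(V92 ~ V4, X)        # regress V92 on V4 within this group
  coef(fit)                     # c("(Intercept)", "V4"); NA if V4 is constant
})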
lm_list1
#                     12    13  14   16
#(Intercept) -0.03333333 0.225 0.5 0.25
#V4           0.03333333    NA  NA   NA
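The coefficient matrix above has one column per Number, while the question asks for a table with one row per Number. A minimal sketch of that reshaping follows, assuming that "quantity" in the question means the number of rows in each group (my reading; the question does not say). The column names n, intercept and slope are my own, and the block redefines the question's data so it runs on its own.

```r
# Question's data (same values as df1 in this answer)
df1 <- data.frame(
  Number = c(12L, 14L, 16L, 13L, 12L, 13L),
  V4     = c(4L, 6L, 8L, 5L, 7L, 5L),
  V92    = c(0.1, 0.5, 0.25, 0.05, 0.2, 0.4)
)

# 2 x k matrix: rows "(Intercept)" and "V4", one column per Number
coefs <- sapply(split(df1, df1$Number), function(X) coef(lm(V92 ~ V4, X)))

# one row per Number; split() and table() both order the groups by sorted level
res <- data.frame(
  Number    = as.integer(colnames(coefs)),
  n         = as.vector(table(df1$Number)),  # rows per group (assumed "quantity")
  intercept = coefs["(Intercept)", ],
  slope     = coefs["V4", ],
  row.names = NULL                           # drop the group names as row names
)
res
```

Groups 13, 14 and 16 get an NA slope because V4 is constant or there is only a single row, exactly as in the matrix above.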

Here is the same code with a larger data set. First, simulate the data.

set.seed(2021)    # make the results reproducible
n <- 1e3
Number <- sample(df1$Number, n, TRUE)
V4 <- sample(1:10, n, TRUE)
V92 <- runif(n, min(df1$V92), max(df1$V92))
df2 <- data.frame(Number, V4, V92)

Now fit the models like above.

sp2 <- split(df2, df2$Number)
lm_list2 <- sapply(sp2, function(X){
  fit <- lm(V92 ~ V4, X)
  coef(fit)
})

lm_list2
#                      12           13          14          16
#(Intercept)  0.279490994  0.293545546 0.282990678 0.249364441
#V4          -0.001819071 -0.002314752 0.001782101 0.004820001
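As a cross-check (an alternative I'm adding, not something the answer uses): a single lm() with Number as a factor, a factor(Number):V4 interaction and no global intercept fits a separate intercept and slope for every Number, so its point estimates should match the per-group fits. The sketch simulates its own data so it runs without df1; the group labels and the V92 range are hypothetical stand-ins for the question's values.

```r
set.seed(2021)
n <- 1e3
df2 <- data.frame(
  Number = sample(c(12L, 13L, 14L, 16L), n, TRUE),  # hypothetical group labels
  V4     = sample(1:10, n, TRUE),
  V92    = runif(n, 0.05, 0.5)                      # hypothetical range
)

# separate fits, as in the answer: 2 x 4 matrix of coefficients
per_group <- sapply(split(df2, df2$Number), function(X) coef(lm(V92 ~ V4, X)))

# one joint fit: an intercept and a V4 slope for each level of Number
joint <- lm(V92 ~ 0 + factor(Number) + factor(Number):V4, data = df2)
cf <- coef(joint)
k  <- ncol(per_group)

# first k coefficients are the intercepts, the next k the slopes,
# in the same sorted level order that split() uses
all.equal(unname(cf[seq_len(k)]),     unname(per_group["(Intercept)", ]))
all.equal(unname(cf[k + seq_len(k)]), unname(per_group["V4", ]))
```

Only the point estimates agree: the joint model pools the residual variance across groups, so its standard errors differ from those of the separate fits.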

Question's data

df1 <-
structure(list(Number = c(12L, 14L, 16L, 13L, 12L, 13L), 
V4 = c(4L, 6L, 8L, 5L, 7L, 5L), V92 = c(0.1, 0.5, 0.25, 
0.05, 0.2, 0.4)), class = "data.frame", row.names = c(NA, -6L))
Rui Barradas