
I want to retrieve the value of beta for every subgroup formed by the group_by function. But when I run the following code, I get NA in the beta column for all the subgroups. Please help me fix this issue.

My code is as follows:

fit_lm <- function(df) {
  lr <- lm(num_transactions ~ price , data = df)

  # Filter out groups where the model cannot be fitted
  if (length(lr$coefficients) < 2) {
    return(c(beta = NA, intercept = NA))
  }

  beta <- as.numeric(lr$coefficients[2])
  intercept <- as.numeric(lr$coefficients[1])

  return(c(beta = beta, intercept = intercept))
}

pred_sales_by_col <- machine_info %>%
  group_by(product_name, small_machine, column) %>%
  summarize(model_results = list(fit_lm(cur_data())), .groups = "drop") %>%
  mutate(beta = sapply(model_results, function(x) x[1]),
         intercept = sapply(model_results, function(x) x[2])) %>%
  select(-model_results)

# View the result
View(pred_sales_by_col)

I did get values for intercept but NA for beta in all subgroups.

    Please [make this question reproducible](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) by including a small representative dataset in a plain text format - for example the output from `dput(df)`. – neilfws Mar 13 '23 at 22:51

1 Answer


Your model, lr <- lm(num_transactions ~ price, data = df), has 2 variables: num_transactions and price. Each subgroup's data therefore needs at least 2 rows with no NA in the num_transactions or price column, because fitting a line in a 2-dimensional space requires at least 2 data points.

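In other words, if sum(complete.cases(df[, c("num_transactions", "price")])) < 2 for a subgroup, you'll get NA for beta in your case. Only the two model columns matter here, since lm drops a row only when one of the variables in the formula is NA.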
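To see which subgroups fall below that threshold, you can count the usable rows per group before fitting anything. The following is a minimal base R sketch; the toy data frame and its grp column are made up for illustration and only stand in for machine_info and its grouping columns.

```r
# Toy data standing in for machine_info (hypothetical values and group labels)
toy <- data.frame(
  grp = c("A", "A", "A", "B", "B", "C", "C", "C"),
  price = c(1, 2, 3, 5, NA, 4, 4, 4),
  num_transactions = c(10, 8, 7, 3, 2, 6, 5, 7)
)

# TRUE for rows where both model variables are present
usable <- complete.cases(toy[, c("num_transactions", "price")])

# Number of usable rows in each group
n_complete <- tapply(usable, toy$grp, sum)
n_complete

# Groups with fewer than 2 usable rows cannot yield a slope
names(n_complete)[n_complete < 2]
```

Here group B has only one row where price is not NA, so it is the one flagged by the last line.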

Consider the examples below. Here "fails" means that lm runs but returns NA for at least one coefficient.

Case-1 fails because there are 2 variables (a, b), but there is just 1 row without any NA.

Case-2 succeeds because there are 2 variables (a, b), and there are 2 rows without any NA.

Case-3 fails because there are 3 variables (a, b, c), but there are just 2 rows without any NA.

Case-4 succeeds because there are 3 variables (a, b, c), and there are 3 rows without any NA.

# Case-1
# FAILS
data1 = data.frame(a=1:100, b=runif(100))
data1[1:99,"b"]=NA
lm(a~b, data=data1)

# Case-2
# SUCCEEDS
data1 = data.frame(a=1:100, b=runif(100))
data1[1:98,"b"]=NA
lm(a~b, data=data1)

# Case-3
# FAILS
data1 = data.frame(a=1:100, b=runif(100), c=runif(100))
data1[1:98,"b"]=NA
lm(a~b+c, data=data1)

# Case-4
# SUCCEEDS
data1 = data.frame(a=1:100, b=runif(100), c=runif(100))
data1[1:97,"b"]=NA
lm(a~b+c, data=data1)

You will also get NA for beta if price has the same value in every row of a subgroup, because a constant predictor cannot be distinguished from the intercept.

See the code below:

data1 = data.frame(a=1:100, b=1)
lm(a~b, data=data1)
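Putting both conditions together, one possible way to guard the fit is sketched below. Note that lm keeps an NA entry for a coefficient it cannot estimate, so the check length(lr$coefficients) < 2 in the question does not catch these groups. The name fit_lm_safe and the toy data frames are hypothetical; this is only a sketch assuming the columns are called price and num_transactions as in the question.

```r
# Hypothetical helper: returns NA for both coefficients when the slope
# cannot be estimated, otherwise the fitted slope and intercept.
fit_lm_safe <- function(df) {
  # Rows where both model variables are present
  ok <- complete.cases(df[, c("num_transactions", "price")])

  # Need at least two usable rows and at least two distinct prices
  if (sum(ok) < 2 || length(unique(df$price[ok])) < 2) {
    return(c(beta = NA_real_, intercept = NA_real_))
  }

  lr <- lm(num_transactions ~ price, data = df)
  c(beta = unname(coef(lr)["price"]),
    intercept = unname(coef(lr)["(Intercept)"]))
}

# Toy groups (made-up values)
good <- data.frame(price = c(1, 2, 3), num_transactions = c(10, 8, 6))
flat <- data.frame(price = c(4, 4, 4), num_transactions = c(6, 5, 7))

fit_lm_safe(good)  # slope and intercept are estimated
fit_lm_safe(flat)  # constant price, so both values are NA
```

The function takes a data frame in the same way as fit_lm in the question, so it could be swapped into the summarize call there.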
Ajay