The answer to this question clearly explains how to retrieve tidy regression results by group when running a regression through a dplyr pipe, but the solution is no longer reproducible.
How can one use dplyr and broom in combination to run a regression by group and retrieve tidy results using R 4.0.2, dplyr 1.0.0, and broom 0.7.0?
Specifically, the example answer from the question linked above,
library(dplyr)
library(broom)
df.h = data.frame(
hour = factor(rep(1:24, each = 21)),
price = runif(504, min = -10, max = 125),
wind = runif(504, min = 0, max = 2500),
temp = runif(504, min = -10, max = 25)
)
dfHour = df.h %>% group_by(hour) %>%
do(fitHour = lm(price ~ wind + temp, data = .))
# get the coefficients by group in a tidy data_frame
dfHourCoef = tidy(dfHour, fitHour)
returns the following error (and three warnings) when I run it on my system:
Error in var(if (is.vector(x) || is.factor(x)) x else as.double(x), na.rm = na.rm) :
Calling var(x) on a factor x is defunct.
Use something like 'all(duplicated(x)[-1L])' to test for a constant vector.
In addition: Warning messages:
1: Data frame tidiers are deprecated and will be removed in an upcoming release of broom.
2: In mean.default(X[[i]], ...) :
argument is not numeric or logical: returning NA
3: In mean.default(X[[i]], ...) :
argument is not numeric or logical: returning NA
If I reformat df.h$hour as a character rather than a factor,
df.h <- df.h %>%
mutate(
hour = as.character(hour)
)
re-run the regression by group, and again attempt to retrieve the results using broom::tidy,
dfHour = df.h %>% group_by(hour) %>%
do(fitHour = lm(price ~ wind + temp, data = .))
# get the coefficients by group in a tidy data_frame
dfHourCoef = tidy(dfHour, fitHour)
I get this error:
Error in var(if (is.vector(x) || is.factor(x)) x else as.double(x), na.rm = na.rm) :
is.atomic(x) is not TRUE
I assume the problem is that the group-level regression results are stored as a list-column in dfHour$fitHour, but I am unsure how to correct the error and once again compile the regression results quickly and tidily, as the originally posted answer used to do.
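For context, here is a minimal sketch of one pattern that might sidestep the list-column issue, assuming dplyr's group_modify() is available in the installed version. It fits and tidies each group's model in a single step, so tidy() is only ever called on an individual lm object rather than on a data frame of models. This is not the method from the original answer, and the set.seed() call is an addition made only so the random example data are repeatable.

```r
library(dplyr)
library(broom)

set.seed(1)  # assumption: added only for repeatability, not in the original example
df.h <- data.frame(
  hour  = factor(rep(1:24, each = 21)),
  price = runif(504, min = -10, max = 125),
  wind  = runif(504, min = 0, max = 2500),
  temp  = runif(504, min = -10, max = 25)
)

# Fit one lm per hour and tidy it immediately; group_modify() expects a
# data frame back for each group and prepends the grouping column (hour)
dfHourCoef <- df.h %>%
  group_by(hour) %>%
  group_modify(~ tidy(lm(price ~ wind + temp, data = .x))) %>%
  ungroup()

head(dfHourCoef)
```

If this runs as intended, dfHourCoef should contain one row per coefficient per hour (intercept, wind, and temp for each of the 24 hours), with the usual broom columns such as term, estimate, and std.error.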