24

Does anyone have a nice clean way to get predict behavior for felm models?

library(lfe)
model1 <- lm(data = iris, Sepal.Length ~ Sepal.Width + Species)
predict(model1, newdata = data.frame(Sepal.Width = 3, Species = "virginica"))
# Works

model2 <- felm(data = iris, Sepal.Length ~ Sepal.Width | Species)
predict(model2, newdata = data.frame(Sepal.Width = 3, Species = "virginica"))
# Does not work
kennyB
  • 1,963
  • 3
  • 17
  • 22
  • predict doesnt work because it creates felm class object and predict wont work for it – Ajay Ohri Jul 02 '15 at 17:08
  • 1
    Just a note, you don't have to say `data(iris)`, iris data is lazyloaded already. – Gregor Thomas Jul 02 '15 at 18:28
  • 1
    as for adding predict to include to felm create a request to r-proj-c > methods("predict") [1] predict.ar* predict.Arima* predict.arima0* [4] predict.glm predict.HoltWinters* predict.lm [7] predict.loess* predict.mlm* predict.nls* [10] predict.poly* predict.ppr* predict.prcomp* [13] predict.princomp* predict.smooth.spline* predict.smooth.spline.fit* [16] predict.StructTS* – Ajay Ohri Jul 02 '15 at 19:49
  • I think quite a bit of re-engineering the `felm()` function (and the functions it calls) would be necessary as the current implementation does not store the fixed effect coefficients, or even apparently the intercept -- see [this answer](https://stackoverflow.com/a/45288380/8386140) on a question that is at least a near duplicate of this one. – duckmayr Feb 03 '18 at 12:07

6 Answers6

15

UPDATE (2020-04-02): The answer from Grant below using the new package fixest provides a more parsimonious solution.

As a workaround, you could combine felm, getfe, and demeanlist as follows:

library(lfe)

lm.model <- lm(data=demeanlist(iris[, 1:2], list(iris$Species)), Sepal.Length ~ Sepal.Width)
fe <- getfe(felm(data = iris, Sepal.Length ~ Sepal.Width | Species))
predict(lm.model, newdata = data.frame(Sepal.Width = 3)) + fe$effect[fe$idx=="virginica"]

The idea is that you use demeanlist to center the variables, then lm to estimate the coefficient on Sepal.Width using the centered variables, giving you an lm object over which you can run predict. Then run felm+getfe to get the conditional mean for the fixed effect, and add that to the output of predict.

pbaylis
  • 1,529
  • 11
  • 19
  • How do you do this for multiple fe? – wolfsatthedoor Feb 03 '18 at 04:00
  • You add the other FE to the demeanlist and getfe commands, then add another term onto the final sum. – pbaylis Feb 05 '18 at 05:37
  • This answer should get more attention, getfe is a very useful command and it's obvious how to predict once you have that. Furthermore it seems to be the only answer that actually answers the question in a general, correct way – wolfsatthedoor Feb 05 '18 at 18:39
  • 1
    Well, it's not as general as I would like. You couldn't use my code to construct standard errors on yhat, or the confidence or prediction interval. I don't know how to do that, so I posted a similar question to this one to see if anyone else had thoughts. https://stackoverflow.com/questions/48634449/predict-using-felm-output-with-standard-errors – pbaylis Feb 06 '18 at 02:16
  • In defense of this answer, as I read his question, he doesn't ask for inference, just point estimates. But I agree with you that the distribution would be nice. Hopefully someone answers the other question.. – wolfsatthedoor Feb 06 '18 at 15:30
  • In the `predict` line, wouldn't you want to use a demeaned value for `Sepal.Width` since the original input for the model is demeaned, and you're adding the FE at the end? – ndem763 Oct 13 '19 at 22:43
  • 1
    No, we want to use the original value, since the coefficients we estimate still represent the same thing they would have in the uncentered model. You can double check by running predict on the `lm` equivalent: `lm2 <- lm(data = iris, Sepal.Length ~ Sepal.Width + factor(Species)) predict(lm2, newdata = data.frame(Sepal.Width = 3, Species = "virginica"))` – pbaylis Oct 15 '19 at 05:02
11

Late to the party, but the new fixest package (link) has a predict method. It supports high-dimensional fixed effects (and clustering, etc.) using a very similar syntax to lfe. Somewhat remarkably, it is also considerably faster than lfe for the benchmark cases that I've tested.

library(fixest)

model_feols <- feols(data = iris, Sepal.Length ~ Sepal.Width | Species)
predict(model_feols, newdata = data.frame(Sepal.Width = 3, Species = "virginica"))
# Works
Grant
  • 1,536
  • 13
  • 25
6

This might not be the answer that you are looking for, but it seems that the author did not add any functionality to the lfe package in order to make predictions on external data by using the fitted felm model. The primary focus seems to be on the analysis of the group fixed effects. However, it's interesting to note that in the documentation of the package the following is mentioned:

The object has some resemblance to an 'lm' object, and some postprocessing methods designed for lm may happen to work. It may however be necessary to coerce the object to succeed with this.

Hence, it might be possible to coerce the felm object to an lm object in order to obtain some additional lm functionality (if all the required info is present in the object to perform the necessary computations).

The lfe package is intended to be run on very large datasets and effort was made to conserve memory: As a direct result of this, the felm object does not use/contain a qr decomposition, as opposed to the lm object. Unfortunately, the lm predict procedure relies on this information in order to compute the predictions. Hence, coercing the felm object and executing the predict method will fail:

> model2 <- felm(data = iris, Sepal.Length ~ Sepal.Width | Species)
> class(model2) <- c("lm","felm") # coerce to lm object
> predict(model2, newdata = data.frame(Sepal.Width = 3, Species = "virginica"))
Error in qr.lm(object) : lm object does not have a proper 'qr' component.
 Rank zero or should not have used lm(.., qr=FALSE).

If you really must use this package to perform the predictions then you could maybe write your own simplified version of this functionality by using the information that you have available in the felm object. For example, the OLS regression coëfficients are available via model2$coefficients.

Jellen Vermeir
  • 1,720
  • 10
  • 10
4

This should work for cases where you wish to ignore the group effects in the prediction, are predicting for new X's, and only want confidence intervals. It first looks for a clustervcv attribute, then robustvcv, then vcv.

predict.felm <- function(object, newdata, se.fit = FALSE,
                         interval = "none",
                         level = 0.95){
  if(missing(newdata)){
    stop("predict.felm requires newdata and predicts for all group effects = 0.")
  }

  tt <- terms(object)
  Terms <- delete.response(tt)
  attr(Terms, "intercept") <- 0

  m.mat <- model.matrix(Terms, data = newdata)
  m.coef <- as.numeric(object$coef)
  fit <- as.vector(m.mat %*% object$coef)
  fit <- data.frame(fit = fit)

  if(se.fit | interval != "none"){
    if(!is.null(object$clustervcv)){
      vcov_mat <- object$clustervcv
    } else if (!is.null(object$robustvcv)) {
      vcov_mat <- object$robustvcv
    } else if (!is.null(object$vcv)){
      vcov_mat <- object$vcv
    } else {
      stop("No vcv attached to felm object.")
    }
    se.fit_mat <- sqrt(diag(m.mat %*% vcov_mat %*% t(m.mat)))
  }
  if(interval == "confidence"){
    t_val <- qt((1 - level) / 2 + level, df = object$df.residual)
    fit$lwr <- fit$fit - t_val * se.fit_mat
    fit$upr <- fit$fit + t_val * se.fit_mat
  } else if (interval == "prediction"){
    stop("interval = \"prediction\" not yet implemented")
  }
  if(se.fit){
    return(list(fit=fit, se.fit=se.fit_mat))
  } else {
    return(fit)
  }
}
kennyB
  • 1,963
  • 3
  • 17
  • 22
3

To extend the answer from pbaylis, I created a slightly longwinded function that extends nicely to allow for more than one fixed effect. Note that you have to manually enter the original dataset used in the felm model. The function returns a list with two items: the vector of predictions, and a dataframe based on the new_data that includes the predictions and fixed effects as columns.

predict_felm <- function(model, data, new_data) {

  require(dplyr)

  # Get the names of all the variables
  y <- model$lhs
  x <- rownames(model$beta)
  fe <- names(model$fe)

  # Demean according to fixed effects
  data_demeaned <- demeanlist(data[c(y, x)],
                             as.list(data[fe]),
                             na.rm = T)

  # Create formula for LM and run prediction
  lm_formula <- as.formula(
    paste(y, "~", paste(x, collapse = "+"))
  )

  lm_model <- lm(lm_formula, data = data_demeaned)
  lm_predict <- predict(lm_model,
                        newdata = new_data)

  # Collect coefficients for fe
  fe_coeffs <- getfe(model) %>% 
    select(fixed_effect = effect, fe_type = fe, idx)

  # For each fixed effect, merge estimated fixed effect back into new_data
  new_data_merge <- new_data
  for (i in fe) {

    fe_i <- fe_coeffs %>% filter(fe_type == i)

    by_cols <- c("idx")
    names(by_cols) <- i

    new_data_merge <- left_join(new_data_merge, fe_i, by = by_cols) %>%
      select(-matches("^idx"))

  }

  if (length(lm_predict) != nrow(new_data_merge)) stop("unmatching number of rows")

  # Sum all the fixed effects
  all_fixed_effects <- base::rowSums(select(new_data_merge, matches("^fixed_effect")))

  # Create dataframe with predictions
  new_data_predict <- new_data_merge %>% 
    mutate(lm_predict = lm_predict, 
           felm_predict = all_fixed_effects + lm_predict)

  return(list(predict = new_data_predict$felm_predict,
              data = new_data_predict))

}

model2 <- felm(data = iris, Sepal.Length ~ Sepal.Width | Species)
predict_felm(model = model2, data = iris, new_data = data.frame(Sepal.Width = 3, Species = "virginica"))
# Returns prediction and data frame
dk_webb
  • 31
  • 2
-2

I think what you're looking for might be the lme4 package. I was able to get a predict to work using this:

library(lme4)
data(iris)

model2 <- lmer(data = iris, Sepal.Length ~ (Sepal.Width | Species))
predict(model2, newdata = data.frame(Sepal.Width = 3, Species = "virginica"))
       1 
6.610102 

You may have to play around a little to specify the particular effects you're looking for, but the package is well-documented so it shouldn't be a problem.

Tchotchke
  • 3,061
  • 3
  • 22
  • 37
  • This doesn't seem to replicate the example above and has results2 where it should have model2. – kennyB Jul 02 '15 at 20:28
  • Fixed the results2 (typo). The difference I'm seeing between the two answers is .001, which could easily just come from slight differences between how the two models are implemented. – Tchotchke Jul 02 '15 at 23:59
  • Still doesn't seem to be working on my machine. I get this error ``Error: sum(nb) == q is not TRUE`` – kennyB Jul 03 '15 at 02:45
  • I updated with the complete code (loading in the library and data), and it works on both my Mac and PC. I'm using R 3.1.1 on my Mac. I'm not sure why it's not working for you - my original thought would be that it's due to NA, but we're only predicting on one observation so that shouldn't be a problem. – Tchotchke Jul 03 '15 at 14:32
  • Yeah it's bizarre. I'm using 3.2.0 and also tried it on 3.1.3. Regardless, your answer doesn't answer the question, which specifically asked for the predict method for felm models. – kennyB Jul 03 '15 at 19:51
  • 4
    lmer implements RANDOM effects. lfe implements fixed effects. fixed effects are not shrunken, because the goal is typically inference about marginal effects, rather than prediction. If you want to fit a fixed effects model, don't use `lmer`. – generic_user Dec 30 '15 at 20:48