Is there a way to resolve there being differing x and y lengths between pooled regression and a fixed effects within estimator in R?

Question

I'm currently working on a pooled regression and a fixed effects within estimator model for panel data analysis in R. I've estimated the pooled and FE models using the plm() function from the plm package:

library(plm)

m_pooled <- plm(homiciderate ~ humanrightsviol + freedom + lngdp + econgrowth + popdensity + male1564,
                data = df_hom, index = c("country", "year"), model = "pooling")

m_fixedeffects <- plm(homiciderate ~ humanrightsviol + freedom + lngdp + econgrowth + popdensity + male1564,
                      data = df_hom, index = c("country", "year"), model = "within")
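For reference, the gap between the two estimators can be reproduced in base R on simulated data. This is a minimal sketch under stated assumptions: the data frame `sim` and its columns `country`, `x` and `y` are hypothetical stand-ins for `df_hom` with a single regressor, and `lm()` is used instead of `plm()` so that no extra package is needed. The pooled slope ignores which country an observation comes from, while the within slope is OLS on country-demeaned variables, which gives the same slope as a regression with one dummy per country (and as `plm(..., model = "within")` with its default individual effects).

```r
# Hypothetical simulated panel (not the asker's df_hom): 3 countries x 10 years
set.seed(1)
sim <- data.frame(country = rep(c("A", "B", "C"), each = 10),
                  year    = rep(2001:2010, times = 3))

# Countries with higher average x get lower intercepts, so the country
# effects are correlated with x (the situation where pooled OLS is biased)
x_shift   <- c(A = 0, B = 3, C = 6)
intercept <- c(A = 30, B = 15, C = 0)
sim$x <- rnorm(30) + unname(x_shift[sim$country])
sim$y <- unname(intercept[sim$country]) + 2 * sim$x + rnorm(30, sd = 0.5)

# Pooled OLS: a single intercept shared by every country
pooled <- lm(y ~ x, data = sim)

# Within estimator: demean x and y by country, then regress without intercept
sim$x_dm <- sim$x - ave(sim$x, sim$country)
sim$y_dm <- sim$y - ave(sim$y, sim$country)
fe_within <- lm(y_dm ~ x_dm - 1, data = sim)

# Dummy-variable (LSDV) regression: same slope as the within estimator
lsdv <- lm(y ~ x + factor(country), data = sim)

round(c(pooled = coef(pooled)[["x"]],
        within = coef(fe_within)[["x_dm"]],
        lsdv   = coef(lsdv)[["x"]]), 3)
```

With this simulated data the pooled slope comes out negative, while the within and dummy-variable slopes agree with each other and sit close to the true value of 2, because the country intercepts fall as average `x` rises.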

I now want to plot both models together using the following code:

plot_fixedeffects <- plot(m_fixedeffects, type = "effects", index = 2, col = "red")
plot_pooled <- plot(m_pooled, type = "effects", index = 2, col = "blue")
combined_plot <- plot_fixedeffects + plot_pooled

combined_plot <- combined_plot + labs(title = "Fixed Effects Within Estimator vs Pooled Regression",
                                      x = "Time",
                                      y = "Effect Estimate") +
  scale_color_manual(values = c("red", "blue"),
                     labels = c("Fixed Effects Within Estimator", "Pooled Regression"),
                     name = "Estimation Method")


print(combined_plot)

However, when I do so, I get the following error:

Error in xy.coords(x, y) : 'x' and 'y' lengths differ
In addition: Warning message:
In meanx * beta :
  longer object length is not a multiple of shorter object length

I was expecting output similar to the plot below, but have not been able to produce it:

Example of expected plot

Is there any reason for this?

Can you share what the expected output looks like? Since the models have multiple independent variables, are you expecting multiple plots per model? Also, plots created with plot() cannot be combined using "+". This older discussion (https://stackoverflow.com/questions/31954045/r-plotting-panel-model-predictions-using-plm-pglm) may help. — TheN, Apr 16 '23 at 14:49
Hi there, thanks for your help! I've added onto the post now what I was expecting. Any assistance is greatly appreciated. — CharlieMac30, Apr 16 '23 at 18:42

L Tyrone · Answer 1 · 2023-04-17T02:35:31.413

Speaking from experience, a common approach among those new to R is to try to make the plotting process work with the existing data. However, it is much better to make your data work for the plot. In most cases (yours included), it is easier to plot a single dataset containing all of your data than to work with multiple datasets.
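Applied to model output rather than hand-made points, the idea might look like the following minimal sketch. It assumes a single regressor and simulated data (`sim`, `country`, `x`, `y`, `fits` and `line_id` are hypothetical names), and it uses `lm()` with country dummies as a stand-in for the `plm` within model because that gives the same slope. Both models' fitted values go into one long data frame with a `method` column, so a single `ggplot()` call can colour the lines by estimation method.

```r
library(ggplot2)

# Hypothetical simulated panel: 3 countries, 10 years each, one regressor x
set.seed(2)
sim <- data.frame(country = rep(c("A", "B", "C"), each = 10))
sim$x <- rnorm(30) + rep(c(0, 3, 6), each = 10)
sim$y <- rep(c(30, 15, 0), each = 10) + 2 * sim$x + rnorm(30, sd = 0.5)

# Pooled model: one line for everyone.
# Dummy-variable model: within-estimator slope, one intercept per country.
pooled <- lm(y ~ x, data = sim)
lsdv   <- lm(y ~ x + factor(country), data = sim)

# Stack both sets of fitted values into ONE long data frame
fits <- rbind(
  data.frame(sim[, c("country", "x")], fitted = unname(fitted(pooled)),
             method = "Pooled Regression"),
  data.frame(sim[, c("country", "x")], fitted = unname(fitted(lsdv)),
             method = "Fixed Effects Within Estimator")
)

# Pooled fit is drawn as one line; the FE fit as one line per country
fits$line_id <- ifelse(fits$method == "Pooled Regression",
                       "pooled", paste("fe", fits$country))

p <- ggplot(sim, aes(x = x, y = y)) +
  geom_point(aes(shape = country)) +
  geom_line(data = fits,
            aes(x = x, y = fitted, colour = method, group = line_id)) +
  scale_colour_manual(values = c("Fixed Effects Within Estimator" = "red",
                                 "Pooled Regression" = "blue"),
                      name = "Estimation Method") +
  labs(title = "Fixed Effects Within Estimator vs Pooled Regression")
```

Drawing `p` (for example with `print(p)`) should show one blue pooled line across all countries and one red fixed effects line per country, with all red lines sharing the same slope.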

I don't have a copy of your data, so I've created some example data based on the image you provided. Although you have only two groups and the example has six, the principles are the same. Here is an example workflow:

library(ggplot2)
# Set seed to make sample data replicable. runif() creates random numbers
# but running set.seed() ensures the data will look the same each time
set.seed(1)
# Create sample data frames based on the example image
m_pooled <- data.frame(method = "Pooled Regression",
                       group = rep(1:3, each = 10),
                       x = c(runif(10, 5, 22),
                             runif(10, 23, 28),
                             runif(10, 25, 35)),
                       y = c(runif(10, 46000, 62000),
                             runif(10, 35000, 45000),
                             runif(10, 18000, 22000)))

m_fixedeffects <- data.frame(method = "Fixed Effects Within Estimator",
                             group = rep(4:6, each = 10),
                             x = c(runif(10, 40, 50),
                                   runif(10, 55, 63),
                                   runif(10, 68, 77)),
                             y = c(runif(10, 15000, 20000),
                                   runif(10, 12000, 17000),
                                   runif(10, 13000, 23000)))

# Combine both datasets. NOTE: rbind() requires identical column names in both
combined_plot <- rbind(m_pooled, m_fixedeffects)

# Plot your data
ggplot() +
  # Plot point data, and define colour and group by "method" column
  geom_point(data = combined_plot, aes(x = x, y = y, group = method,
                                       colour = factor(method))) +
  # Colour groups manually and add text to legend
  scale_color_manual(values = c("red", "blue"),
                     labels = c("Fixed Effects Within Estimator",
                                "Pooled Regression"),
                     name = "Estimation Method") +
  # Plot regression lines by individual groups
  geom_smooth(data = combined_plot, 
              aes(x = x, y = y, group = group),
              method = "glm",
              colour = "black",
              fill = NA) +
  # Plot regression line for all groups combined
  # NOTE: this is added to replicate the example image,
  # and is likely not logical given your mixed methods.
  # Delete if necessary
  geom_smooth(data = combined_plot, 
              aes(x = x, y = y),
              method = "glm",
              colour = "black",
              fill = NA,
              linetype = "dashed") +
  # Add title and xy axis labels
  labs(title = "Fixed Effects Within Estimator vs Pooled Regression",
       x = "Time",
       y = "Effect Estimate")

Result:

If you only have two groups, e.g. "Pooled Regression" and "Fixed Effects Within Estimator", you would need to change group = group to group = method in the first geom_smooth() call above. That way, your regression lines will be grouped by the values in the "method" column of the combined_plot dataset. I have arbitrarily used glm for the regression lines, but I'm unfamiliar with your methods, so you may need to change this as well.
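With only two groups, the regression-line layer would look something like this sketch. The small `combined_plot` data frame here is a hypothetical stand-in for the one built above, and `method = "lm"` replaces `"glm"`: with no `family` passed, `glm` defaults to a Gaussian model, so both draw the same ordinary least-squares line.

```r
library(ggplot2)

# Hypothetical stand-in for the combined_plot data frame above
set.seed(3)
combined_plot <- data.frame(
  method = rep(c("Pooled Regression", "Fixed Effects Within Estimator"), each = 10),
  x = c(runif(10, 5, 35), runif(10, 40, 77)),
  y = c(runif(10, 18000, 62000), runif(10, 12000, 23000))
)

p <- ggplot(combined_plot, aes(x = x, y = y, colour = method)) +
  geom_point() +
  # One OLS line per value of the "method" column
  geom_smooth(aes(group = method), formula = y ~ x, method = "lm",
              se = FALSE, colour = "black") +
  scale_color_manual(values = c("red", "blue"),
                     labels = c("Fixed Effects Within Estimator",
                                "Pooled Regression"),
                     name = "Estimation Method") +
  labs(title = "Fixed Effects Within Estimator vs Pooled Regression",
       x = "Time", y = "Effect Estimate")
```

Here `se = FALSE` hides the confidence band, which has the same visual effect as `fill = NA` in the code above.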
