How to calculate auto-regression with missing values for different surnames in one generation?

Question

I have a dataset consisting of surnames, years, and values y. My aim is to analyze whether the value y depends on the corresponding value y of the previous generation. Unfortunately, I do not have a value y for every surname in every generation.

As an example dataset, you can take the following:

set.seed(700)
df_1 <- data.frame(
  year = c(1700, 1700, 1700, 1700, 1730, 1730, 1730, 1730, 1760, 1760,
           1760, 1760, 1790, 1790, 1790, 1790, 1820, 1820, 1820, 1820),
  generation = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5),
  surname = c("Miller", "NA", "Smith", "Garcia",
              "Miller", "Jordan", "Smith", "Garcia",
              "Miller", "Jordan", "NA", "Garcia",
              "Miller", "Jordan", "Smith", "NA",
              "NA", "Jordan", "Smith", "Garcia"),
  y = runif(20)
)

I run the following regression:

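library(dplyr)

# `data = .` passes each surname's own rows to lm(); with `data = df_1`
# every group would be fitted on the full, ungrouped data set.
fitted_models <- df_1 %>%
  group_by(surname) %>%
  do(model = lm(y ~ lag(y, n = 1, order_by = year), data = .))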

Now, I have three related questions:

(1) How can I take into account effects that are not group-specific (such as generation-specific fixed effects)?

(2) How should I treat the NA values?

(3) Does that regression use every observation together with the corresponding observation of the previous generation, or does it only compare the first and the second generation?

Your case suggests a grouped regression. Did you see this? https://stackoverflow.com/questions/1169539/linear-regression-and-group-by-in-r — user2474226, Nov 11 '19 at 10:45
@user2474226 thank you for the advice. Grouped regression is a great idea and the discussion in the post you linked was very helpful. I edited the question. Do you know how I can combine those group-specific effects with non-group-specific effects such as overall generational fixed effects? — R-User, Nov 11 '19 at 13:26
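
One possible way to combine the surname-level lag with generation fixed effects is sketched below. It is only a sketch under several assumptions that are not stated in the question: that the "NA" strings stand for genuinely unknown surnames and can be dropped, that a single pooled slope on the lagged value is acceptable (instead of one model per surname), and that a pair should only count when the surname was observed in the immediately preceding generation. The names obs, prev, pairs, y_prev and fit are made up for illustration. Base R merge() is used in place of lag() so that gaps in a surname's series do not silently pair non-adjacent generations.

```r
set.seed(700)
df_1 <- data.frame(
  year = rep(c(1700, 1730, 1760, 1790, 1820), each = 4),
  generation = rep(1:5, each = 4),
  surname = c("Miller", "NA", "Smith", "Garcia",
              "Miller", "Jordan", "Smith", "Garcia",
              "Miller", "Jordan", "NA", "Garcia",
              "Miller", "Jordan", "Smith", "NA",
              "NA", "Jordan", "Smith", "Garcia"),
  y = runif(20)
)

# Assumption: the string "NA" marks an unknown surname, so turn it into a
# real missing value and drop those rows before building the lag.
df_1$surname[df_1$surname == "NA"] <- NA
obs <- df_1[!is.na(df_1$surname), ]

# Shift every observation forward by one generation so that it lines up
# with the same surname's next-generation row.
prev <- data.frame(
  surname = obs$surname,
  generation = obs$generation + 1,
  y_prev = obs$y
)

# The inner join keeps only rows whose surname was also observed in the
# immediately preceding generation; all generations 2 to 5 can contribute.
pairs <- merge(obs, prev, by = c("surname", "generation"))

# Pooled autoregression with generation fixed effects: factor(generation)
# absorbs shocks common to all surnames within a generation.
fit <- lm(y ~ y_prev + factor(generation), data = pairs)
summary(fit)
```

With the example data this leaves 10 surname-generation pairs, so the estimates are only illustrative. Whether dropping the unknown surnames is appropriate depends on why they are missing; surname fixed effects could be added with + factor(surname), although combining them with a lagged dependent variable in such a short panel is known to bias the estimate.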
