I have three-quarters of an answer to a problem but need some help with the final part. I have EBIT data for a set of companies. Where the EBIT is negative, I would like to replace the value with the mean of the previous year and the current year. For example, if a company recorded negative EBIT in 1993, I would like to get the mean of the negative year (1993) and the previous year (1992).
I have the following code (which I found on Stack Overflow: "How to replace NA with mean by subset in R (impute with plyr?)"), but I would like to change the impute.mean function to do what I describe above. That is, I do not really want to convert the negative numbers to NAs.
library(dplyr)  # provides %>%, group_by() and mutate()

years <- c(1990, 1991, 1992, 1993, 1994)
gvkey <- c(1000, 1100, 1200, 1300, 1400, 1500)
join <- as.data.frame(rep_len(years, length.out = length(gvkey) *
                                length(years)))
join$gvkey <- rep(gvkey, length(years))
join$ebit <- runif(nrow(join), min = -100, max = 100)
join$ebit[join$ebit < 0] <- NA ## very inefficient way of recognizing negative values
colnames(join) <- c("year", "gvkey", "ebit")

# Replace each NA with the mean of the non-missing values in x
impute.mean <- function(x) replace(x, is.na(x), mean(x, na.rm = TRUE))

# Apply the imputation within each company (gvkey)
join <- join %>%
  group_by(gvkey) %>%
  mutate(
    ebit = impute.mean(ebit))
I also found this, which is ideal except for the NA issue: "R replacing missing values with the mean of surroundings values"
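library(zoo)  # provides na.locf(); applied to the ebit column here, ignoring gvkey groups
x <- (na.locf(join$ebit, na.rm = FALSE) + rev(na.locf(rev(join$ebit), na.rm = FALSE))) / 2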
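To make the target concrete, here is a minimal sketch of the calculation I am after, on a small toy data set. The toy values, the helper name impute.prev.mean and the ebit_adj column are all made up for illustration. It assumes the rows can be sorted by year within each gvkey, that the years are consecutive, and that the previous year's original (unadjusted) EBIT should be used. I am not sure this is the best way to do it.

```r
library(dplyr)

# Toy data (hypothetical values): two companies, three years each
toy <- data.frame(
  gvkey = c(1000, 1000, 1000, 1100, 1100, 1100),
  year  = c(1992, 1993, 1994, 1992, 1993, 1994),
  ebit  = c(40, -20, 30, -10, 50, -70)
)

# Hypothetical helper: replace each negative value with the mean of itself
# and the previous element. The first element has no predecessor, so
# dplyr::lag() yields NA there and a negative first value becomes NA.
impute.prev.mean <- function(x) {
  ifelse(x < 0, (x + dplyr::lag(x)) / 2, x)
}

result <- toy %>%
  arrange(gvkey, year) %>%   # make sure "previous" means previous year
  group_by(gvkey) %>%        # never look back into another company
  mutate(ebit_adj = impute.prev.mean(ebit)) %>%
  ungroup()

print(result)
```

With these toy numbers, the 1993 value for gvkey 1000 becomes (-20 + 40) / 2 = 10, no negative-to-NA conversion is needed, and the adjusted value can still be negative when the previous year was small (gvkey 1100 in 1994 gives (-70 + 50) / 2 = -10).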