
I have three quarters of an answer to a problem but need some help with the final part. I have EBIT data for a set of companies. Where EBIT is negative, I would like to replace the value with the mean of the previous year and the current year. For example, if a company recorded negative EBIT in 1993, I would like the mean of the negative year (1993) and the previous year (1992).
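
To make the target concrete, here is a small sketch of the result I am after for a single company. The numbers are made up for illustration, and the vector is assumed to be sorted by year:

```r
# Hypothetical EBIT values for one company, ordered 1991 to 1994
ebit <- c(30, 40, -20, 50)

# Previous year's value for each position (NA for the first year)
prev <- c(NA, head(ebit, -1))

# Replace each negative value with the mean of itself and the previous year;
# a negative first year has no previous year, so it is left unchanged
wanted <- ifelse(ebit < 0 & !is.na(prev), (ebit + prev) / 2, ebit)
wanted
# [1] 30 40 10 50
```

The negative third value becomes (40 + -20) / 2 = 10, and the other years are untouched.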

I have the following code (which I found on Stack Overflow: "How to replace NA with mean by subset in R (impute with plyr?)"), but I would like to change the impute.mean function to do what I describe above. That is, I do not really want to convert the negative numbers to NAs first.

library(dplyr)

years <- c(1990, 1991, 1992, 1993, 1994)
gvkey <- c(1000, 1100, 1200, 1300, 1400, 1500)

# 5 years x 6 companies = 30 rows, one per (year, gvkey) pair
join <- as.data.frame(rep_len(years, length.out = length(gvkey) * length(years)))
join$gvkey <- rep(gvkey, length(years))
join$ebit <- runif(nrow(join), min = -100, max = 100)

join$ebit[join$ebit < 0] <- NA    ## very inefficient way of recognizing negative values

colnames(join) <- c("year", "gvkey", "ebit")

# Replace NAs with the mean of the non-missing values in x
impute.mean <- function(x) replace(x, is.na(x), mean(x, na.rm = TRUE))

join <- join %>%
  group_by(gvkey) %>%
  mutate(ebit = impute.mean(ebit))

I also found this, which is ideal except for the NA issue (from "R replacing missing values with the mean of surroundings values"):

x <- (na.locf(join) + rev(na.locf(rev(join))))/2
Matthew Oldham
  • 187
  • 1
  • 13
  • A hint about replacing missing values with the surrounding values: this is called "moving average" imputation. You can do this in 1 line of code with the imputeTS package: na.ma(yourData, k=1). – Steffen Moritz Nov 09 '18 at 18:52

1 Answer


This seems to do the job. The remaining issue is what happens when two years in a row are negative.

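# Assumes ebit still holds the raw negative values (not NA); sort by year so .x is the prior year
y <- join %>%
  arrange(gvkey, year) %>%
  group_by(gvkey) %>%
  mutate(adj_ebit = purrr::accumulate(ebit, ~ ifelse(.y < 0, (.y + .x) / 2, .y)))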
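
On the two-negative-years case: accumulate passes the already adjusted previous value as .x, so a second negative year is averaged with the adjusted figure rather than the recorded one. If the recorded previous value is what is wanted, one possible alternative is dplyr::lag. This is only a sketch on made-up data (the toy data frame and the prev_ebit column are hypothetical), and which behaviour is right depends on the analysis:

```r
library(dplyr)

# Hypothetical toy data: one company with two negative years in a row
toy <- data.frame(
  gvkey = rep(1000, 5),
  year  = 1990:1994,
  ebit  = c(50, -10, -30, 20, -40)
)

adjusted <- toy %>%
  arrange(gvkey, year) %>%
  group_by(gvkey) %>%
  mutate(
    # Raw (unadjusted) value from the previous year; NA for the first year
    prev_ebit = dplyr::lag(ebit),
    # Average negative years with the raw previous year; keep everything else
    adj_ebit  = if_else(ebit < 0 & !is.na(prev_ebit), (ebit + prev_ebit) / 2, ebit)
  ) %>%
  ungroup()

adjusted$adj_ebit
# [1]  50  20 -20  20 -10
```

For 1992 this gives (-30 + -10) / 2 = -20, whereas the accumulate version gives (-30 + 20) / 2 = -5. Either way the result can still be negative, and a negative first year is left as it is because there is no previous year to average with.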
Matthew Oldham