I need some help filling cells which have an 'NA' values with other values which are already present in the surrounding rows.
I currently have a panel dataset of investors and their activities. Some of the rows were missing, so I have completed the panel to include these rows, replacing the financial deal information with '0' values.
The other variables relate to wider firm characteristics, such as region and strategy. I am unsure how to replicate these for each firm.
This is my code so far.
df <- df %>%
group_by(investor) %>%
mutate(min = min(dealyear, na.rm = TRUE),
max = max(dealyear, na.rm = TRUE)) %>%
complete(investor, dealyear = min:max, fill = list(counttotal=0, countgreen=0, countbrown=0)) %>%
An example of data before completion - notice year 2004 is missing.
investor | dealyear | dealcounts | strategy | region |
---|---|---|---|---|
123IM | 2002 | 5 | buyout | europe |
123IM | 2003 | 5 | buyout | europe |
123IM | 2005 | 5 | buyout | europe |
123IM | 2006 | 5 | buyout | europe |
Example of data after completion, with missing row added in
investor | dealyear | dealcounts | strategy | region |
---|---|---|---|---|
123IM | 2002 | 5 | buyout | europe |
123IM | 2003 | 5 | buyout | europe |
123IM | 2004 | 0 | NA | NA |
123IM | 2005 | 5 | buyout | europe |
123IM | 2006 | 5 | buyout | europe |
How would I go about replacing these NA values with the corresponding information for each investment firm?
Many thanks
Rory