Replacing NA values with values from neighbouring rows

Question

I need some help filling cells which have an 'NA' values with other values which are already present in the surrounding rows.

I currently have a panel dataset of investors and their activities. Some of the rows were missing, so I have completed the panel to include these rows, replacing the financial deal information with '0' values.

The other variables relate to wider firm characteristics, such as region and strategy. I am unsure how to replicate these for each firm.

This is my code so far.

df <- df %>%
  group_by(investor) %>%
  mutate(min = min(dealyear, na.rm = TRUE),
         max = max(dealyear, na.rm = TRUE)) %>%
  complete(investor, dealyear = min:max, fill = list(counttotal=0, countgreen=0, countbrown=0)) %>%

An example of data before completion - notice year 2004 is missing.

investor	dealyear	dealcounts	strategy	region
123IM	2002	5	buyout	europe
123IM	2003	5	buyout	europe
123IM	2005	5	buyout	europe
123IM	2006	5	buyout	europe

Example of data after completion, with missing row added in

investor	dealyear	dealcounts	strategy	region
123IM	2002	5	buyout	europe
123IM	2003	5	buyout	europe
123IM	2004	0	NA	NA
123IM	2005	5	buyout	europe
123IM	2006	5	buyout	europe

How would I go about replacing these NA values with the corresponding information for each investment firm?

Many thanks

Rory

score 0 · Accepted Answer · answered Jul 13 '21 at 11:53

You may use complete with group_by as -

library(dplyr)
library(tidyr)

df %>%
  group_by(investor) %>%
  complete(dealyear = min(dealyear):max(dealyear), 
           fill = list(dealcounts = 0)) %>%
  ungroup

#  investor dealyear dealcounts strategy region
#  <chr>       <int>      <dbl> <chr>    <chr> 
#1 123IM        2002          5 buyout   europe
#2 123IM        2003          5 buyout   europe
#3 123IM        2004          0 NA       NA    
#4 123IM        2005          5 buyout   europe
#5 123IM        2006          5 buyout   europe

If you want to replace NA in strategy and region column you may use fill.

df %>%
  group_by(investor) %>%
  complete(dealyear = min(dealyear):max(dealyear), 
           fill = list(dealcounts = 0)) %>%
  fill(strategy, region) %>%
  ungroup

#  investor dealyear dealcounts strategy region
#  <chr>       <int>      <dbl> <chr>    <chr> 
#1 123IM        2002          5 buyout   europe
#2 123IM        2003          5 buyout   europe
#3 123IM        2004          0 buyout   europe
#4 123IM        2005          5 buyout   europe
#5 123IM        2006          5 buyout   europe

Replacing NA values with values from neighbouring rows

1 Answers1