1

I need some help filling cells which have an 'NA' values with other values which are already present in the surrounding rows.

I currently have a panel dataset of investors and their activities. Some of the rows were missing, so I have completed the panel to include these rows, replacing the financial deal information with '0' values.

The other variables relate to wider firm characteristics, such as region and strategy. I am unsure how to replicate these for each firm.

This is my code so far.

df <- df %>%
  group_by(investor) %>%
  mutate(min = min(dealyear, na.rm = TRUE),
         max = max(dealyear, na.rm = TRUE)) %>%
  complete(investor, dealyear = min:max, fill = list(counttotal=0, countgreen=0, countbrown=0)) %>% 

An example of data before completion - notice year 2004 is missing.

investor dealyear dealcounts strategy region
123IM 2002 5 buyout europe
123IM 2003 5 buyout europe
123IM 2005 5 buyout europe
123IM 2006 5 buyout europe

Example of data after completion, with missing row added in

investor dealyear dealcounts strategy region
123IM 2002 5 buyout europe
123IM 2003 5 buyout europe
123IM 2004 0 NA NA
123IM 2005 5 buyout europe
123IM 2006 5 buyout europe

How would I go about replacing these NA values with the corresponding information for each investment firm?

Many thanks

Rory

Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
R Wd
  • 59
  • 5

1 Answers1

0

You may use complete with group_by as -

library(dplyr)
library(tidyr)

df %>%
  group_by(investor) %>%
  complete(dealyear = min(dealyear):max(dealyear), 
           fill = list(dealcounts = 0)) %>%
  ungroup

#  investor dealyear dealcounts strategy region
#  <chr>       <int>      <dbl> <chr>    <chr> 
#1 123IM        2002          5 buyout   europe
#2 123IM        2003          5 buyout   europe
#3 123IM        2004          0 NA       NA    
#4 123IM        2005          5 buyout   europe
#5 123IM        2006          5 buyout   europe

If you want to replace NA in strategy and region column you may use fill.

df %>%
  group_by(investor) %>%
  complete(dealyear = min(dealyear):max(dealyear), 
           fill = list(dealcounts = 0)) %>%
  fill(strategy, region) %>%
  ungroup

#  investor dealyear dealcounts strategy region
#  <chr>       <int>      <dbl> <chr>    <chr> 
#1 123IM        2002          5 buyout   europe
#2 123IM        2003          5 buyout   europe
#3 123IM        2004          0 buyout   europe
#4 123IM        2005          5 buyout   europe
#5 123IM        2006          5 buyout   europe
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213