1

I am trying to replace NA with proceeding value in the same column using dplyr package, But I am not getting results. Below is script I am using.

data <- read.csv("data.csv",header=T,na.strings=c("","NULL"))

data$ID <- paste(data$ID1, data$ID2, sep='_')

data$End.Date <- as.Date(data$End.Date, "%d-%b-%y")

data1 <- data[order(data$ID, data$End.Date),]

library(tidyr)
library(dplyr)

data1 %>% 
  group_by(ID) %>% 
  fill(Start.date, .direction = 'up') %>% 
  fill(Start.date, .direction = 'down')

Also please see below data file which I am using in above script. Can anyone please help me to know why NA is not replaced in using above syntax. Script is not throwing error

ID1 ID2 Start date  End Date
1031    40038   7-Nov-16    1-Jan-17
1031    40037   12-Sep-16   2-Oct-16
1031    40033   15-Feb-16   2-Oct-16
1031    40033   15-Feb-16   3-Jul-16
1031    40038   7-Nov-16    4-Dec-16
1031    40035   18-Jul-16   4-Sep-16
1031    40033   15-Feb-16   4-Sep-16
1031    40043   23-Jan-17   5-Feb-17
1031    40038   7-Nov-16    5-Feb-17
1031    40042   18-Jan-17   5-Feb-17
1031    40033   15-Feb-16   5-Jun-16
1031    40044   17-Feb-17   5-Mar-17
1031    40043   23-Jan-17   5-Mar-17
1031    40037   12-Sep-16   6-Nov-16
1031    40035   NULL    7-Aug-16
1031    40033   15-Feb-16   7-Aug-16
1031    40036   NULL    7-Aug-16
1031    40038   7-Nov-16    8-Jan-17
1031    40045   28-Mar-17   9-Apr-17
1031    40033   15-Feb-16   9-Oct-16
1031    40037   12-Sep-16   9-Oct-16
1031    40033   15-Feb-16   10-Jul-16
1031    40038   7-Nov-16    11-Dec-16
1031    40033   15-Feb-16   11-Sep-16
1031    40043   23-Jan-17   12-Feb-17
1031    40033   15-Feb-16   12-Jun-16
1031    40043   23-Jan-17   12-Mar-17
1031    40044   17-Feb-17   12-Mar-17
1031    40037   12-Sep-16   13-Nov-16
1031    40038   7-Nov-16    13-Nov-16
1031    40033   15-Feb-16   14-Aug-16
1031    40035   18-Jul-16   14-Aug-16
1031    40038   7-Nov-16    15-Jan-17
1031    40045   28-Mar-17   16-Apr-17
1031    40033   15-Feb-16   16-Oct-16
1031    40037   12-Sep-16   16-Oct-16
1031    40033   15-Feb-16   17-Jul-16
1031    40038   7-Nov-16    18-Dec-16
1031    40033   15-Feb-16   18-Sep-16
1031    40037   12-Sep-16   18-Sep-16
1031    40043   23-Jan-17   19-Feb-17
1031    40033   15-Feb-16   19-Jun-16
1031    40043   23-Jan-17   19-Mar-17
1031    40038   7-Nov-16    20-Nov-16
1031    40037   12-Sep-16   20-Nov-16
1031    40035   18-Jul-16   21-Aug-16
1031    40033   15-Feb-16   21-Aug-16
1031    40039   9-Jan-17    22-Jan-17
1031    40038   7-Nov-16    22-Jan-17
1031    40033   15-Feb-16   22-May-16
1031    40045   28-Mar-17   23-Apr-17
1031    40037   12-Sep-16   23-Oct-16
1031    40033   15-Feb-16   23-Oct-16
1031    40033   15-Feb-16   24-Jul-16
1031    40038   7-Nov-16    25-Dec-16
1031    40033   15-Feb-16   25-Sep-16
1031    40037   12-Sep-16   25-Sep-16
1031    40043   23-Jan-17   26-Feb-17
1031    40044   17-Feb-17   26-Feb-17
1031    40033   15-Feb-16   26-Jun-16
1031    40043   23-Jan-17   26-Mar-17
1031    40037   12-Sep-16   27-Nov-16
1031    40038   7-Nov-16    27-Nov-16
1031    40033   15-Feb-16   28-Aug-16
1031    40035   18-Jul-16   28-Aug-16
1031    40038   7-Nov-16    29-Jan-17
1031    40042   18-Jan-17   29-Jan-17
1031    40033   15-Feb-16   29-May-16
1031    40037   12-Sep-16   30-Oct-16
1031    40036   8-Jul-16    31-Jul-16
1031    40033   15-Feb-16   31-Jul-16
1031    40035   18-Jul-16   31-Jul-16
user3734568
  • 1,311
  • 2
  • 22
  • 36
  • 2
    Possible duplicate of [Replace NA with values in another row of same column for each group in r](https://stackoverflow.com/questions/31879390/replace-na-with-values-in-another-row-of-same-column-for-each-group-in-r) – Matt Jun 09 '17 at 14:29
  • I just checked your code, and the resulting `data1 ` has no NA's in it. – ssp3nc3r Jun 09 '17 at 17:13
  • 1
    I think the more canonical dupe in this case is this one: [Using dplyr window-functions to make trailing values (fill in NA values)](https://stackoverflow.com/q/23340150/903061), but I agree with ssp3nc3r that the NA-replacing code looks fine, OP should work on the importing code and make sure NA values are present where they should be. – Gregor Thomas Jun 09 '17 at 17:14
  • @Gregor, Thanks a lot for your reply able to replace NA with using information which you shared. – user3734568 Jun 13 '17 at 12:18

1 Answers1

1

The zoo library has a great function for this called na.locf()

library(zoo)
library(dplyr)

data <- read.csv("data.csv",header=T,na.strings=c("","NULL"))

data$ID <- paste(data$ID1, data$ID2, sep='_')

data$End.Date <- as.Date(data$End.Date, "%d-%b-%y")

data1 <- data[order(data$ID, data$End.Date),]

data1 <- data1 %>% 
         group_by(ID) %>% 
         mutate(Start.Date = ifelse(is.na(Start.Date),na.locf(Start.Date, na.rm = FALSE), Start.Date))
Matt Jewett
  • 3,249
  • 1
  • 14
  • 21
  • 1
    It's unusual to run `na.locf` inside an `ifelse(is.na())` construct. OP is trying to use `tidyr::fill`, which is very similar to `zoo::na.locf` - I'm not sure that replacing `fill` with `na.locf` is likely to fix any problems... – Gregor Thomas Jun 09 '17 at 17:12
  • @Matt, Thanks for sharing script. I tried your script, was able to replace data for missing values in start.date, but unfortunately it changed format of column to numeric. Not sure why it changed format of column. – user3734568 Jun 13 '17 at 12:12