In R, I have a data frame "closeValues" that looks like this:

>closeValues
            date        value
    1  1980-12-10       5
    2  1980-12-15       8
    3  1980-12-18       7
    4  1980-12-20       1

I need to add a row for every missing "date" and fill its "value" with the previous value. The output I need is:

>closeValues
   date        value
1  1980-12-10       5
2  1980-12-11       5
3  1980-12-12       5
4  1980-12-13       5
5  1980-12-14       5
6  1980-12-15       8
7  1980-12-16       8
8  1980-12-17       8
9  1980-12-18       7
10 1980-12-19       7
11 1980-12-20       1

Is this possible in R?

Dinoop Nair

2 Answers

Using na.locf from the zoo package, you can do this:

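library(zoo)
## dat is the question's data frame (a reproducible definition is given below)
dat1 <- data.frame(date = seq(as.Date('1980-12-10'), as.Date('1980-12-20'), 1))
## the merge fills the missing dates with NA, and na.locf does the rest
na.locf(zoo(merge(dat1, dat, all.x = TRUE)))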
   date       value
1  1980-12-10  5   
2  1980-12-11  5   
3  1980-12-12  5   
4  1980-12-13  5   
5  1980-12-14  5   
6  1980-12-15  8   
7  1980-12-16  8   
8  1980-12-17  8   
9  1980-12-18  7   
10 1980-12-19  7   
11 1980-12-20  1   
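The left-aligned values in the output above suggest that wrapping the merged data frame in zoo() coerces everything to character. If you want to keep "date" as a Date and "value" as numeric, one possible variation is to apply na.locf to the value column only. This is a minimal sketch, assuming zoo is installed; the names all_days and filled are just illustrative and not part of the original answer.

```r
library(zoo)  # assumed to be installed; provides na.locf()

# Sample data from the question.
dat <- data.frame(
  date  = as.Date(c("1980-12-10", "1980-12-15", "1980-12-18", "1980-12-20")),
  value = c(5, 8, 7, 1)
)

# One row per calendar day between the first and last observed date.
all_days <- data.frame(date = seq(min(dat$date), max(dat$date), by = "day"))

# Left join: days without an observation get NA in 'value'.
filled <- merge(all_days, dat, by = "date", all.x = TRUE)

# Carry the last observation forward on the value column only,
# so 'date' stays a Date and 'value' stays numeric.
filled$value <- na.locf(filled$value)

filled
```

Because the first date always has an observation here, there is no leading NA for na.locf to drop, so the filled vector has the same length as the merged data frame.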

PS: Please provide a reproducible example next time. You can write this:

  dat <- data.frame(date = as.Date(c('1980-12-10','1980-12-15',
                                   '1980-12-18','1980-12-20')), 
                    value=c(5,8,7,1))

Or

dput(dat)
structure(list(date = structure(c(3996, 4001, 4004, 4006), class = "Date"), 
    value = c(5, 8, 7, 1)), .Names = c("date", "value"), row.names = c(NA, 
-4L), class = "data.frame")
agstudy

This might do what you want in base R:

df.1 <- read.table(text='
            DATE   VALUE
      1980-12-10       5
      1980-12-15       8
      1980-12-18       7
      1980-12-20       1', header=TRUE, colClasses=c('character', 'numeric'))

df.1$DATE2 <- as.Date(df.1$DATE)

df.1$diffs <- c(as.numeric(diff(df.1$DATE2)),1)

df.2 <- df.1[rep(1:nrow(df.1),df.1$diffs),]

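# number the copies of each original row 1..diffs; using sequence(df.1$diffs)
# instead of rle() on VALUE keeps this correct when consecutive values repeat
df.2$running.count <- sequence(df.1$diffs)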

df.2$DATE3 <- df.2$DATE2 + (df.2$running.count-1)
df.2

#           DATE VALUE      DATE2 diffs running.count      DATE3
# 1   1980-12-10     5 1980-12-10     5             1 1980-12-10
# 1.1 1980-12-10     5 1980-12-10     5             2 1980-12-11
# 1.2 1980-12-10     5 1980-12-10     5             3 1980-12-12
# 1.3 1980-12-10     5 1980-12-10     5             4 1980-12-13
# 1.4 1980-12-10     5 1980-12-10     5             5 1980-12-14
# 2   1980-12-15     8 1980-12-15     3             1 1980-12-15
# 2.1 1980-12-15     8 1980-12-15     3             2 1980-12-16
# 2.2 1980-12-15     8 1980-12-15     3             3 1980-12-17
# 3   1980-12-18     7 1980-12-18     2             1 1980-12-18
# 3.1 1980-12-18     7 1980-12-18     2             2 1980-12-19
# 4   1980-12-20     1 1980-12-20     1             1 1980-12-20
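A shorter base R route to the same result, shown here only as a sketch, is to build the full daily sequence and look up the most recent observation for each day with findInterval. It assumes the data frame is already sorted by date; the names dat, all_days, idx and filled are illustrative.

```r
# Sample data from the question, assumed sorted by date.
dat <- data.frame(
  date  = as.Date(c("1980-12-10", "1980-12-15", "1980-12-18", "1980-12-20")),
  value = c(5, 8, 7, 1)
)

# Every calendar day from the first to the last observed date.
all_days <- seq(min(dat$date), max(dat$date), by = "day")

# For each day, the index of the latest observation on or before that day.
idx <- findInterval(as.numeric(all_days), as.numeric(dat$date))

# Look up the carried-forward value for each day.
filled <- data.frame(date = all_days, value = dat$value[idx])

filled
```

This avoids the helper columns and the row replication, at the cost of requiring the input to be sorted.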
Mark Miller