I'd like to build a data frame with a large number of rows on a regular 10-minute time grid, copying the data from another data frame and filling the time steps that are missing from it with NA.

df.org
                  time    t    h  p   s
1  2016-10-30 10:10:00 33.6 21.3 NA STA
2  2016-10-30 10:50:00 33.7 19.8 NA STA
3  2016-10-30 11:00:00 33.7 18.4 NA STB
4  2016-10-30 11:10:00 34.3 19.3 NA STB
5  2016-10-30 11:20:00 33.9 19.4 NA STB
6  2016-10-30 11:30:00 34.4 20.9 NA STA
7  2016-10-30 11:40:00 34.8 21.1 NA STB
8  2016-10-30 11:50:00 34.6 21.2 NA STB
9  2016-10-30 12:00:00 34.6 22.1 NA STA
10 2016-10-30 12:10:00 34.9 20.8 NA STC
11 2016-10-30 12:20:00 34.9 21.7 NA STC
12 2016-10-30 12:30:00 35.0 21.9 NA STA
13 2016-10-30 12:50:00 35.1 22.6 NA STA

This is what I expect:

df.wNA
                  time     t     h     p      s
1  2016-10-30 10:10:00  33.6  21.3    NA    STA
2  2016-10-30 10:20:00    NA    NA    NA     NA
3  2016-10-30 10:30:00    NA    NA    NA     NA
4  2016-10-30 10:40:00    NA    NA    NA     NA
5  2016-10-30 10:50:00  33.7  19.8    NA    STA
6  2016-10-30 11:00:00  33.7  18.4    NA    STB
7  2016-10-30 11:10:00  34.3  19.3    NA    STB
8  2016-10-30 11:20:00  33.9  19.4    NA    STB
9  2016-10-30 11:30:00  34.4  20.9    NA    STA
10 2016-10-30 11:40:00  34.8  21.1    NA    STB
11 2016-10-30 11:50:00  34.6  21.2    NA    STB
12 2016-10-30 12:00:00  34.6  22.1    NA    STA
13 2016-10-30 12:10:00  34.9  20.8    NA    STC
14 2016-10-30 12:20:00  34.9  21.7    NA    STC
15 2016-10-30 12:30:00  35.0  21.9    NA    STA
16 2016-10-30 12:40:00    NA    NA    NA     NA
17 2016-10-30 12:50:00  35.1  22.6    NA    STA

Code:

time <- as.POSIXct(c("2016-10-30 10:10:00", "2016-10-30 10:50:00", "2016-10-30 11:00:00", "2016-10-30 11:10:00", "2016-10-30 11:20:00", "2016-10-30 11:30:00", "2016-10-30 11:40:00", "2016-10-30 11:50:00", "2016-10-30 12:00:00", "2016-10-30 12:10:00", "2016-10-30 12:20:00", "2016-10-30 12:30:00", "2016-10-30 12:50:00"))
t <- c( 33.6, 33.7, 33.7, 34.3, 33.9, 34.4, 34.8, 34.6, 34.6, 34.9, 34.9, 35.0, 35.1 )
h <- c( 21.3, 19.8, 18.4, 19.3, 19.4, 20.9, 21.1, 21.2, 22.1, 20.8, 21.7, 21.9, 22.6 )
p <- c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA)
s <- c( "STA", "STA", "STB", "STB", "STB", "STA", "STB", "STB", "STA", "STC", "STC", "STA", "STA" ) 

library(dplyr)  # provides bind_rows()

df.org <- data.frame(time, t, h, p, s)
fr <- min(df.org$time)
to <- max(df.org$time)
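times <- seq(fr, to, by = 60 * 10)  # regular 10-minute grid; seq() already returns POSIXct
df.wNA <- subset(df.org, FALSE)     # empty data frame with the same columns as df.org
for (jth in seq_along(times)) {
  # build a one-row, all-NA data frame, then set its timestamp
  ro <- as.data.frame(lapply(df.org[1, ], function(x) { rep(NA, length(x)) } ))
  ro$time <- times[jth]
  df.wNA <- bind_rows(df.wNA, ro)   # grows the data frame one row at a time
}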

# copy the original rows into the grid rows that have the same timestamp
df.wNA[match(df.org$time, df.wNA$time), ] <- df.org

But this is too slow when length(times) is large. How can I speed this up?

Thanks

  • Please give a data example and what you're expecting so we can try to help you. Look at the packages `data.table` or `plyr` to increase performance. – timat Oct 30 '16 at 09:41
  • Welcome to SO. You could improve your question. Please read [how to provide minimal reproducible examples in R](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example#answer-5963610). Then edit & improve it accordingly. A good post usually provides minimal input data, the desired output data & code tries - all copy-paste-run'able in a new/clean R session. Your example is neither minimal (`for (jth in 1:10000000)`) nor is it reproducible (_"Error in subset(df.org, FALSE) : object 'df.org' not found"_). – lukeA Oct 30 '16 at 09:53
  • thanks #timat #lukeA – Bongju Lee Oct 30 '16 at 10:44
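
Regarding the speed concern in the question, here is a minimal sketch of a vectorized alternative in base R. It is not taken from the thread. It builds the 10-minute grid once and uses `match()` to look up each grid time in the original data, so no data frame is grown inside a loop. The `tz = "UTC"` setting, `stringsAsFactors = FALSE`, and the helper name `idx` are assumptions made for this illustration. The input values are copied from the question.

```r
# Rebuild the example input from the question.
# Assumption: timestamps are interpreted as UTC to avoid DST surprises.
df.org <- data.frame(
  time = as.POSIXct(paste("2016-10-30",
                          c("10:10:00", "10:50:00", "11:00:00", "11:10:00",
                            "11:20:00", "11:30:00", "11:40:00", "11:50:00",
                            "12:00:00", "12:10:00", "12:20:00", "12:30:00",
                            "12:50:00")), tz = "UTC"),
  t = c(33.6, 33.7, 33.7, 34.3, 33.9, 34.4, 34.8, 34.6, 34.6, 34.9, 34.9, 35.0, 35.1),
  h = c(21.3, 19.8, 18.4, 19.3, 19.4, 20.9, 21.1, 21.2, 22.1, 20.8, 21.7, 21.9, 22.6),
  p = rep(NA, 13),
  s = c("STA", "STA", "STB", "STB", "STB", "STA", "STB",
        "STB", "STA", "STC", "STC", "STA", "STA"),
  stringsAsFactors = FALSE
)

# Regular 10-minute grid from the first to the last observation.
times <- seq(min(df.org$time), max(df.org$time), by = "10 min")

# For every grid time, the row of df.org with that timestamp (NA if there is none).
idx <- match(times, df.org$time)

# Indexing rows with NA yields all-NA rows, so one subsetting step builds the result.
df.wNA <- df.org[idx, ]
df.wNA$time <- times        # restore the timestamps on the all-NA rows
rownames(df.wNA) <- NULL    # renumber rows 1..n

print(df.wNA)
```

The cost of this version is dominated by a single `match()` call and one row-subsetting step, whereas the loop copies the accumulated data frame on every `bind_rows()` call. A left join of the grid against `df.org` (for example with `merge(..., all.x = TRUE)`) would be another way to express the same idea.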
