I have a problem adding missing observations to a data frame in R. Below is a snapshot of the data frame:

[Image: sample of the data frame]

I have 66 different districts and 21 days, and each district on each day should have 144 time periods. The current dataset has missing observations; e.g., for district 5 on day 6, the observation for time period 132 is missing.

What I am trying to achieve is to add the missing observations to the original data frame to make it complete; the values of y1 and y2 in those rows can simply be set to null. How can I achieve this with R?

halfer
Felix Zhao
  • Do you have the missing values? If so, in what format? – Bryan Goggin Jun 08 '16 at 01:24
  • Post some minimal example R data frame with the way your data is and how you want it. It is not too hard to make up example data. That will help us actually write code that works for your case. – Gopala Jun 08 '16 at 01:27
  • Possible duplicate of [how to insert missing observations on a data frame](http://stackoverflow.com/questions/33003819/how-to-insert-missing-observations-on-a-data-frame) – alexwhitworth Jun 08 '16 at 21:15

2 Answers

You haven't provided a reproducible example, so here's some basic guidance.

First, add rows for the missing values. Let's assume your data frame is called mydata and has columns District, DayOfMonth, and TimePeriod (plus y1, y2, etc.), but with some combinations of these values missing. Let's add in those missing combinations:

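library(dplyr)

# Build every District/DayOfMonth/TimePeriod combination, then attach the
# existing observations; combinations absent from mydata get NA in y1, y2, etc.
df <- expand.grid(District = 1:66, DayOfMonth = 1:21, TimePeriod = 1:144) %>%
  left_join(mydata, by = c("District", "DayOfMonth", "TimePeriod"))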

You now have a data frame with all your original data, plus new rows with the previously absent combinations of District, DayOfMonth, and TimePeriod that are filled with NA in the y1, y2, etc. data columns. For imputation of these missing values, see, for example, the mi package and the mice package.
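Since no sample data was posted, here is a minimal sketch of the same idea on made-up data, using only base R (merge with all.x = TRUE behaves like a left join). The grid sizes and the y1/y2 values are invented for illustration; the column names are the ones assumed above.

```r
# Toy grid: 2 districts x 2 days x 3 time periods (hypothetical sizes,
# far smaller than the 66 x 21 x 144 grid in the question).
full_grid <- expand.grid(District = 1:2, DayOfMonth = 1:2, TimePeriod = 1:3)

# Simulate an incomplete data set by dropping two combinations.
set.seed(1)
mydata <- full_grid[-c(4, 9), ]
mydata$y1 <- rnorm(nrow(mydata))
mydata$y2 <- rnorm(nrow(mydata))

# Keep every grid row; rows with no match in mydata get NA in y1 and y2.
completed <- merge(full_grid, mydata,
                   by = c("District", "DayOfMonth", "TimePeriod"),
                   all.x = TRUE)

nrow(completed)           # 12, one row per combination
sum(is.na(completed$y1))  # 2, the combinations that were dropped
```

Listing the key columns in by explicitly avoids relying on whichever column names the two data frames happen to share.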

eipi10

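Here is another option with expand and fill from tidyr:

library(dplyr)
library(tidyr)

# expand() needs a data frame as its first argument, so start the pipe
# from mydata. The question has 21 days, hence DayOfMonth = 1:21.
mydata %>%
  expand(District = 1:66, DayOfMonth = 1:21, TimePeriod = 1:144) %>%
  left_join(mydata, by = c("District", "DayOfMonth", "TimePeriod")) %>%
  fill(District, DayOfMonth, TimePeriod)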
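As a possible shortcut, tidyr also provides complete(), which does the expansion and the join in one call. The sketch below runs it on made-up toy data; the values and the small grid are assumptions for illustration, not the asker's data.

```r
library(tidyr)

# Hypothetical toy data: district 1 has no row for time period 2 on day 1.
mydata <- data.frame(
  District   = c(1L, 1L, 2L, 2L, 2L),
  DayOfMonth = c(1L, 1L, 1L, 1L, 1L),
  TimePeriod = c(1L, 3L, 1L, 2L, 3L),
  y1 = c(0.5, 0.7, 1.1, 1.3, 0.9),
  y2 = c(10, 12, 9, 11, 8)
)

# Add a row for every combination of the listed key values;
# y1 and y2 are NA in rows that were not in mydata.
completed <- complete(mydata, District = 1:2, DayOfMonth = 1L, TimePeriod = 1:3)

nrow(completed)           # 6
sum(is.na(completed$y1))  # 1
```

For the data in the question, the key ranges would be 1:66, 1:21, and 1:144.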
akrun
  • Hi akrun, thank you very much for your quick response. I tried your code but got a warning that says: "data argument is missing, no default available." How can I fix this? – Felix Zhao Jun 08 '16 at 05:36