I have a data frame with the following columns: Tree ID, month, and value. For some months no data was recorded, so those months do not exist in the data frame. I have completed the list of months with the missing ones, but I do not know how to insert NA in the value column for the added months.

Example:

Tree.Id: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
Month: Jan, Feb, Mar, May, Jun, Jul, Sept, Oct, Nov, Dec
Values: 1, 0, 1, 1, 0, 2, 1, 1, 0, 2

The following months are missing: Apr and Aug. I added them with the code below, and now I want NA in the value column for those 2 added months.

Here is what I tried:

tree_ls <- list()
for (i in unique(data$Tree.ID)) {
  mon1 <- data$month[data$Tree.ID == i]  # months recorded for this Tree ID
  mon <- min(mon1, na.rm = TRUE):max(mon1, na.rm = TRUE)  # full range, including the missing months
  dat1 <- data$value[data$Tree.ID == i]  # values recorded for this Tree ID
......

After this step, I do not know how to create a list that will add NA for all the added months that were missing, so I will have lists of the same length.

Thanks

Gabriela
  • It would be great if you could supply a minimal _reproducible example_ to go along with your question. Something we can work from and use to show you how it might be possible to answer your question. That way others can also benefit from your question, and the accompanying answer, in the future. You can have a look at [this SO post](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) on how to make a great reproducible example in R. – Eric Fail Jan 04 '16 at 17:24
  • I'm pretty sure you can achieve your actual goal with a merge/join operation. – Roland Jan 04 '16 at 17:29

2 Answers

This is an old post, but I have a pretty good solution for this:

To begin, your minimal reproducible example should probably look like this:

id <- 1:10
month <- c("Jan", "Feb", "Mar", "May", "Jun", "Jul", "Sept", "Oct", "Nov", "Dec")
value <- c(1, 0, 1, 1, 0, 2, 1, 1, 0, 2)
df <- data.frame(id = id, month = month, value = value)
> head(df)
  id month value
1  1   Jan     1
2  2   Feb     0
3  3   Mar     1
4  4   May     1
5  5   Jun     0
6  6   Jul     2

Now simply build a data frame containing your complete domain, i.e., every month that should have a row, so that the missing ones can be filled with NA.

completeMonths <- c("Jan", "Feb", "Mar", "Apr","May", "Jun", "Jul","Aug", "Sept", "Oct", "Nov", "Dec")
df2 <- data.frame(month = completeMonths)
> df2
   month
1    Jan
2    Feb
3    Mar
4    Apr
5    May
6    Jun
7    Jul
8    Aug
9   Sept
10   Oct
11   Nov
12   Dec

Now we have a column with all the underlying values, so when we merge, we can fill the missing rows as NA with the following syntax:

merge(df, df2, by = "month", all = TRUE)

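The result looks like this (the row order may differ depending on your R version and on whether month is stored as a factor or as character):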

   month id value
1    Dec 10     2
2    Feb  2     0
3    Jan  1     1
4    Jul  6     2
5    Jun  5     0
6    Mar  3     1
7    May  4     1
8    Nov  9     0
9    Oct  8     1
10  Sept  7     1
11   Apr NA    NA
12   Aug NA    NA
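
The question has several trees rather than one, so the same merge idea may need to be applied per Tree.ID. Below is a minimal sketch of one way to do that, not necessarily the only one. It assumes the columns are named Tree.ID, month and value as in the question, that every tree should get all twelve months (rather than only its own first-to-last range), and it uses made-up toy numbers. The names `grid`, `full` and the toy `data` are hypothetical.

```r
# Hypothetical toy data: two trees, each missing different months.
# Column names follow the question; the numbers are invented for illustration.
completeMonths <- c("Jan", "Feb", "Mar", "Apr", "May", "Jun",
                    "Jul", "Aug", "Sept", "Oct", "Nov", "Dec")

data <- data.frame(
  Tree.ID = c(1, 1, 1, 2, 2),
  month   = c("Jan", "Feb", "Apr", "Jan", "Mar"),
  value   = c(1, 0, 2, 1, 1),
  stringsAsFactors = FALSE
)

# Every Tree.ID paired with every month of the year.
grid <- expand.grid(
  Tree.ID = unique(data$Tree.ID),
  month   = completeMonths,
  stringsAsFactors = FALSE
)

# all.x = TRUE keeps every row of the grid; rows with no match get NA in value.
full <- merge(grid, data, by = c("Tree.ID", "month"), all.x = TRUE)

# merge() sorts on the key columns, so restore calendar order via factor levels.
full$month <- factor(full$month, levels = completeMonths)
full <- full[order(full$Tree.ID, full$month), ]
rownames(full) <- NULL

# One vector of values per tree, all of the same length (12).
tree_ls <- split(full$value, full$Tree.ID)
```

Here `tree_ls` is a list with one element per tree, which is what the loop in the question was building toward; the factor step is only needed if calendar order matters.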

Hope this helps, data wrangling sucks.

bmc

When you say that you have a data frame with some months that have "no recorded data" and therefore "do not exist", the fact that they are in the data frame at all means they have some representation. I'm going to guess that by "do not exist" you mean that they are blank strings, such as "". If that's the case, you can replace the blank strings with NA values using mutate in the dplyr package and ifelse in the base package as follows:

library(dplyr)
data_with_nas <- mutate(data, value = ifelse(value == "", NA, value))

That reads as "change the data data frame such that its value cells are replaced with NA if they were a blank string, or kept as is otherwise."
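
If you would rather not load dplyr, the same replacement can be sketched in base R. This assumes, as guessed above, that the value column was read in as text with "" marking unrecorded months; the toy `data` below is hypothetical. Note that replacing "" with NA alone leaves the column as character, so the sketch also converts it to numeric.

```r
# Hypothetical input: value stored as text, with "" for unrecorded months.
data <- data.frame(
  month = c("Mar", "Apr", "May"),
  value = c("1", "", "1"),
  stringsAsFactors = FALSE
)

# Replace blank strings with NA, then convert the column to numeric.
data$value[data$value == ""] <- NA
data$value <- as.numeric(data$value)
```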

Mekki MacAulay
  • Maybe I am wrong, but I think what OP is saying is that when no datum was recorded for a given month, there is just no column for that month. It is thus not just a matter of filling blanks with NAs in columns. – tagoma Jan 04 '16 at 18:01
  • It's hard to tell, yeah. I was going on OP's first sentence, "I have a data frame that has the following column: Tree ID, month, values", which seems to suggest a 3-column `data frame` with separate columns for month and value. My guess could be wrong. OP will have to provide the data for us to work with in that case. – Mekki MacAulay Jan 04 '16 at 18:19