This problem has me baffled. I'm not an experienced R user, so what I've done may not be elegant, but it's not complicated and I don't understand what is going wrong.
I begin with a simple data frame that has 6 columns and several hundred rows. The data columns are Year, Month, Day, and three numeric variables. There may be several rows that have the same values of Year, Month, and Day. Here is an example:
> thisFrame
Year Month Day trans_fac dist var
1 2003 3 23 42.3475 1.858 1.48190
2 2003 3 23 42.3475 2.779 1.42260
3 2003 3 23 42.3475 4.145 1.39150
4 2003 3 23 42.3475 5.069 1.37860
5 2003 3 23 42.3475 6.439 1.42050
6 2003 3 23 42.3475 8.736 2.54290
7 2003 3 23 42.3475 9.661 1.29120
8 2003 3 23 42.3475 11.040 1.24360
9 2003 3 23 42.3475 11.960 1.32190
10 2003 3 23 42.3475 13.340 1.34820
11 2003 3 23 42.3475 14.270 1.34630
12 2003 3 23 42.3475 15.640 1.37820
13 2003 3 23 42.3475 16.570 1.39550
[some rows snipped]
24 2003 3 23 42.3475 29.840 1.09530
25 2003 4 11 42.3475 2.091 2.62980
26 2003 4 11 42.3475 3.557 1.61910
27 2003 4 11 42.3475 5.446 1.03760
28 2003 4 11 42.3475 7.099 0.93600
29 2003 4 11 42.3475 8.798 1.02190
30 2003 4 11 42.3475 10.630 1.03940
31 2003 4 11 42.3475 12.240 0.96743
32 2003 4 11 42.3475 14.110 0.95497
Because I want to operate on each day's data independently, I calculate the Julian (Unix) day for each row, add it to the data frame as the variable jdays, and then find the unique days.
days <- as.Date(ISOdate(thisFrame$Year,thisFrame$Month,thisFrame$Day))
thisFrame$jdays <- as.integer(days)
uniq_days <- unique(thisFrame$jdays)
nudays <- length(uniq_days) # number of unique days
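As a quick sanity check on the step above, here is a minimal sketch showing that jdays is simply the number of days since 1970-01-01, so it can be converted back to a calendar date. The two dates are taken from the example rows; the names check_days, check_jdays, and round_trip are made up for this illustration.

```r
# Two dates from the example frame: 2003-03-23 and 2003-04-11
check_days <- as.Date(ISOdate(c(2003, 2003), c(3, 4), c(23, 11)))

# Integer day count since 1970-01-01 (what jdays holds)
check_jdays <- as.integer(check_days)
print(check_jdays)

# Converting the integers back should recover the original dates
round_trip <- format(as.Date(check_jdays, origin = "1970-01-01"))
print(round_trip)
```

The value for 2003-03-23 agrees with the jdays column (12134) in the output further down.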
I then loop over the unique days and create a new data frame for each one by subsetting the original frame on that day. Then I want to print the results of the operation along with the day, month, and year of the input set. Should be simple, right? Well, sometimes the results are exactly what I want, and sometimes I get NA values for the day, month, and year even though they are present in the subset. I've tried this with several different input data frames and haven't found any pattern that would help me understand why this is happening.
for (i in 1:nudays) {
thisSet <- thisFrame[thisFrame$jdays == uniq_days[i],]
print(thisSet)
print(c(i, thisSet$Day[i], thisSet$Month[i], thisSet$Year[i]))
}
Expected result:
[1] "This is subset 1"
Year Month Day trans_fac dist var jdays
1 2003 3 23 44.4335 2.011 1.12240 12134
2 2003 3 23 44.4335 3.180 0.92435 12134
3 2003 3 23 44.4335 4.147 0.95406 12134
[lines snipped]
28 2003 3 23 44.4335 29.870 0.75302 12134
[1] 1 3 23 2003
[1] "This is subset 2"
Year Month Day trans_fac dist var jdays
29 2003 3 26 44.4335 3.514 1.01300 12137
30 2003 3 26 44.4335 5.275 0.74062 12137
31 2003 3 26 44.4335 7.031 0.67548 12137
[lines snipped]
45 2003 3 26 44.4335 31.220 0.58399 12137
[1] 2 3 26 2003
etc. Until we get to
[1] "This is subset 18"
Year Month Day trans_fac dist var jdays
358 2003 8 18 44.4335 2.075 0.85803 12282
359 2003 8 18 44.4335 3.524 0.71728 12282
[lines snipped]
374 2003 8 18 44.4335 30.320 0.76502 12282
[1] 18 NA NA NA
but then we are back to the expected behavior
[1] "This is subset 19"
Year Month Day trans_fac dist var jdays
375 2003 8 19 44.4335 2.475 1.17220 12283
376 2003 8 19 44.4335 3.875 0.87088 12283
[lines snipped]
397 2003 8 19 44.4335 30.070 0.76463 12283
[1] 19 8 19 2003
Until we get to
[1] "This is subset 21"
Year Month Day trans_fac dist var jdays
418 2003 9 2 44.4335 1.781 2.00410 12297
419 2003 9 2 44.4335 3.783 0.96007 12297
420 2003 9 2 44.4335 5.479 0.85195 12297
[lines snipped]
433 2003 9 2 44.4335 28.530 0.89522 12297
[1] 21 NA NA NA
And back to expected result
[1] "This is subset 24"
Year Month Day trans_fac dist var jdays
464 2003 9 17 44.4335 1.173 1.80490 12312
465 2003 9 17 44.4335 2.587 1.04510 12312
[lines snipped]
487 2003 9 17 44.4335 29.770 0.82791 12312
[1] 24 9 17 2003
and so on.
I'm not seeing the problem and would appreciate any advice. Thanks.
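One possible explanation, offered as a sketch rather than a definitive diagnosis: inside the loop, thisSet$Day[i] asks for row i of the subset, not the first row, and indexing a vector past its length in R returns NA. Judging by the row names in the output (assuming they are consecutive), subset 18 has only 17 rows (358 to 374) and subset 21 has only 16 rows (418 to 433), while subset 19 has 23 rows and subset 24 has 24, which would match the pattern of NA values. The toy data frame and the names toy, by_i, by_first, and results below are invented for this illustration.

```r
# Hypothetical toy frame: the first day has three rows, the second day has one
toy <- data.frame(
  Year  = c(2003, 2003, 2003, 2003),
  Month = c(3, 3, 3, 4),
  Day   = c(23, 23, 23, 11)
)
toy$jdays <- as.integer(as.Date(ISOdate(toy$Year, toy$Month, toy$Day)))
uniq_days <- unique(toy$jdays)

results <- list()
for (i in seq_along(uniq_days)) {
  thisSet <- toy[toy$jdays == uniq_days[i], ]

  # Indexing by the loop counter: NA whenever the subset has fewer than i rows
  by_i <- c(i, thisSet$Day[i], thisSet$Month[i], thisSet$Year[i])

  # Indexing the first row of the subset: always present
  by_first <- c(i, thisSet$Day[1], thisSet$Month[1], thisSet$Year[1])

  print(by_i)
  print(by_first)
  results[[i]] <- list(by_i = by_i, by_first = by_first)
}
```

If that is indeed the cause, indexing with [1] instead of [i] in the print call should give the date for every subset, since all rows of a subset share the same Year, Month, and Day.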