I am using a nested-data approach to apply a censored-data model to stream clarity data (1275 streams, ~225,000 observations). I have successfully used group_by to group the data set at three hierarchical levels (HUC4, major watershed, and stream; think country, state, county). I want to pursue this approach because it appears to be vastly faster and easier to read than the for-loop approach I have been using. However, when I map the model over the nested data frame I get the error: NA/NaN/Inf in foreign function call.
This is extremely puzzling, since the approach works fine when I apply it to the large and middle-sized group_by data frames. It is also odd because the data in the list elements of the three group_by data frames are identical (just grouped at different levels). The data are large and unwieldy, but I can try to give some clues as to the structure.
The starting data look like this:
> summary(tb_cens)
     huc4           loc_major_basin    sys_loc_code        sample_date               y              m
 Length:203631      Min.   : 4010101   Length:203631      Min.   :1998-04-06   Min.   :1998   Min.   : 1.000
 Class :character   1st Qu.: 7010207   Class :character   1st Qu.:2006-05-27   1st Qu.:2006   1st Qu.: 5.000
 Mode  :character   Median : 7020011   Mode  :character   Median :2009-09-10   Median :2009   Median : 7.000
                    Mean   : 7193116                      Mean   :2009-10-29   Mean   :2009   Mean   : 6.676
                    3rd Qu.: 7040004                      3rd Qu.:2013-08-28   3rd Qu.:2013   3rd Qu.: 8.000
                    Max.   :10230003                      Max.   :2018-10-23   Max.   :2018   Max.   :12.000
       d              doy        combined_stube_conv100_conv60 detection_limit record_length   censored1
 Min.   : 1.00   Min.   :  1.0   Min.   :  0.00                TRUE : 80189    Min.   :10.00   Mode :logical
 1st Qu.: 9.00   1st Qu.:143.0   1st Qu.: 26.00                FALSE:123442    1st Qu.:12.00   FALSE:159845
 Median :16.00   Median :184.0   Median : 58.57                                Median :14.00   TRUE :43786
 Mean   :16.02   Mean   :187.8   Mean   : 53.29                                Mean   :14.48
 3rd Qu.:24.00   3rd Qu.:233.0   3rd Qu.: 72.00                                3rd Qu.:17.00
 Max.   :31.00   Max.   :365.0   Max.   :100.00                                Max.   :26.00
 censored2
 Mode :logical
 FALSE:167033
 TRUE :36598
The commands I am using are:
library(survival)   # Surv, survreg
library(tidyverse)  # group_by, nest, mutate, map
##### create the model function
cens_model <- function(tb_cens) {
  survreg(Surv(left_clarity, right_clarity, type = 'interval2') ~ y + m, data = tb_cens, dist = 'gaussian')
}
##### group_by huc4 (12 huc4s)
by_huc4 <- tb_cens %>%
  group_by(huc4) %>%
  nest()
# apply censored data model to each huc4 and mutate results to data frame
by_huc4 <- by_huc4 %>%
  mutate(huc_model = map(data, cens_model))
by_huc4
Which works perfectly! Also,
##### group_by watershed (75 major watersheds)
by_watershed <- tb_cens %>%
  group_by(loc_major_basin) %>%
  nest()
# apply censored data model to each watershed and mutate results to data frame
by_watershed <- by_watershed %>%
  mutate(watershed_model = map(data, cens_model))
by_watershed
Which also works perfectly! However, trying the same technique on streams (the smallest group_by level) throws an error about NA/NaN/Inf in foreign function call.
##### group_by stream
by_stream <- tb_cens %>%
  group_by(sys_loc_code) %>%
  nest()
# apply censored data model to each stream and mutate results to data frame
by_stream <- by_stream %>%
  mutate(stream_model = map(data, cens_model))
by_stream
This gives the following error:
Error in mutate_impl(.data, dots) :
Evaluation error: NA/NaN/Inf in foreign function call (arg 3).
There are no NAs or NaNs in my data. There are some Inf values in the final column, but the Tobit model requires those because they mark right-censored observations. The map function also worked perfectly at the largest and middle group_by levels; it only had trouble when I grouped at the stream level.
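One diagnostic that might help narrow this down is to wrap the model in purrr::safely() so that a failing stream does not abort the whole mutate(), and then compare the failing streams against simple per-stream counts. The following is only a minimal sketch on made-up data: the toy tibble, the stream codes S1 to S3, the way left_clarity and right_clarity are built, and the helper names safe_cens_model and fit_status are all assumptions for illustration, not my real data or a known fix.

```r
library(survival)
library(dplyr)
library(tidyr)
library(purrr)

set.seed(1)

# Toy stand-in for tb_cens (hypothetical): 3 streams, readings capped at 100
toy <- tibble(
  sys_loc_code = rep(c("S1", "S2", "S3"), each = 40),
  y            = rep(rep(2001:2010, each = 4), times = 3),
  m            = rep(c(4, 6, 8, 10), times = 30),
  clarity      = rnorm(120, mean = 80, sd = 20)
) %>%
  mutate(
    # assumed encoding: exact readings have left == right,
    # right-censored readings have left = 100 and right = Inf
    left_clarity  = pmin(clarity, 100),
    right_clarity = ifelse(clarity >= 100, Inf, clarity)
  )

cens_model <- function(df) {
  survreg(Surv(left_clarity, right_clarity, type = "interval2") ~ y + m,
          data = df, dist = "gaussian")
}

# safely() returns list(result = , error = ) instead of stopping on error
safe_cens_model <- safely(cens_model)

fit_status <- toy %>%
  group_by(sys_loc_code) %>%
  nest() %>%
  ungroup() %>%
  mutate(
    fit     = map(data, safe_cens_model),
    ok      = map_lgl(fit, ~ is.null(.x$error)),
    n_obs   = map_int(data, nrow),
    # number of uncensored (exact) readings in the stream
    n_exact = map_int(data, ~ sum(is.finite(.x$right_clarity) &
                                    .x$left_clarity == .x$right_clarity)),
    n_years = map_int(data, ~ n_distinct(.x$y))
  )

# One row per stream: did the fit succeed, and how much usable data was there?
fit_status %>% select(sys_loc_code, ok, n_obs, n_exact, n_years)
```

On the real data, filtering this table to the rows where ok is FALSE would show which streams trigger the error, and whether they share something such as very few uncensored readings or a single year of data. That would be a possible lead rather than a confirmed cause.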
Does anyone have ideas about how to run this to ground? Any thoughts would be much appreciated.