I'm trying to find missing minutes in a time-series dataset. I wrote the following R code, which works locally on a small sample:
test <- dfv %>% mutate(timestamp = as.POSIXct(DaySecFrom.UTC.)) %>%
complete(timestamp = seq.POSIXt(min(timestamp), max(timestamp), by = 'min'), ElemUID)
However, complete() from tidyr cannot be used on a spark_tbl:
Error in UseMethod("complete_") :
no applicable method for 'complete_' applied to an object of class "c('tbl_spark', 'tbl_sql', 'tbl_lazy', 'tbl')"
Here is some test data:
ElemUID    ElemName       Kind  Number  DaySecFrom(UTC)          DaySecTo(UTC)
399126817  A648/13FKO-66  DEZ           2017-07-01 23:58:00.000  2017-07-01 23:59:00.000
483492732  A661/18FRS-97  DEZ   120.00  2017-07-01 23:58:00.000  2017-07-01 23:59:00.000
399126819  A648/12FKO-2   DEZ   60.00   2017-07-01 23:58:00.000  2017-07-01 23:59:00.000
399126818  A648/12FKO-1   DEZ   180.00  2017-07-01 23:58:00.000  2017-07-01 23:59:00.000
399126816  A648/13FKO-65  DEZ           2017-07-01 23:58:00.000  2017-07-01 23:59:00.000
398331142  A661/31OFN-1   DEZ   120.00  2017-07-01 23:58:00.000  2017-07-01 23:59:00.000
398331143  A661/31OFN-2   DEZ           2017-07-01 23:58:00.000  2017-07-01 23:59:00.000
483492739  A5/28FKN-65    DEZ           2017-07-01 23:58:00.000  2017-07-01 23:59:00.000
483492735  A661/23FRS-97  DEZ   60.00   2017-07-01 23:58:00.000  2017-07-01 23:59:00.000
Is there another way, or a workaround, to do this on a Spark cluster in R? I would really appreciate your help!
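To make the goal concrete, here is a minimal local sketch of the kind of workaround I have in mind. It avoids complete() by building the full grid of elements and minutes and then using anti_join() to find the combinations that are absent. The sample tibble and the names rng, ids, grid and missing_minutes are made up for illustration. I am assuming that summarise(), distinct(), collect() and anti_join() behave the same way on a spark_tbl, and that the grid is small enough to build locally and upload with copy_to(); I have not verified either on a cluster.

```r
library(dplyr)
library(tidyr)

# Hypothetical sample: element 1 has no row for 23:58
dfv <- tibble(
  ElemUID   = c(1L, 1L, 2L, 2L, 2L),
  timestamp = as.POSIXct(
    c("2017-07-01 23:57:00", "2017-07-01 23:59:00",
      "2017-07-01 23:57:00", "2017-07-01 23:58:00", "2017-07-01 23:59:00"),
    tz = "UTC"
  )
)

# Bring back only the overall time range (one row) and the distinct element ids
rng <- dfv %>%
  summarise(from = min(timestamp, na.rm = TRUE),
            to   = max(timestamp, na.rm = TRUE)) %>%
  collect()
ids <- dfv %>% distinct(ElemUID) %>% collect()

# Every element paired with every minute in the range, built locally
grid <- expand_grid(
  ElemUID   = ids$ElemUID,
  timestamp = seq(rng$from, rng$to, by = "min")
)

# Assumption for Spark: upload the grid first, for example
#   grid <- copy_to(sc, grid, "grid")   # sc is a hypothetical Spark connection

# Grid rows without a match in the data are the missing minutes
# (here: element 1 at 23:58)
missing_minutes <- anti_join(grid, dfv, by = c("ElemUID", "timestamp"))
```

Replacing anti_join(grid, dfv, ...) with left_join(grid, dfv, ...) should give output shaped like the result of complete(), with NA in the other columns for the filled-in rows.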