
I've been trying to get padr to work with my dataset without much success, although I can get the examples to work:

# I have a few datetime columns so I convert all to POSIXct with UTC.  
> df <- mutate_at(DATABASE, vars(ends_with("time")), funs(ymd_hms(., tz = "UTC", locale = Sys.getlocale("LC_TIME"))))
> df <- as_tibble(df)
> head(df, 20)
# A tibble: 20 x 2
             charttime   sbp
                <dttm> <dbl>
 1 2101-10-20 22:30:01    NA
 2 2101-10-20 18:45:00    62
 3 2101-10-20 19:00:00    66
 4 2101-10-20 19:12:00    NA
 5 2101-10-20 19:14:00    NA
 6 2101-10-20 19:15:00   217
 7 2101-10-20 19:26:00    NA
 8 2101-10-20 19:30:00   102
 9 2101-10-20 19:45:00    94
10 2101-10-20 19:59:00    NA
11 2101-10-20 20:00:00    80
12 2101-10-20 20:04:00    NA
13 2101-10-20 20:15:00    91
14 2101-10-20 20:30:00    86
15 2101-10-20 20:45:00    96
16 2101-10-20 21:00:00    73
17 2101-10-20 21:15:00    84
18 2101-10-20 21:30:00    96
19 2101-10-20 21:45:00   100
20 2101-10-20 21:51:00    NA

> df$charttime %>% get_interval # should say 'sec'
[1] "sec"

> df %>% thicken(interval='hour')
Error in if (to_date) x <- as.Date(x, tz = attr(x, "tzone")) : 
  missing value where TRUE/FALSE needed

But with the padr example it works:

> coffee %>% thicken(interval='day')
           time_stamp amount time_stamp_day
1 2016-07-07 03:11:21   3.14     2016-07-07
2 2016-07-07 03:46:48   2.98     2016-07-07
3 2016-07-09 07:25:17   4.11     2016-07-09
4 2016-07-10 04:45:11   3.14     2016-07-10
> coffee$time_stamp %>% get_interval # should say 'sec'
[1] "sec"

I haven't been able to figure out why my dataset isn't working and how to interpret the error.

Update 1

Here is another, more complete example of what I'm trying to do. I also include a csv with a small snippet of data that I'm working with so this problem is more reproducible. I've tried this on two machines and I get the same result.

You will notice that in the example above and the example below, the first value of charttime is different. (2101-10-20 22:30:01 changes to 2101-10-20 22:30:00). I wanted to have the interval as 'sec' instead of 'min' so I manually changed the value. Either way results in the same problem.

padr_data.csv

> packageVersion("tidyverse")
[1] ‘1.1.1’
> packageVersion("lubridate")
[1] ‘1.6.0’
> packageVersion("padr")
[1] ‘0.3.0’
> library(tidyverse)
> library(lubridate)
> library(padr)
> 
> df <- read.csv("padr_data.csv")
> df <- mutate_at(df, vars(ends_with("time")), funs(ymd_hms(., tz = "UTC", locale = Sys.getlocale("LC_TIME"))))
> df$sbp <- as.numeric(df$sbp)
> summary(df)
   charttime                        sbp       
 Min.   :2101-10-20 18:30:00   Min.   : 62.0  
 1st Qu.:2101-10-20 19:33:45   1st Qu.: 84.5  
 Median :2101-10-20 20:52:30   Median : 95.0  
 Mean   :2101-10-20 21:08:22   Mean   :100.9  
 3rd Qu.:2101-10-20 22:26:15   3rd Qu.:102.0  
 Max.   :2101-10-21 00:42:00   Max.   :217.0  
                               NA's   :12     
> lapply(df, class)
$charttime
[1] "POSIXct" "POSIXt" 

$sbp
[1] "numeric"

> df$charttime %>% get_interval
[1] "min"
> 
> # this does not work
> df[!is.na(df$charttime),] %>%
+    thicken(interval = 'hour')
Error in if (to_date) x <- as.Date(x, tz = attr(x, "tzone")) : 
  missing value where TRUE/FALSE needed
> 
> # this does not work
> df %>%
+   thicken(interval = 'hour')
Error in if (to_date) x <- as.Date(x, tz = attr(x, "tzone")) : 
  missing value where TRUE/FALSE needed
Daren Eiri

2 Answers


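It seems that padr does not work well with dates set far in the future! More specifically, in the runs below the data shifted to 2037 (about 20 years from now) works, while the same data shifted to 2038 fails. The warning "NAs introduced by coercion to integer range" suggests that the number of seconds since 1970 no longer fits in a 32-bit integer, whose limit is reached on 2038-01-19. I will open an issue with the padr developer to see how the code can be improved.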
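The following base R sketch shows why timestamps late in 2038 are a plausible breaking point. It is an illustration only and does not reproduce padr's internal code; the variable names `ok`, `bad`, and `limit` are made up for this example.

```r
# Illustration only (base R, not padr's actual code).
# POSIXct values are seconds since 1970-01-01 UTC; R integers are 32-bit.
ok  <- as.POSIXct("2037-11-04 18:30:00", tz = "UTC")
bad <- as.POSIXct("2038-11-04 18:30:00", tz = "UTC")

# Latest timestamp whose second count still fits in an R integer
limit <- as.POSIXct(as.numeric(.Machine$integer.max),
                    origin = "1970-01-01", tz = "UTC")
format(limit, "%Y-%m-%d %H:%M:%S")   # "2038-01-19 03:14:07"

# Coercing the second counts to integer: the 2037 value fits, the 2038 one
# becomes NA (with the "coercion to integer range" warning, suppressed here)
ok_int  <- as.integer(as.numeric(ok))
bad_int <- suppressWarnings(as.integer(as.numeric(bad)))
is.na(ok_int)    # FALSE
is.na(bad_int)   # TRUE
```

Any code path that coerces such a timestamp to integer would end up with NA for the 2101 data as well.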

> packageVersion("tidyverse")
[1] ‘1.1.1’
> packageVersion("lubridate")
[1] ‘1.6.0’
> packageVersion("padr")
[1] ‘0.3.0’
> library(tidyverse)
> library(lubridate)
> library(padr)
> 
> df <- read.csv("padr_data.csv")
> df <- mutate_at(df, vars(ends_with("time")), funs(ymd_hms(., tz = "UTC", locale = Sys.getlocale("LC_TIME"))- dyears(63)))
> 
> df$sbp <- as.numeric(df$sbp)
> #df <- na.omit(df)
> 
> summary(df)
   charttime                        sbp       
 Min.   :2038-11-04 18:30:00   Min.   : 62.0  
 1st Qu.:2038-11-04 19:33:45   1st Qu.: 84.5  
 Median :2038-11-04 20:52:30   Median : 95.0  
 Mean   :2038-11-04 21:08:22   Mean   :100.9  
 3rd Qu.:2038-11-04 22:26:15   3rd Qu.:102.0  
 Max.   :2038-11-05 00:42:00   Max.   :217.0  
                               NA's   :12     
> lapply(df, class)
$charttime
[1] "POSIXct" "POSIXt" 

$sbp
[1] "numeric"

> df$charttime %>% get_interval
[1] "min"
> 
> # this does not work
> df[!is.na(df$charttime),] %>%
+   thicken(interval = 'hour')
Error in if (to_date) x <- as.Date(x, tz = attr(x, "tzone")) : 
  missing value where TRUE/FALSE needed
In addition: Warning messages:
1: In round_down_core(a, b) : NAs introduced by coercion to integer range
2: In round_down_core(a, b) : NAs introduced by coercion to integer range
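As for how to read the error itself: base R raises it whenever the condition of an `if` statement is NA. The sketch below uses a hypothetical `to_date` value set to NA by hand; how padr actually computes that flag is not shown in the traceback, so this is only an assumption about the mechanism.

```r
# Illustration only: an NA condition inside `if` is an error in R.
# `to_date` is a hypothetical stand-in for a flag derived from values that
# became NA during integer coercion.
to_date <- NA

msg <- tryCatch(
  if (to_date) "converted" else "unchanged",
  error = function(e) conditionMessage(e)   # capture the error text
)
msg
```

In an English locale, `msg` holds the same "missing value where TRUE/FALSE needed" text that thicken reports.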

Changing dyears(63) to dyears(64) moves the dates back to 2037, and thicken then works:

> df <- mutate_at(df, vars(ends_with("time")), funs(ymd_hms(., tz = "UTC", locale = Sys.getlocale("LC_TIME"))- dyears(64)))
> 
> df$sbp <- as.numeric(df$sbp)
> #df <- na.omit(df)
> 
> summary(df)
   charttime                        sbp       
 Min.   :2037-11-04 18:30:00   Min.   : 62.0  
 1st Qu.:2037-11-04 19:33:45   1st Qu.: 84.5  
 Median :2037-11-04 20:52:30   Median : 95.0  
 Mean   :2037-11-04 21:08:22   Mean   :100.9  
 3rd Qu.:2037-11-04 22:26:15   3rd Qu.:102.0  
 Max.   :2037-11-05 00:42:00   Max.   :217.0  
                               NA's   :12     
> lapply(df, class)
$charttime
[1] "POSIXct" "POSIXt" 

$sbp
[1] "numeric"

> df$charttime %>% get_interval
[1] "min"
> 
> # this does work
> df[!is.na(df$charttime),] %>%
+   thicken(interval = 'hour')
             charttime sbp      charttime_hour
1  2037-11-04 18:30:00  NA 2037-11-04 18:00:00
2  2037-11-04 18:45:00  62 2037-11-04 18:00:00
3  2037-11-04 19:00:00  66 2037-11-04 19:00:00
4  2037-11-04 19:12:00  NA 2037-11-04 19:00:00
5  2037-11-04 19:14:00  NA 2037-11-04 19:00:00
6  2037-11-04 19:15:00 217 2037-11-04 19:00:00
7  2037-11-04 19:26:00  NA 2037-11-04 19:00:00
8  2037-11-04 19:30:00 102 2037-11-04 19:00:00
9  2037-11-04 19:45:00  94 2037-11-04 19:00:00
10 2037-11-04 19:59:00  NA 2037-11-04 19:00:00
11 2037-11-04 20:00:00  80 2037-11-04 20:00:00
12 2037-11-04 20:04:00  NA 2037-11-04 20:00:00
13 2037-11-04 20:15:00  91 2037-11-04 20:00:00
14 2037-11-04 20:30:00  86 2037-11-04 20:00:00
15 2037-11-04 20:45:00  96 2037-11-04 20:00:00
16 2037-11-04 21:00:00  73 2037-11-04 21:00:00
17 2037-11-04 21:15:00  84 2037-11-04 21:00:00
18 2037-11-04 21:30:00  96 2037-11-04 21:00:00
19 2037-11-04 21:45:00 100 2037-11-04 21:00:00
20 2037-11-04 21:51:00  NA 2037-11-04 21:00:00
21 2037-11-04 22:00:00  NA 2037-11-04 22:00:00
22 2037-11-04 22:15:00 123 2037-11-04 22:00:00
23 2037-11-04 22:30:00 125 2037-11-04 22:00:00
24 2037-11-04 22:45:00 132 2037-11-04 22:00:00
25 2037-11-04 23:00:00  88 2037-11-04 23:00:00
26 2037-11-04 23:15:00  NA 2037-11-04 23:00:00
27 2037-11-04 23:45:00  NA 2037-11-04 23:00:00
28 2037-11-05 00:00:00 102 2037-11-05 00:00:00
29 2037-11-05 00:28:00  NA 2037-11-05 00:00:00
30 2037-11-05 00:42:00  NA 2037-11-05 00:00:00
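If shifting the dates is not acceptable, a padr-free alternative is to truncate the timestamps to the hour with base R. This is a different technique from thicken, shown here only as a sketch on three made-up rows taken from the data above; the column name `charttime_hour` simply mimics padr's naming.

```r
# Sketch of a base R alternative (not padr): floor each timestamp to its hour.
charttime <- as.POSIXct(
  c("2101-10-20 18:45:00", "2101-10-20 19:12:00", "2101-10-20 19:59:00"),
  tz = "UTC"
)

# trunc() returns POSIXlt, so convert back to POSIXct with an explicit tz
charttime_hour <- as.POSIXct(trunc(charttime, units = "hours"), tz = "UTC")

format(charttime_hour, "%Y-%m-%d %H:%M:%S")
# "2101-10-20 18:00:00" "2101-10-20 19:00:00" "2101-10-20 19:00:00"
```

The result can be added as a column and used for grouping in the same way as the column thicken would have created.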
Daren Eiri

SOLUTION 1 - DID NOT WORK

I don't know the package very well, but I would try two things:

  1. filtering out NA values
  2. passing the `by` argument explicitly

Try this:

df[!is.na(df$sbp),] %>% thicken(interval='hour', by = 'charttime')

SOLUTION 2 - DID NOT WORK

Try coercing df into a data frame instead of a tibble, and then coerce charttime to a date-time (POSIXct) again:

df <- data.frame(df)
df$charttime <- as.POSIXct(df$charttime)

SOLUTION 3 - DID NOT WORK

You may have some NAs in charttime; try this:

df[!is.na(df$charttime),] %>% thicken(interval = 'hour')

I tried renaming the variable, but that is not the problem. Sorry, I cannot comment yet. Please tell me if it worked.

  • result is still the same: `df[!is.na(df$sbp),] %>% thicken(interval='hour', by = 'charttime')` `Error in if (to_date) x <- as.Date(x, tz = attr(x, "tzone")) : missing value where TRUE/FALSE needed` – Daren Eiri Oct 11 '17 at 00:17
  • Coerce your tibble into a data frame. – Vitor Bianchi Lanzetta Oct 11 '17 at 00:48
  • I appreciate your second solution. I coerced my tibble into a dataframe but it still resulted in the same error. Note that padr works in conjunction with dplyr so I would assume tibbles are acceptable. The coffee example is also a tibble (https://www.r-bloggers.com/introducing-padr/). Regarding charttime, it was already a POSIXct. – Daren Eiri Oct 11 '17 at 01:00
  • When I ran class(coffee), it showed only data frame for me. The code expects a boolean from to_date, but I could not trace where that is computed in the function's code. – Vitor Bianchi Lanzetta Oct 11 '17 at 01:05
  • Is your time zone (`tz`) defined? If not, try coercing your `charttime` again using `as.POSIXct(x, tz = 'EST')`, then set `df %>% thicken(interval='hour', start_val = as.POSIXct('2101-10-20 22:30:01', tz = 'EST'))` – Vitor Bianchi Lanzetta Oct 11 '17 at 01:51
  • Still no dice. I actually do coerce charttime to POSIXct with UTC as the tz but still haven't had any luck. Let me know if you think of anything else! – Daren Eiri Oct 11 '17 at 05:37
  • This is a hard one. We now know for sure what is not the problem. Can you make your data set available, or at least a demo version? One way to do it is to publish your data as a .csv file with your Google Drive account and share the URL. Hope you find the solution. – Vitor Bianchi Lanzetta Oct 11 '17 at 05:53
  • Again, I appreciate your effort in trying to figure this problem out @Vitor! I have updated my dataset as a csv if you would like to think about this more, and include more details about my data. – Daren Eiri Oct 11 '17 at 17:11