I have a Docker container that is set up to run an R script weekly via an Airflow DAG. The DAG has 3 steps. The first is upstream of the Docker code: it takes data from several databases, computes various features, and then writes the data to S3. The R script in the container reads in data from an S3 bucket, formats the data frame, runs a model to score records, and then writes the results back to S3. Finally, downstream code formats the output so that it can be loaded into Salesforce. The script worked in testing when I wrote and built it in December. Recently the run has failed several times with this error:
Error in as.character(x) :
cannot coerce type 'closure' to vector of type 'character'
Calls: %>% ... mutate_impl -> ymd -> .parse_xxx -> unlist -> lapply -> FUN
Execution halted
OK, so that seems to mean that the date, which is read in as a character, is failing to be parsed as a date. Since 'ymd' is in the call chain, I believe the culprit is the lubridate function in the R script below.
The Dockerfile (below) uses an R image that includes the tidyverse because my code uses dplyr and lubridate. I could likely get by without lubridate and use a base function to format the date, but more on that below.
Dockerfile:
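One other reading of the message may be worth ruling out. A 'closure' in R is a function, and lubridate exports a function named period. If the incoming CSV had no period column, ymd(period) would presumably receive that function rather than a character vector. The sketch below reproduces the same kind of failure in base R. The period function and the data frame defined here are hypothetical stand-ins, not taken from my script.

```r
# Hypothetical stand-in for the period() function that lubridate exports
period <- function(...) NULL

# Coercing a function (a "closure") to character raises an error;
# capture the message instead of halting
msg <- tryCatch(as.character(period), error = function(e) conditionMessage(e))
print(msg)

# A guard that could run before the mutate() call.
# `data` is a made-up frame that deliberately lacks the column.
data <- data.frame(account_id = 1:2)
has_period <- "period" %in% names(data)
print(has_period)
```

If a check like this came back FALSE on a failing run, the problem would sit in the input file rather than in the date parsing itself.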
FROM rocker/tidyverse
RUN mkdir -p /model
RUN apt-get update -qq && apt-get install -y \
libssl-dev \
libcurl4-gnutls-dev
RUN R -e "install.packages('caret')"
RUN R -e "install.packages('randomForest')"
RUN R -e "install.packages('lubridate')"
RUN R -e "install.packages('aws.s3')"
EXPOSE 80
EXPOSE 8787
COPY / /
ENTRYPOINT ["Rscript", "account_health_scoring.R"]
R script: I have to exclude the first few lines because they contain identifying info and credentials, but that code just reads in my S3 credentials from a file. Then this code block runs and fails. There is a good deal of code after it, but all of that runs fine in the container:
require("dplyr")
require("caret")
require("aws.s3")
require("randomForest")
require("lubridate")
#set credentials
Sys.setenv("AWS_ACCESS_KEY_ID" = "key",
"AWS_SECRET_ACCESS_KEY" = "key")
#read in model file
s3load("rf_gridsearch.RData", bucket = "account-model")
#read in data
object_key <- paste0("account_health_data_",
                     gsub("-", "_", as.character(Sys.Date()), fixed = TRUE),
                     ".csv")
data <- read.csv(text = rawToChar(get_object(object_key,
                                             bucket = "account-health-model-input")),
                 stringsAsFactors = FALSE) %>%
  mutate(period = ymd(period)) %>%
  mutate_if(is.integer, as.numeric)
The reason for the 2 mutate lines is that, although the date is written as a POSIX timestamp, read.csv reads it in as a string, and it also reads my float columns in as integers. Perhaps I am missing an argument in my read.csv call, or there is a better function for reading this data, but this is what I have always used.
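For reference, here is a minimal base R sketch of the same read-and-format step without dplyr or lubridate. The inline CSV text, its column names other than period, and the "%Y-%m-%d" date format are all assumptions made for illustration; the real file comes from S3 and may use a different layout.

```r
# Hypothetical inline CSV standing in for the object fetched from S3
csv_text <- "account_id,period,score\n1,2019-01-07,10\n2,2019-01-14,12\n"

# Read `period` explicitly as character; other columns are type-converted as usual
data <- read.csv(text = csv_text,
                 stringsAsFactors = FALSE,
                 colClasses = c(period = "character"))

# Fail early with a readable error if the expected column is missing
stopifnot("period" %in% names(data))

# Base R replacement for ymd(); the format string is an assumption
data$period <- as.Date(data$period, format = "%Y-%m-%d")

# Base R replacement for mutate_if(is.integer, as.numeric)
int_cols <- vapply(data, is.integer, logical(1))
data[int_cols] <- lapply(data[int_cols], as.numeric)

str(data)
```

Because data$period is indexed by name here, a missing column shows up as an explicit failure on the stopifnot line instead of the name being looked up elsewhere.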
Questions:
What is the error message referring to, and am I correct to think that the ymd function is the culprit?
If so, how can I rewrite my code, potentially using base functions, to accomplish the same goal without relying on a package?
Could it be package dependencies? From reviewing the logs this doesn't seem to be the case, since lubridate mostly imports and uses base functions. The package has not been updated since I wrote and tested this code.