
I have a list of R commands that first grabs data from a SQL database through RODBC, computes calculated fields, and then applies a regression model (assigned as "objModel" in my R environment) to the data. The final output is a csv file with two columns (Contact IDs, Probability_Score). How can I use Rscript to automate the process of running the script and retrieving a fresh csv file every day? Does R have to be running the whole time? I work in a Windows environment. I have no experience whatsoever with Rscript. Any extra detail in your answer is highly appreciated.

gibbz00
  • Likely duplicate: http://stackoverflow.com/q/2793389/2372064. If not please explain how your situation might be different. – MrFlick Jul 27 '15 at 20:04
  • Are you on win, mac, or unix? If not win, then look into [cron](https://help.ubuntu.com/community/CronHowto) as a starting point for running "something" every day. The use of Rscript is relatively straight-forward: if your script file is named "daily-sql.R", then you would schedule `Rscript daily-sql.R`, making sure your *R* script adequately handles paths (and over-writing of files) and exceptions. – r2evans Jul 27 '15 at 20:05
  • @r2evans Thanks. Should the instance of R keep running, as the code is using "objModel" in my R environment? – gibbz00 Jul 27 '15 at 20:11
  • I think you misunderstand: this implies that cron starts up one instance of Rscript at some time each day. Rscript runs your script, which loads/pulls all data it needs, does its processing, and then dumps out (to *somewhere*) your CSV output. After that, Rscript exits. There is no persistent process needed nor desired. Did @MrFlick's link help? Are you running this on windows? – r2evans Jul 27 '15 at 20:14
  • @r2evans I am running this on windows. I am probably not understanding it right as my concern is that the script does not contain the code that created the variable "objModel". The variable "objmodel" was created by training a big dataset and it took lots of time. Now it resides in my global environment. My goal is to spit out new output using smaller datasets and this "objModel". When Rscript runs, does it need to know how "objModel" was created or is it sufficient for "objModel" to exist in the global environment? – gibbz00 Jul 27 '15 at 20:21
  • Have you tried saving `objModel` to an outside file (`save(objModel, file="objModel.Rdata")`) and have it loaded inside your script? – r2evans Jul 27 '15 at 20:22
  • I have not. I did not know you can save a single object. I thought you can only save the entire workspace. Thank you so much! – gibbz00 Jul 27 '15 at 20:23
  • @r2evans Thanks again. I saved it as you have suggested. However, it printed NULL after saving. Is that a reason for concern? The saved file itself is 90MB, so something got saved! – gibbz00 Jul 27 '15 at 20:30
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/84406/discussion-between-r2evans-and-gibbz00). – r2evans Jul 27 '15 at 20:31

1 Answer


There are some unspecified requirements in this question that came out in comments and chat:

1. Object persistence

The objModel is a ~90MB model built over time with lots of data that does not need to be rebuilt each time.

With this in mind, your first task is to make that object persistent across different R sessions. This is most easily accomplished with save and load. In the current session:

save(objModel, func1, func2, file="objModel.Rdata") # assuming a relevant directory

This assumes that those are the only dependent objects in your environment on which the script relies. (It can take an arbitrary number of objects, so add as needed.)
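If the model is the only object you need, saveRDS/readRDS is a common alternative to save/load: it stores exactly one object and lets you choose the variable name at load time (the file name here is just an example):

```r
# saveRDS stores a single object, without its name
saveRDS(objModel, file = "objModel.rds")

# later, in the stand-alone script, bind it to any name you like
objModel <- readRDS("objModel.rds")
```

As an aside, save() returns NULL invisibly, so seeing NULL printed after a save is harmless; what matters is that the file exists on disk.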

Your stand-alone script should then be something like the following (which I'll refer to as myscript.R later):

# library(...) as necessary
load("objModel.Rdata")
# now have objModel, func1, func2 in your environment

# ...
# do your RODBC magic
# then your regression/prediction magic
# ...

write.csv(newresults, file="output.csv", row.names=FALSE) # note: write.csv has no 'header' argument
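Fleshed out a bit, that skeleton might look like the following. The DSN name, the query, and the column names are placeholders (assumptions), and predict() is used on the assumption that objModel is a standard lm/glm-style model:

```r
library(RODBC)

load("objModel.Rdata")   # restores objModel (and helper functions, if saved)

# placeholder DSN and query -- substitute your own
ch <- odbcConnect("my_dsn")
newdata <- sqlQuery(ch, "SELECT * FROM Contacts")
odbcClose(ch)

# assumes objModel responds to predict(); type="response" suits glm-style models
newresults <- data.frame(
  ContactID         = newdata$ContactID,
  Probability_Score = predict(objModel, newdata = newdata, type = "response")
)

write.csv(newresults, file = "output.csv", row.names = FALSE)
```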

Tricks for the code:

  • Never assume human interaction.
  • Did you remember to save/load or recreate all needed variables? It's easy to forget about a variable in your current environment on which the script relies, and which will not be available in a vanilla Rscript environment.
  • Plotting anything? Use png, pdf, or another graphics device to save directly to image files.
  • Consider exception handling (tryCatch) if you have vital steps.
  • Regardless, you should have a simple check ("assertion") on the validity of the results before writing them to the output file, whether or not you are over-writing something. This may be as simple as if (! is.null(...)) or if (length(...)).
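The last two points can be combined into a small guard around the write step; a minimal sketch (the file names are examples):

```r
# fail loudly instead of silently writing a bad or empty CSV
tryCatch({
  stopifnot(is.data.frame(newresults), nrow(newresults) > 0)
  write.csv(newresults, file = "output.csv", row.names = FALSE)
}, error = function(e) {
  # in an unattended run, log the failure somewhere you will actually look
  writeLines(conditionMessage(e), con = "error.log")
})
```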

2. Automation

Now your actual question comes into play: how to automate this for daily unsupervised execution. Don't even look at cron or Windows scheduling until you can run it successfully from a new command window.

That is, can you run Rscript /path/to/myscript.R and have it complete successfully? You may need to either enforce the current working directory with setwd(...) or use absolute paths in your load and write.csv calls. I tend to assume that the scheduler will run things from a different directory, so I force it one way or the other.

Once that works satisfactorily (did you check the output CSV to ensure it looks right?), only now should you consider actually automating it.

3. Automation, this time for real.

If you are on a unixy OS (you never specified), look at cron. Perhaps something like:

17 04 * * * /usr/bin/Rscript /path/to/myscript.R

which will run the command daily at 4:17am.

If on windows, @MrFlick's comment is appropriate: stackoverflow.com/q/2793389/2372064.
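For reference, the Windows counterpart of the cron line above is a Task Scheduler entry, which can be created from a command prompt with schtasks. The paths below are assumptions; point /tr at your actual Rscript.exe and script locations:

```shell
schtasks /Create /TN "DailyScoring" /SC DAILY /ST 04:17 /TR "\"C:\Program Files\R\R-3.2.1\bin\Rscript.exe\" C:\scripts\myscript.R"
```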

r2evans