
I am working on a large R project that performs several different analyses of a common dataset. I have built an individual script for each analysis, as well as high-level scripts that call them in sequence. Each script starts by calling an init.R script that clears the workspace with rm(list=ls(all=TRUE)).

I have recently discovered that summary() (and, I think, coef()) produces different output depending on the order in which the scripts are run. If the scripts that fit models with lm() or gam() (from the mgcv package) are run first in a "fresh" R session, the summary() output labels factor terms with the full level labels.

However, if I first run other scripts, which fit simple nested aov() models and produce graphs and other output using some other packages, and then re-run the model-fitting scripts, summary() instead labels factor levels with numbers (the 'coded' values, not the actual factor level labels).

Unfortunately, I can't easily reproduce this with a minimal working example, because I haven't pinpointed where in my scripts the behaviour changes. A few quick tests have confirmed the following:

  • The workspace is cleared between scripts using rm(list=ls()), so no leftover objects should be causing this change.
  • summary() itself is not the problem: the model-fitting functions actually return slightly different objects (confirmed with all.equal()), which is even more disturbing. A saved model object reliably gives the same output whenever it is loaded, but that output depends on the order in which the scripts were run to produce it (even though memory is cleared between scripts).
    • Depending on the script order, summary(lm(...)) also reports different estimates for the model terms, but the same residuals summary, R^2 and overall F-test. Very bizarre.
  • I cannot restore the default (desired) behaviour by unloading the packages loaded in earlier scripts. Does the order in which packages are loaded matter?
  • The default behaviour is restored after quitting and restarting R.

Ideally, I would like the project to reproduce all results and output simply by source()ing each script in turn, but this strange 'bug' (in my code; I'm not blaming R) means the output is inconsistent and depends on the script order :(
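Once the culprit is known (the contrasts option, identified in the edit below), one way to make each script independent of whatever earlier scripts left behind is to pass the coding explicitly to lm() through its contrasts argument, which overrides getOption("contrasts") for the named factors. A minimal sketch, assuming a hypothetical data frame d with a factor g standing in for the real dataset:

```r
# Hypothetical example data standing in for the project's common dataset.
set.seed(1)
d <- data.frame(
  g = factor(rep(c("low", "mid", "high"), each = 4),
             levels = c("low", "mid", "high")),
  y = rnorm(12)
)

# Simulate an earlier script having changed the session-wide default.
op <- options(contrasts = c("contr.sum", "contr.poly"))

# The contrasts argument overrides getOption("contrasts") for factor g only,
# so this fit uses treatment coding no matter what ran before.
fit <- lm(y ~ g, data = d, contrasts = list(g = "contr.treatment"))
print(names(coef(fit)))  # "(Intercept)" "gmid" "ghigh"

options(op)  # undo the simulated change
```

aov() accepts the same contrasts argument. For other fitting functions, setting the coding on the factor itself, e.g. contrasts(d$g) <- "contr.treatment", should have a similar effect wherever the model matrix is built with model.matrix().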

Is there anything other than objects or packages that persists in memory and could alter how model-fitting functions work, or how they store the factor levels of the data frames passed to them?
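One thing rm() does not touch is the session-wide options() list: it only removes objects from an environment. A rough diagnostic, sketched below, is to snapshot options() at the start of a session and later list which entries differ. The helper changed_options() is hypothetical, written for this example:

```r
# Names of options whose current values differ from an earlier snapshot.
changed_options <- function(snapshot, current = options()) {
  keys <- union(names(snapshot), names(current))
  same <- vapply(keys, function(k) identical(snapshot[[k]], current[[k]]),
                 logical(1))
  keys[!same]
}

opts_start <- options()   # snapshot taken at the very start

# Simulate an earlier analysis script changing a global setting.
options(contrasts = c("contr.sum", "contr.poly"))

# Wipe the workspace the way init.R does (sparing the two objects above).
x <- 1:10
rm(list = setdiff(ls(), c("opts_start", "changed_options")))

exists("x", envir = globalenv(), inherits = FALSE)  # FALSE: objects are gone
getOption("contrasts")                # still "contr.sum" "contr.poly"
chg <- changed_options(opts_start)    # includes "contrasts"
print(chg)

options(opts_start)  # put the snapshotted values back
```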

EDIT

I realized that the answer to the question above is the contrasts option (see the answer below). New question:

How can you reset options() to the default settings, i.e. to the values used when R starts up? The 'factory default' is options(contrasts=c("contr.treatment","contr.poly")), but I'm wondering whether there is a way to restore the internal defaults (in case they aren't 'factory fresh').
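One possible approach, sketched here as an assumption rather than an established recipe: ask a separate R process started with --vanilla (which skips Rprofile, Renviron and site files) for its values of selected options, then apply them in the current session. The helper vanilla_options() is hypothetical, and the sketch assumes a standard installation where Rscript sits in R.home("bin"):

```r
# Query a fresh R process started with --vanilla for the built-in values
# of the named options, and return them as a list.
vanilla_options <- function(opt_names) {
  rscript <- file.path(R.home("bin"), "Rscript")
  # Builds e.g. dput(options('contrasts', 'na.action'))
  expr <- sprintf("dput(options(%s))",
                  paste0("'", opt_names, "'", collapse = ", "))
  out <- system2(shQuote(rscript), c("--vanilla", "-e", shQuote(expr)),
                 stdout = TRUE)
  eval(parse(text = paste(out, collapse = "\n")))
}

defaults <- vanilla_options(c("contrasts", "na.action"))
str(defaults)

# Something in this session (a script, or a customised Rprofile) changed it:
options(contrasts = c("contr.sum", "contr.poly"))

# Reset just those options to the values a vanilla session starts with.
options(defaults)
getOption("contrasts")
```

Only options whose values survive dput() and parse() can be round-tripped this way; options holding functions or environments would need a different approach.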


1 Answer


After comparing outputs, I realized I was looking at different contrasts, and remembered that the 'offending' script had changed the contrasts option from its default:

options(contrasts=c("contr.sum","contr.poly"))
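A small sketch of why this matches every symptom in the question, using a hypothetical three-level factor: under contr.sum the coefficients are named g1, g2 (numbers instead of level labels) and estimate different quantities, yet the fitted values, and hence the residuals, R^2 and overall F-test, are the same.

```r
# Hypothetical data with a three-level factor.
set.seed(42)
d <- data.frame(
  g = factor(rep(c("low", "mid", "high"), each = 5),
             levels = c("low", "mid", "high")),
  y = rnorm(15)
)

op <- options(contrasts = c("contr.treatment", "contr.poly"))
fit_trt <- lm(y ~ g, data = d)
options(contrasts = c("contr.sum", "contr.poly"))
fit_sum <- lm(y ~ g, data = d)
options(op)  # restore whatever was set before

names(coef(fit_trt))  # "(Intercept)" "gmid" "ghigh"
names(coef(fit_sum))  # "(Intercept)" "g1" "g2"

# Same model, different parameterisation:
all.equal(coef(fit_trt), coef(fit_sum))       # not TRUE: estimates differ
all.equal(fitted(fit_trt), fitted(fit_sum))   # TRUE
all.equal(summary(fit_trt)$r.squared,
          summary(fit_sum)$r.squared)         # TRUE
```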

That explains all the confusion above. I hope this saves somebody else some hair-pulling. New question:

How can you reset options() to the default settings, i.e. to the values used when R starts up?

  • While answering your own question is appropriate, you should probably add your new question by editing your question above, rather than putting it here in the answer. You probably want `options(contrasts=c("contr.treatment","contr.poly"))` – Ben Bolker Nov 05 '12 at 01:52
  • Thanks Ben. The factory defaults you mention are what I'm using for now, but I was curious whether there is a way to reload the internal default options, in case they were changed (in my Rprofile, or somewhere else). I've added the new question to the OP, as suggested. – Jonathan Whiteley Nov 05 '12 at 02:36