35

What are some good practices for programming in R?

Since R is a special-purpose language that I don't use all the time, I typically just hack together some quick scripts that do what I need.

But what are some tips for writing clean and efficient R code?

Frank
  • A related question of interest: http://stackoverflow.com/questions/1295955/what-is-the-most-useful-r-trick (SO should really do a better job of finding those, it shows jQuery and Windows 7 questions as "related"). – Frank Feb 14 '10 at 14:12

5 Answers

22

You already provide some hints by stating your approach is 'hack quick scripts'. If you want best practices and structure, simply follow the established best practices from CRAN:

  • create a package; this opens the door to running R CMD check, which is very useful
  • as many people have stated, having a package helps you in the code writing stage too, as you are somewhat forced to document the code; that is a Good Thing (TM)
  • once you have a package, add code in the \examples{} section of the documentation, as this is run during R CMD check and provides an easy entry to regression testing
  • once you get used to regression testing, start to use a package such as RUnit; that really is best practice
  • JD's pointer to the Google Style Guide is a good one too. That isn't the only style guide as e.g. Henrik's R Coding Convention precedes it by a few years; and there is also Hadley's riff on Google's style guide
  • Otherwise, the oldie-but-goldie 'do what your colleagues and coauthors do' also applies
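To make the regression-testing point concrete, here is a minimal sketch in base R. The function `rescale01` is hypothetical and only there for illustration; the `stopifnot()` calls are the kind of checks that could sit in an \examples{} section or in a script under the package's tests/ directory, since an error there makes R CMD check fail. RUnit offers richer check functions, but this version needs nothing beyond base R.

```r
# Hypothetical package function, used here only for illustration.
rescale01 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

# Regression checks: stopifnot() raises an error unless every condition is TRUE.
stopifnot(
  isTRUE(all.equal(rescale01(c(0, 5, 10)), c(0, 0.5, 1))),
  isTRUE(all.equal(range(rescale01(c(-3, 2, 7))), c(0, 1))),
  is.na(rescale01(c(1, NA, 3))[2])  # missing values stay missing
)
```

If a later change to `rescale01` breaks any of these expectations, the script stops with an error instead of silently returning different numbers.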
Dirk Eddelbuettel
15

I recommend Josh Reich's Load, Clean, Func, Do workflow from this previous question.
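As a rough sketch of how that workflow might look: the four stage names come from the workflow's title, while the data, column names, and the function `summarise_scores` are hypothetical. The stages are shown as sections of one script so the example runs on its own.

```r
# --- load: read in the raw data (an inline data frame stands in for a file) ---
raw <- data.frame(
  id    = c(1, 2, 3, 4),
  score = c("10", "12", "n/a", "15"),
  stringsAsFactors = FALSE
)

# --- clean: fix types and drop unusable rows ---
clean <- raw
clean$score <- suppressWarnings(as.numeric(clean$score))  # "n/a" becomes NA
clean <- clean[!is.na(clean$score), ]

# --- func: function definitions only, no side effects ---
summarise_scores <- function(df) {
  c(n = nrow(df), mean = mean(df$score))
}

# --- do: the actual analysis, calling the functions defined above ---
result <- summarise_scores(clean)
print(result)
```

In a real project each section would more likely live in its own file, with the final script pulling in the others via `source()`.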

In addition, I recommend following coding guidelines such as Google's R Style Guide. Using a coding style guide makes reading the code later so much easier.

JD Long
  • 3
    I wish the 'dot' naming convention was not endorsed in that Style Guide (e.g., some.variable.name). It has history on its side and most R code is written that way; still, I'm not a fan. – doug Feb 13 '10 at 16:25
  • 1
    I agree doug. I use camelCase myself. Style guides, like version control, are less about which one you choose and more about picking one and using it. – JD Long Aug 05 '11 at 12:11
7

I completely agree with the existing answers, especially regarding the usage of packages. Packages require a lot of discipline, documentation, and structure, which really help to enforce best practices (along with R CMD check). You can also use the codetools package to help with this. Use the roxygen package for documentation.

Beyond that, I recommend that you not only vectorize your code, but more particularly, make every effort to vectorize your functions, meaning that you should be able to provide vector arguments and get vectors returned (even from things like database calls). That will really improve your code efficiency and clarity in the long run.
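A small illustration of what "vectorizing a function" can mean in practice. Both function names are made up for this example: the first only works on a single value, the second accepts a vector and returns a vector of the same length.

```r
# Scalar-only version: if() expects a single condition, so a vector argument
# is an error in recent versions of R (older versions used only the first
# element and gave a warning).
sign_label_scalar <- function(x) {
  if (x < 0) "negative" else "non-negative"
}

# Vectorized version: ifelse() evaluates the condition element by element.
sign_label <- function(x) {
  ifelse(x < 0, "negative", "non-negative")
}

labels <- sign_label(c(-2, 0, 3))
print(labels)
```

The vectorized version can be dropped straight into code that works on whole data frame columns, with no `sapply()` or loop wrapped around it.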

Lastly, I really like to use something like Sweave to organize my code into clear literate reproducible research whenever writing a report. Along with this I recommend using the cache package.

Shane
3

For efficiency, prefer vector operations over for loops.
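For example, here is the same computation written both ways (the variable names are arbitrary):

```r
x <- c(1.5, 2.0, 3.5, 4.0)

# Loop version: fills the result one element at a time.
squares_loop <- numeric(length(x))
for (i in seq_along(x)) {
  squares_loop[i] <- x[i]^2
}

# Vectorized version: the arithmetic operator works on the whole vector at once.
squares_vec <- x^2

# Both approaches give the same result.
stopifnot(isTRUE(all.equal(squares_loop, squares_vec)))
```

The vectorized form is shorter, and on long vectors it is typically much faster because the looping happens in compiled code rather than in the R interpreter.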

Frank
2

This is good programming practice in general, but use a version control system such as SVN to manage your code.

stevejb