11

I am a statistics graduate student who works a lot with R. I am familiar with OOP in other programming contexts. I even see its use in various statistical packages that define new classes for storing data.

At this stage in my graduate career, I am usually coding some algorithm for some class assignment--something that takes in raw data and gives some kind of output. I would like to make it easier to reuse code, and establish good coding habits, especially before I move on to more involved research. Please offer some advice on how to "think OOP" when doing statistical programming in R.

osazuwa
    FWIW, it might be more useful to think functionally rather than in an OO way for these kinds of applications. – trutheality Jun 02 '11 at 04:23

6 Answers

7

I would argue that you shouldn't. Try to think about R in terms of a workflow. There are some useful workflow suggestions on this page:

Workflow for statistical analysis and report writing

Another important consideration is line-by-line analysis vs. reproducible research. There's a good discussion here:

writing functions vs. line-by-line interpretation in an R workflow
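As a rough sketch of what a function-based workflow could look like (the helper names clean_data, fit_model and report are made up for illustration and do not come from the linked discussions), each step of an analysis becomes a small function that can be reused on the next assignment:

```r
# Hypothetical helpers: the names are illustrative only.

# Step 1: drop rows with missing values.
clean_data <- function(df) {
  df[complete.cases(df), , drop = FALSE]
}

# Step 2: fit a linear model for a given formula.
fit_model <- function(df, formula) {
  lm(formula, data = df)
}

# Step 3: return the coefficient table as a matrix.
report <- function(fit) {
  coef(summary(fit))
}

# Usage on a built-in data set with one artificially missing value.
raw <- mtcars
raw$mpg[1] <- NA
result <- report(fit_model(clean_data(raw), mpg ~ wt))
print(result)
```

Rerunning the whole analysis on new data is then a one-line change, which is most of what "reusable code" means for coursework.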

Brandon Bertelsen
  • Why and how would you argue that one shouldn't? – naught101 Nov 21 '12 at 09:09
  • Many (some would argue "most") R users are not programmers first, but have a domain specialization and use R as a facilitator. While the OOP paradigm is built into R (methods and classes), you do not need to learn it to have functioning, reproducible, workflow-oriented code that you can reuse, which is the goal the OP described. Unless you're involved in package creation, getting to this level of understanding of R is unnecessary. The overall workflow is more important. Rigorous OOP would require an understanding of S4 classes, which is largely a waste of effort for the typical user. – Brandon Bertelsen Nov 22 '12 at 02:58
  • you should replace the second sentence in your answer with that comment :) – naught101 Nov 22 '12 at 03:45
5

Two aspects of OOP are data and the generics / methods that operate on data.

The data (especially the data that is the output of an analysis) often consists of structured and inter-related data frames or other objects, and one wishes to manage these in a coordinated fashion. Hence the OOP concept of classes, as a way to organize complex data.

Generics and the methods that implement them represent the common operations performed on data. Their utility comes when a collection of generics operate consistently across conceptually related classes. Perhaps a reasonable example is the output of lm / glm as classes, and the implementation of summary, anova, predict, residuals, etc. as generics and methods.

Many analyses follow familiar workflows; here one is a user of classes and methods, and gets the benefit of coordinated data + familiar generics. Thinking 'OOP' might lead you to explore the methods available for an object, e.g. methods(class="lm"), rather than its structure, and might help you to structure your workflows so they follow the well-defined channels of established classes and methods.
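A small illustration of the lm / glm point, using only base R and the built-in mtcars data (the choice of models is arbitrary): the same generics work on both fits, and methods() lists what is available for a class.

```r
fit_lm  <- lm(mpg ~ wt, data = mtcars)
fit_glm <- glm(am ~ wt, data = mtcars, family = binomial)

# A glm object also inherits from "lm", so lm methods act as a fallback.
class(fit_lm)
class(fit_glm)

# The same generics dispatch to class-specific methods.
r1 <- residuals(fit_lm)
r2 <- residuals(fit_glm)
new_car <- data.frame(wt = 3)
p1 <- predict(fit_lm, newdata = new_car)
p2 <- predict(fit_glm, newdata = new_car, type = "response")

# Explore what you can do with the object instead of poking at str(fit_lm).
lm_methods <- methods(class = "lm")
print(lm_methods)
```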

When implementing a novel statistical methodology, one might think about how to organize the results into a coherent, inter-related data structure represented as a new class, and write methods for the class that correspond to established methods on similar classes. Here one gets to represent the data internally in a way that is convenient for subsequent calculation rather than as a user might want to 'see' it (separating representation from interface). And it is easy for the user of your class (as Chambers says, frequently yourself) to use the new class in existing workflows.
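A minimal sketch of that idea with S3, assuming a made-up estimator (a trimmed mean with a bootstrap standard error; the class name trimmed_fit and its fields are hypothetical). The object stores the raw bootstrap draws, while print, coef and summary present what a user expects from a fitted object:

```r
# Hypothetical constructor: returns a list tagged with a new S3 class.
trimmed_fit <- function(x, trim = 0.1, n_boot = 200) {
  stopifnot(is.numeric(x), length(x) > 1)
  boot <- replicate(n_boot, mean(sample(x, replace = TRUE), trim = trim))
  structure(
    list(estimate = mean(x, trim = trim), boot = boot,
         trim = trim, n = length(x)),
    class = "trimmed_fit"
  )
}

# Methods for established generics, so the class fits existing habits.
print.trimmed_fit <- function(x, ...) {
  cat("Trimmed mean (trim = ", x$trim, "): ", format(x$estimate), "\n", sep = "")
  invisible(x)
}

coef.trimmed_fit <- function(object, ...) {
  c(trimmed_mean = object$estimate)
}

summary.trimmed_fit <- function(object, ...) {
  data.frame(estimate = object$estimate,
             std_error = sd(object$boot),
             n = object$n)
}

set.seed(1)
fit <- trimmed_fit(rnorm(50))
print(fit)
coef(fit)
summary(fit)
```

The internal representation (the boot vector) can change later without breaking code that only calls coef() or summary().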

It's a useful question to ask 'why OOP' before 'how OOP'.

Martin Morgan
4

You may want to check these links out: first one, second one.

And if you want to see some serious OO code in R, read the manual page for ReferenceClasses (so-called R5 object orientation), and take a look at the Rook package, since it relies heavily on ReferenceClasses. BTW, Rook is a good example of reasonable usage of R5 in R coding. Previous experience with Java or C++ could be helpful, since R5 method dispatch differs from S3: methods belong to the object rather than to a generic function. Actually, S3 OO is very primitive, since the actual "class" is stored as an object attribute, so you can change it quite easily.

S3: <method>.<class>(<object>)
R5: <object>$<method>
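To make the two call styles concrete, here is a small side-by-side sketch. All names (describe, sample_s3, Sample) are invented for the example; setRefClass comes from the methods package.

```r
library(methods)

# S3: a generic function dispatches on the class attribute of its argument.
describe <- function(x, ...) UseMethod("describe")
describe.sample_s3 <- function(x, ...) {
  paste("S3 sample of size", length(x$values))
}

s <- structure(list(values = c(1, 2, 3)), class = "sample_s3")
s3_out <- describe(s)

# The class is only an attribute, so it can be overwritten at will.
class(s) <- "something_else"
still_sample <- inherits(s, "sample_s3")

# Reference class: methods live in the object and can modify it in place.
Sample <- setRefClass("Sample",
  fields = list(values = "numeric"),
  methods = list(
    describe = function() {
      paste("RC sample of size", length(values))
    },
    add = function(v) {
      values <<- c(values, v)
      invisible(.self)
    }
  )
)

r <- Sample$new(values = c(1, 2, 3))
r$add(4)
rc_out <- r$describe()
```

Note that r$add(4) changes r itself, which is unusual in R, where objects normally have copy semantics.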

Anyway, if you can grab a copy, I recommend: "R in a Nutshell", chapter 10.

aL3xa
3

I have limited knowledge of how to use R effectively, but here is an article that allowed even me to walk through using R in an OO manner:

http://www.ibm.com/developerworks/linux/library/l-r3/index.html

IAmTimCorey
1

I take exception to David Mertz's statement, "The methods package is still somewhat tentative from what I can tell, but some moderately tweaked version of it seems certain to continue in later R versions", in the article linked in BiggsTRC's answer. In my opinion, programming with classes and methods using the methods package (S4) is the proper way to "think OOP" in R.

The last paragraph of chapter 9.2 "Programming with New Classes" (page 335) of John M. Chambers' "Software for Data Analysis" (2008) states: "The amount of programming involved in using a new class may be much more than that involved in defining the class. You owe it to the users of your new classes to make that programming as effective as possible (even if you expect to be your own main user). So the fact that the programming style in this chapter and in Chapter 10 ["Methods and Generic Functions"] is somewhat different is not a coincidence. We're doing some more serious programming here."

Consider studying the methods package (S4).
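For a first taste, here is a minimal S4 sketch. The class Estimate and the generic confint95 are hypothetical names chosen for illustration; only setClass, setGeneric, setMethod and new come from the methods package. It shows the pieces S3 lacks: typed slots and a validity check that runs when an object is created.

```r
library(methods)

# A formal class with typed slots and a validity function.
setClass("Estimate",
  representation(value = "numeric", se = "numeric"),
  validity = function(object) {
    if (length(object@value) != length(object@se)) {
      "value and se must have the same length"
    } else if (any(object@se < 0)) {
      "se must be non-negative"
    } else {
      TRUE
    }
  }
)

# A new generic plus a method for the class.
setGeneric("confint95", function(object, ...) standardGeneric("confint95"))

setMethod("confint95", "Estimate", function(object, ...) {
  z <- qnorm(0.975)
  cbind(lower = object@value - z * object@se,
        upper = object@value + z * object@se)
})

# A method for the existing 'show' generic controls printing.
setMethod("show", "Estimate", function(object) {
  cat("<Estimate>", format(object@value), "+/-", format(object@se), "\n")
})

est <- new("Estimate", value = 1.5, se = 0.2)
est
ci <- confint95(est)

# Invalid objects are rejected at construction time.
bad <- tryCatch(new("Estimate", value = 1, se = -1),
                error = function(e) conditionMessage(e))
```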

  • The Mertz article was written in 2006, and at the time the characterization of S4 methodology as "somewhat tentative" seems fairly accurate. "Moderate tweaking" is a reasonably good description of what has occurred since then. – IRTFM Jun 02 '11 at 12:06
  • The question asked "Please offer some advice on how to "think OOP" when doing statistical programming in R." I think it would be a disservice to the questioner if S4 were not mentioned, however "tweaked" its origin may have been. – Andre Michaud Jun 02 '11 at 12:31
1

Beyond some of the other good answers here (e.g. the R in a Nutshell chapter, etc), you should take a look at the core Bioconductor packages. BioC has always had a focus on strong OOP design using S4 classes.

geoffjentry