121

Which conventions for naming variables and functions do you favor in R code?

As far as I can tell, there are several different conventions, all of which coexist in cacophonous harmony:

1. Use of period separator, e.g.

  stock.prices <- c(12.01, 10.12)
  col.names    <- c('symbol','price')

Pros: Has historical precedent in the R community, is prevalent throughout the R core, and is recommended by Google's R Style Guide.

Cons: Rife with object-oriented connotations, and confusing to R newbies.

2. Use of underscores

  stock_prices <- c(12.01, 10.12)
  col_names    <- c('symbol','price')

Pros: A common convention in many programming languages; favored by Hadley Wickham's Style Guide, and used in the ggplot2 and plyr packages.

Cons: Not historically used by R programmers; is annoyingly mapped to the '<-' operator in Emacs-Speaks-Statistics (alterable with 'ess-toggle-underscore').

3. Use of mixed capitalization (camelCase)

  stockPrices <- c(12.01, 10.12)
  colNames    <- c('symbol','price')

Pros: Appears to have wide adoption in several language communities.

Cons: Has recent precedent, but not historically used (in either R base or its documentation).

Finally, as if it weren't confusing enough, I ought to point out that the Google Style Guide argues for dot notation for variables, but mixed capitalization for functions.

The lack of consistent style across R packages is problematic on several levels. From a developer standpoint, it makes maintaining and extending others' code difficult (especially where its style is inconsistent with your own). From an R user standpoint, the inconsistent syntax steepens R's learning curve by multiplying the ways a concept might be expressed (e.g. is that date casting function asDate(), as.date(), or as_date()? No, it's as.Date()).
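To make the as.Date() point concrete, here is a small sketch that checks which of the four spellings is actually defined in base R. The variable names `candidates` and `inBase` are illustrative only, and the lookup is deliberately restricted to the base package, so add-on packages that define other spellings are not considered.

```r
# Candidate spellings from the question; only one is defined in base R.
candidates <- c("asDate", "as.date", "as_date", "as.Date")

# Look only in the base package itself (inherits = FALSE), so that
# attached add-on packages cannot change the answer.
inBase <- vapply(
  candidates,
  function(name) exists(name, envir = baseenv(), inherits = FALSE),
  logical(1)
)

# Named logical vector: TRUE only for the spelling base R really uses.
print(inBase)
```

Only the "as.Date" entry comes back TRUE, which is exactly the kind of guesswork the inconsistent conventions force on a new user.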

Martijn Pieters
medriscoll
  • 1
    There are also instances of MATLAB style `alllowercase` variable names, and plenty of straight-from-the-equation very short names (`x`, `y`, etc.). – Richie Cotton Dec 23 '09 at 04:03
  • 5
    Underscores are like Python, so I tend to use underscores. ESS should be fixed, that's really silly. – Brendan OConnor Dec 24 '09 at 01:51
  • 8
    There is nothing to fix, it has a toggle for that. But the _default behaviour_ is to interpret an underscore as a shortcut for <- saving you a key to press. So if you publish variables with underscores (Hi, Hadley) you force every ESS user to press _ twice to get the original behaviour -- or to have customised their ESS setup. I still prefer camelCase by a few nautical miles. – Dirk Eddelbuettel Dec 25 '09 at 14:39
  • Regarding ESS and Emacs, you can disable that annoying behavior by putting `(ess-toggle-underscore nil)` in your .emacs file. Hope this helps. – eold Nov 16 '11 at 21:05
  • 2
    camelCase has problems too, e.g. the standard camel Case ``ImfDataTransformed`` or the natural extended version ``IMFDataTransformed`` are not as easy to read as my preferred TOGGLEcamelCase: ``IMFdataTransformed``... – PatrickT Jan 04 '15 at 18:28
  • 1
    I'm voting to close this question as off-topic because the answers are bound to be opinion-based. – Ben Bolker Jul 19 '16 at 18:51
  • As a specific case: since the underscore (_) character cannot be used as the first character of a name, the only way to mark a variable as internal is to place a dot (.) before the name, for example `.name` (analogous to `_name` or `__name` in C/C++). Note that a digit cannot be the second character after a leading dot, according to [Make Syntactically Valid Names](https://stat.ethz.ch/R-manual/R-devel/library/base/html/make.names.html). Refer to http://stackoverflow.com/a/38448219/2101864 for internal variables. – Gürol Canbek Feb 27 '17 at 07:31

9 Answers

88

Good previous answers so just a little to add here:

  • underscores are really annoying for ESS users; given that ESS is pretty widely used you won't see many underscores in code authored by ESS users (and that set includes a bunch of R Core as well as CRAN authors, exceptions like Hadley notwithstanding);

  • dots are evil too because they can get mixed up in simple method dispatch; I believe I once read comments to this effect on one of the R lists: dots are a historical artifact and no longer encouraged;

  • so we have a clear winner still standing in the last round: camelCase. I am also not sure if I really agree with the assertion of 'lacking precedent in the R community'.

And yes: pragmatism and consistency trump dogma. So whatever works and is used by colleagues and co-authors. After all, we still have white-space and braces to argue about :)
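The method-dispatch problem with dots mentioned above can be sketched as follows. The generic `describe` and the class "data" are hypothetical names invented for this illustration; the point is only that S3 treats any function named generic.class as a method, whether or not the author meant it that way.

```r
# Hypothetical S3 generic with a default method.
describe <- function(x, ...) UseMethod("describe")
describe.default <- function(x, ...) "default method"

# Written as an ordinary helper with a dot-separated name,
# NOT intended to be a method for any class ...
describe.data <- function(x, ...) "accidental method"

# ... but an object whose class happens to be "data" dispatches to it.
obj <- structure(list(), class = "data")
result <- describe(obj)
print(result)
```

Here `describe(obj)` ends up in `describe.data()` rather than in the default method, because the dot-separated helper name collides with the generic.class pattern that S3 uses to find methods.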

Dirk Eddelbuettel
  • 7
    +1 Well said! [If only the core team would put out a definitive style guide; I feel like that would give more credence to their already implied usage.] – Shane Dec 22 '09 at 15:11
  • 1
    I could just be misremembering based on my own bias towards mixed case but I believe that's what RG always used when I was working for him. I figure what's good for RG is good for me! – geoffjentry Dec 23 '09 at 00:32
  • Geoff: Not a bad rule to go by :) – Dirk Eddelbuettel Dec 23 '09 at 01:00
  • Dirk - I'm giving your answer the thumbs up here, but it would be truly wonderful if this style preference were reified in a document somewhere at r-project.org. At present, it's floating in the un-Google-able collective consciousness of the R Core Team :). – medriscoll Dec 27 '09 at 04:14
  • 2
    Thanks for thumbs-up. As for the 'canonical style document': wishing alone doesn't make it so, or I'd be riding pink ponies. Maybe you can start by authoring something, which you could stick onto the R Wiki and we all edit, adopt and adhere to it. Hope springs eternal, as they say... – Dirk Eddelbuettel Dec 27 '09 at 04:28
  • I have no problems with camelCase though I prefer underscores and don't use ESS. I will say that it would be nice to have multiple naming conventions for different situations, as the Google guide aims for with camelCase for functions. It dramatically increases comprehension. Since underscores are used in a number of languages, it would be ideal to have them for one thing, be it variables, functions, etc. – Dan Jan 07 '10 at 04:19
  • 1
    @Dirk - I plan to start heading toward camel casing based on your recommendation, but I am curious if you know why `?make.names` appears to suggest that dot separated names are preferred? – David LeBauer Mar 09 '11 at 18:00
  • Sorry, David, but I wrote *evil* above in my answer. As I wrote, I much prefer camelCase over dot.separated names. – Dirk Eddelbuettel Mar 09 '11 at 20:01
  • Putting `(ess-toggle-underscore nil)` in your .emacs solves the problem. – eold Nov 16 '11 at 21:05
  • I do not use ESS but I write a lot of R functions with underscores. Why are underscores inconvenient for ESS users? Are there workarounds? – stevec May 12 '19 at 13:08
84

I did a survey of which naming conventions are actually used on CRAN, and it got accepted to the R Journal :) Here is a graph summarizing the results:

[Figure: graph summarizing the survey results]

Turns out (no surprises perhaps) that lowerCamelCase was most often used for function names and period.separated names were most often used for parameters. UpperCamelCase, as advocated by Google's R style guide, is really rare, however, and it is a bit strange that they advocate that naming convention.

The full paper is here:

http://journal.r-project.org/archive/2012-2/RJournal_2012-2_Baaaath.pdf

Martijn Pieters
Rasmus Bååth
38

Underscores all the way! Contrary to popular opinion, there are a number of functions in base R that use underscores. Run grep("^[^\\.]*$", apropos("_"), value = TRUE) to see them all.

I use the official Hadley style of coding ;)
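The one-liner above can be unpacked into two steps, as in this sketch. The variable names are illustrative, and the result depends on the R version and on which packages are attached, so no particular list is assumed here.

```r
# Step 1: every object on the search path whose name contains an underscore.
underscoreNames <- apropos("_")

# Step 2: keep only the names that contain no dot at all, i.e. names
# that use underscores without mixing in period separators.
noDotNames <- grep("^[^\\.]*$", underscoreNames, value = TRUE)

print(noDotNames)
```

Spelling out `value = TRUE` instead of `T` is a small defensive choice, since `T` is an ordinary variable that can be reassigned.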

rhombidodecahedron
hadley
  • 1
    That's neat! I wasn't aware of the *apropos* function before. This returns 10 functions for me in R 2.9.0; I'd hardly say that's a compelling case. What's your rationale for underscores when they're clearly in a minority for R? – Shane Dec 22 '09 at 17:13
  • 3
    Well it's 16 in R 2.10.0, so that's a 60% increase per version ;) I mainly like them because they remind me of Ruby; camelCase reminds me of Java. – hadley Dec 22 '09 at 17:57
  • 6
    Hadley, my heart says to support your underscore insurgency, but my head says to respect the community standard, and say yes to camelCase. :( But perhaps self-consistency is all that matters. – medriscoll Dec 23 '09 at 23:49
5

I like camelCase when the camel actually provides something meaningful -- like the datatype.

dfProfitLoss, where df = dataframe

or

vdfMergedFiles(), where the function takes in a vector and spits out a dataframe

While I think _ really adds to the readability, there just seem to be too many issues with using .-_ or other characters in names. Especially if you work across several languages.

Robert
3

This comes down to personal preference, but I follow the Google style guide because it's consistent with the style of the core team. I have yet to see an underscore in a variable in base R.

Shane
3

As I point out here:

How does the verbosity of identifiers affect the performance of a programmer?

it's worth bearing in mind how understandable your variable names are to your co-workers/users if they are non-native speakers...

For that reason I'd say underscores and periods are better than capitalisation, but as you point out consistency is essential within your script.

Community
David Lawrence Miller
2

As others have mentioned, underscores will screw up a lot of folks. No, it's not verboten but it isn't particularly common either.

Using dots as a separator gets a little hairy with S3 classes and the like.

In my experience, it seems like a lot of the high muckity mucks of R prefer the use of camelCase, with some dot usage and a smattering of underscores.

geoffjentry
1

I have a preference for mixedCapitals.

But I often use periods to indicate what the variable type is:

mixedCapitals.mat is a matrix. mixedCapitals.lm is a linear model. mixedCapitals.lst is a list object.

and so on.

Jesse
1

Usually I name my variables using a mix of underscores and mixed capitalization (camelCase). Simple variables are named using underscores, for example:

PSOE_votes -> number of votes for the PSOE (political group of Spain).

PSOE_states -> Categorical, indicates the state where PSOE wins {Aragon, Andalucia, ...}

PSOE_political_force -> Categorical, indicates the position of PSOE among the political groups {first, second, third}

PSOE_07 -> Union of PSOE_votes + PSOE_states + PSOE_political_force for 2007 (header -> votes, states, position)

If my variable is the result of applying a function to one or two variables, I use mixed capitalization.

Example:

positionXstates <- xtabs(~states+position, PSOE_07)

BenMorel
calejero