Why does R allow referring to a column in a data frame unquoted?

Question

For example, in pandas, you always need to refer to a column in a DataFrame by its name as a string:

import pandas as pd

df = pd.DataFrame(list(range(1, 10)), columns=["a"])
df["a"]

But in R, including some of its packages such as data.table and dplyr, you are allowed to refer to a column without quotes, like this:

dt <- data.table(a = 1:10)
dt[,.(a)]

In my opinion, referring to a column name unquoted is a disaster. The only benefit is that you don't need to type the quotes. But the downsides are unlimited:

1) Very often you will need to select columns programmatically. With unquoted column names, you need to distinguish between variables in the "outer" and "inner" contexts.

col_name <- "a"
dt[,..col_name]

2) Even if you manage to select the columns specified in a vector of strings, it's very hard to do (complex) operations on them. As mentioned in this question, you need to do it this way:

diststr = "dist"
valstr = "val"

x[get(valstr) < 5, c(diststr) := 
get(diststr)*sum(get(diststr))]

All in all, my feeling is that wrangling data in R is not straightforward or natural at all compared to how it is done in pandas. Could someone please explain whether there are any upsides to this?

but from base R also see `with()`, `within()`, `subset()`, model formulas, ... pros and cons of *non-standard evaluation* are a huge can of worms, but I voted to close as opinion-based ... — Ben Bolker, Nov 29 '18 at 20:50
This is opinion based, but "the downsides are unlimited" is false. I'm not as familiar with `data.table`, but `dplyr` evaluation is completely unambiguous. And yes, it is to save typing. Typing `""` requires 2 to 4 key strokes (2 if you use Rstudio), and when you have to type so many variables, it becomes a lot. — thc, Nov 29 '18 at 20:55
Also in Rstudio, unquoted variables allow for tab autocompletion of variable names, which may not have been possible with strings. — thc, Nov 29 '18 at 20:56

score 2 · Accepted Answer · answered Nov 29 '18 at 21:17

In pandas you can refer to suitably named columns without quotes, e.g.:

import pandas as pd

df = pd.DataFrame(dict(
  a=[1,2,3],
  b=[5,6,7],
))
print(df.a)

This is valid and concise, and similar syntax works in R.

The choice depends on how much the code's author knows about the dataset and what is convenient at the time: for quick analyses this is great, but for more repeatable workflows it can be awkward.
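To make "suitably named" concrete, here is a minimal sketch of where attribute-style access stops working in pandas. The column names `count` and `my col` are made up for illustration: one collides with an existing DataFrame method, the other is not a valid Python identifier.

```python
import pandas as pd

# Hypothetical columns chosen to show the limits of attribute access.
df = pd.DataFrame({
    "a": [1, 2, 3],
    "count": [5, 6, 7],    # same name as the DataFrame.count method
    "my col": [8, 9, 10],  # contains a space, so not a valid identifier
})

# A valid identifier that does not clash with anything: both forms work.
a_values = df.a.tolist()

# df.count resolves to the method, not the column, so the quoted form
# is the only way to reach this column.
count_attr = df.count
count_values = df["count"].tolist()

# df.my col is a syntax error; only the quoted form is possible.
spaced_values = df["my col"].tolist()
```

This is the awkward part for repeatable workflows: if column names come from outside the code, the quoted form is the one that keeps working.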

I also tend to use unquoted variable accessors a lot when working with databases, where column names are basically always valid identifiers:

df = pd.read_sql('select a, b from foo', dbcon)
df.a

or

df <- dbGetQuery(dbcon, 'select a, b from foo')
df$a

for pandas and R respectively.

Each language/library provides the tools; it's up to you to use them appropriately!
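As one illustration of such a tool on the pandas side, `DataFrame.query` takes an expression in which column names appear unquoted and variables from the enclosing scope are marked with `@`, which is roughly the same "inner" versus "outer" distinction that data.table draws with `..`. The sketch below is only an assumption about how you might use it; the helper name `rows_below` is made up, and the column names are borrowed from the question.

```python
import pandas as pd

def rows_below(df, threshold):
    # Inside the query string, `val` refers to the column ("inner" context)
    # and `@threshold` refers to the local Python variable ("outer" context).
    return df.query("val < @threshold")

# Small made-up data set using the column names from the question.
x = pd.DataFrame({"dist": [1.0, 2.0, 3.0], "val": [2, 4, 6]})
result = rows_below(x, 5)
```

So pandas also offers unquoted column references when they are convenient, while `x[x["val"] < 5]` remains available for fully programmatic code.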
