For example, in pandas, you always need to refer to a column in a DataFrame by its name as a string:
import pandas as pd
df = pd.DataFrame(list(range(1, 10)), columns=["a"])
df["a"]
But in R, including some of its packages such as data.table and dplyr, you are allowed to refer to a column without quotes, like this:
dt <- data.table(a = 1:10)
dt[,.(a)]
In my opinion, referring to column names unquoted is a disaster. The only benefit you get is that you don't need to type the quotes (""). But the downsides are numerous:
1) Very often you will need to select columns programmatically. With unquoted column names, you have to distinguish variables in the "outer" (calling) context from column names in the "inner" (table) context:
col_name <- "a"
dt[,..col_name]
2) Even if you manage to select the columns specified in a vector of strings, it's very hard to do (complex) operations on them. As mentioned in this question, you need to do it this way:
diststr = "dist"
valstr = "val"
x[get(valstr) < 5, c(diststr) := get(diststr) * sum(get(diststr))]
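For comparison, here is a rough pandas sketch of what I believe is the same update. The data frame contents are made up for illustration, and I am assuming the data.table expression computes the sum over the filtered rows only:

```python
import pandas as pd

# Hypothetical data; only the column names come from the example above.
x = pd.DataFrame({"dist": [1.0, 2.0, 3.0, 4.0], "val": [1, 3, 7, 9]})

diststr = "dist"
valstr = "val"

# Rows where the column named by valstr is below 5.
mask = x[valstr] < 5

# Multiply the selected values by their own sum (assumed to be the
# sum over the filtered rows, not over the whole column).
subset = x.loc[mask, diststr]
x.loc[mask, diststr] = subset * subset.sum()

print(x)
```

The column names are ordinary strings held in ordinary variables, so no get() or .. prefix is needed to move between the two contexts.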
All in all, my feeling is that wrangling data in R is not nearly as straightforward or natural as it is in pandas. Could someone please explain whether there are any upsides to unquoted column names?