One problem I often have with data.table and ggplot is their use in a for loop in which I iterate over a set of column names.
Take this data table as an example:
library(data.table)
library(ggplot2)
dt <- data.table(values1=rep(c(1,2),each=2),
                 values2=rep(c(10,20),each=2),
                 notthis=0,
                 category=rep(c('a','b'),each=2))
##
##    values1 values2 notthis category
## 1:       1      10       0        a
## 2:       1      10       0        a
## 3:       2      20       0        b
## 4:       2      20       0        b
Let's say I want to iterate over all columns of dt except notthis and category. For each column I want to plot two histograms of its values, one per category, and add vertical lines marking the per-category means (possibly sending the plots to a PDF device using pdf(), print(), and dev.off()).
A first idea for the code could be as follows:
loopnames <- setdiff(colnames(dt), c('notthis', 'category'))
## [1] "values1" "values2"
for(ZZZ in loopnames){
    dtmeans <- dt[, .(means=mean(ZZZ)), by=category]
    ggplot(dt) + geom_histogram(aes(x=ZZZ, fill=category)) +
        geom_vline(data=dtmeans, aes(xintercept=means, color=category))
}
but obviously it doesn't work. Use of the ZZZ variable produces errors in data.table and ggplot.
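To make the failure concrete, here is a minimal sketch of the data.table half of the loop with ZZZ fixed to its first value. As far as I can tell, ZZZ is not a column of dt, so inside dt[...] it resolves to the character string "values1" rather than to the column, and mean() of a string returns NA with a warning. The name res is only used for illustration.

```r
library(data.table)

dt <- data.table(values1 = rep(c(1, 2), each = 2),
                 values2 = rep(c(10, 20), each = 2),
                 notthis = 0,
                 category = rep(c('a', 'b'), each = 2))

# What the loop variable holds on the first iteration
ZZZ <- "values1"

# ZZZ is looked up outside dt and evaluates to the string "values1",
# so mean() is applied to a character value, not to the column.
# suppressWarnings() hides the "argument is not numeric or logical" warning.
res <- suppressWarnings(dt[, .(means = mean(ZZZ)), by = category])

# res has one row per category, and means is NA in both rows
print(res)
```

The ggplot half seems to have the same root cause: aes(x=ZZZ) maps x to the constant string held by ZZZ instead of to the values of the column with that name.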
Note the reasons behind some lines of the code:
- I want to build the list of columns to iterate through by difference: dt could have hundreds of columns and I only want to exclude, say, two of them.
- I need to construct a data table containing the means to pass to geom_vline (a data table for this is overkill in my opinion, but hey, that's what ggplot wants).
- I'd like to use the special syntax of data.table to construct such a data table.
Consulting the useful answers to this post, this post, and this post, I've tried various combinations to make the code idea above work: using with=FALSE for the data table, the quote()/eval() pair, the "unquote" operator !!, as well as as.name() and sym(). But no combination worked. The closest to a solution was the quote()/eval() pair, which seems to work for both data.table and ggplot, but I didn't manage to use this workaround in a for loop.
Can you suggest a general way, without using tidyverse commands, to deal with variable/looped column names in packages such as data.table and ggplot?