I am having trouble figuring out how to perform multiple operations in data.table
using pattern matching to determine which columns are used. For example:
library(data.table)
library(dplyr)
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:data.table':
#>
#> between, first, last
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(tidyr)
iris <- copy(iris)
iris_dplyr_above_6 <- iris %>%
  select(contains("Length"), Species) %>%
  gather(col, val, -Species) %>%
  filter(val > 6)
unique(iris_dplyr_above_6$Species)
#> [1] versicolor virginica
#> Levels: setosa versicolor virginica
setDT(iris)
iris_dt_above_6 <- iris[Sepal.Length > 6 | Petal.Length > 6,]
unique(iris_dt_above_6$Species)
#> [1] versicolor virginica
#> Levels: setosa versicolor virginica
Created on 2019-07-19 by the reprex package (v0.3.0)
In this example I can select columns with dplyr based on the "Length" string. In data.table I have to type out each column name manually. Obviously this example is trivial, as typing out two column names is hardly onerous. However, when there are many columns, a programmatic way to select them is useful. I assume data.table has a nifty way of doing this and I just haven't found it yet. Or maybe I am misunderstanding the problem and it really calls for a base R solution.
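To make the goal concrete, here is a rough sketch of the kind of thing I am after. It assumes that grep() on the column names combined with .SDcols, or melt() with patterns(), is a reasonable route, and I don't know whether either is the idiomatic data.table way. The names dt, length_cols, above_6 and long_above_6 are made up for illustration.

```r
library(data.table)

# Work on a data.table copy of the built-in iris data
dt <- as.data.table(iris)

# Pick the columns by pattern instead of typing them out
length_cols <- grep("Length", names(dt), value = TRUE)

# Wide form: build one logical vector that is TRUE where any
# matched column exceeds 6, then use it to subset the rows
keep <- dt[, Reduce(`|`, lapply(.SD, function(x) x > 6)), .SDcols = length_cols]
above_6 <- dt[keep]
unique(above_6$Species)

# Long form, closer to the gather() + filter() version:
# melt only the columns whose names match the pattern
long_above_6 <- melt(dt, id.vars = "Species",
                     measure.vars = patterns("Length"))[value > 6]
unique(long_above_6$Species)
```

Both versions give me the same two species as the dplyr code above, but the first one feels clunky, which is why I suspect there is a cleaner built-in way.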
Any advice?