@gented's answer here demonstrates how to randomly select a subset of rows from a data.table.
What if I want to select all rows of a data.table for which the values in a certain column meet one condition, and additionally select a random subset of the rows for which the values in that same column meet a different condition?
Say, for example, that I want a random sample of 5 rows from the mtcars data.table for which cyl == 6, plus all rows for which cyl == 8.
Is this achievable in a better way than:
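library(data.table)
mtcars <- as.data.table(mtcars)  # the built-in mtcars is a data.frame

rbind(
  mtcars[cyl == 8],                # all rows with 8 cylinders
  mtcars[cyl == 6][sample(.N, 5)]  # 5 random rows with 6 cylinders
)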
That is, can I subset the data.table within a single set of []'s, so that I could also, for example, apply a function within that same call (in the lapply(.SD, function) format)?
This obviously does not achieve the desired result, but is similar to the syntax I'm looking for:
mtcars[
  cyl == 8 | (cyl == 6 & sample(.N, 5)),
  lapply(.SD, generic_function),
  .SDcols = (specific_cols)
]
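To make the target concrete, here is a rough sketch of the kind of single-call form I have in mind. It assumes the row index can be built directly in i by combining which() and sample(). The name dt, the use of mean as the function, and the columns "mpg" and "hp" are placeholders I picked for illustration, not part of the real problem. I am not sure this is the idiomatic data.table way, which is really what I am asking.

```r
library(data.table)

dt <- as.data.table(mtcars)  # built-in mtcars is a data.frame
set.seed(1)                  # only so the random sample is repeatable

# Placeholder choices for illustration (assumptions, not from the real use case)
generic_function <- mean
specific_cols <- c("mpg", "hp")

# Row index built in i: every cyl == 8 row plus 5 sampled cyl == 6 rows,
# with the function applied to the chosen columns in j of the same call
res <- dt[
  c(which(cyl == 8), sample(which(cyl == 6), 5)),
  lapply(.SD, generic_function),
  .SDcols = specific_cols
]

# Same i without j, to inspect which rows were selected
picked <- dt[c(which(cyl == 8), sample(which(cyl == 6), 5))]
```

One caveat with this sketch: sample(x, n) treats a length-one numeric x as 1:x, so it would misbehave if only a single row matched the sampled condition.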