I recently noticed in some old code that I had been using an extra pair of square brackets when subsetting a data.table and repeatedly applying a function to the result (in my case, calculating correlation matrices). For example:

# Slow way
rcorr(DT[subgroup][, !'Group', with=F])

# Faster way
rcorr(DT[subgroup, !'Group', with=F])

(The only difference is the extra ][ after subgroup.) Just out of curiosity, why is the first form slower? With the extra brackets, does data.table have to perform some extra computation?

Chris Watson

1 Answer


Here's a simple interpretation:

# Slow way
rcorr(DT[subgroup][, !'Group'])

The second set of brackets is a second, separate operation: DT[subgroup] first creates a new data.table from DT, and then [, !'Group'] operates on that intermediate result, creating yet another new data.table. The extra intermediate copy is what slows things down.

# Faster way
rcorr(DT[subgroup, !'Group'])

This form does the row subset and the column selection in a single call on DT, so no intermediate data.table is created.
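
To see the difference for yourself, here is a minimal sketch. The table, its column names other than Group, and the logical filter subgroup are made up for illustration (they are not the asker's data), and the timings you get will depend on your machine and data.table version. It uses with = FALSE as in the question.

```r
library(data.table)

set.seed(1)
n <- 1e5
# Hypothetical data: one grouping column plus three numeric columns
DT <- data.table(
  Group = sample(letters[1:5], n, replace = TRUE),
  x = rnorm(n), y = rnorm(n), z = rnorm(n)
)
# Hypothetical row filter: a logical vector selecting one group
subgroup <- DT$Group == "a"

# Chained form: builds an intermediate data.table with all columns,
# then builds a second one without 'Group'
chained <- DT[subgroup][, !"Group", with = FALSE]

# Single-call form: row subset and column selection in one step
single <- DT[subgroup, !"Group", with = FALSE]

# Both forms should give the same result
same_result <- isTRUE(all.equal(chained, single))

# Rough timing comparison over repeated calls
t_chained <- system.time(for (i in 1:20) DT[subgroup][, !"Group", with = FALSE])
t_single  <- system.time(for (i in 1:20) DT[subgroup, !"Group", with = FALSE])
print(rbind(chained = t_chained, single = t_single))
```

The gap should grow with the number of columns, since the chained form copies every column for the selected rows before dropping Group.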

MichaelChirico
Rich Scriven