
I have a data.table with 2.7 million observations and 21 variables, and I need to produce a new data.table containing only the rows where Variable1 matches one of the values in a vector.

I have a vector of values, some of which match values of Variable1, like the following:

VectorValue <- c("A", "B", "XXZ", "UDD", ...)

I was thinking of something like:

Table_B <- Table_A[Table_A$Variable1 == VectorValue]

or

Table_B <- Table_A[Variable1 == VectorValue]

but I get this error:

When i is a data.table (or character vector), the columns to join by must be specified either using 'on=' argument (see ?data.table) or by keying x (i.e. sorted, and, marked as sorted, see ?setkey). Keyed joins might have further speed benefits on very large data due to x being sorted in RAM.
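The error message itself names the data.table-native alternative: a join, either with `on=` or after keying the table with `setkey()`. The following is a minimal sketch under stated assumptions: the toy data is made up (only `Table_A`, `Variable1` and `VectorValue` come from the question; the `value` column and `Table_B_keyed` are hypothetical), and `Variable1` is assumed to be a character column.

```r
library(data.table)

# Hypothetical toy data standing in for the real 2.7-million-row table.
Table_A <- data.table(
  Variable1 = c("A", "B", "C", "XXZ", "A", "QQQ"),
  value     = 1:6
)
VectorValue <- c("A", "B", "XXZ", "UDD")

# Join with on=: each value of VectorValue is matched against Variable1.
# nomatch = 0L drops values that have no matching row (here "UDD")
# instead of returning a row of NAs for them.
Table_B <- Table_A[.(VectorValue), on = "Variable1", nomatch = 0L]

# Keyed alternative: setkey() sorts Table_A by Variable1 in place (by
# reference), after which the join column no longer needs to be named.
setkey(Table_A, Variable1)
Table_B_keyed <- Table_A[.(VectorValue), nomatch = 0L]
```

A join returns the matching rows grouped in the order of `VectorValue`, and a value that appears twice in `VectorValue` returns its matching rows twice; a logical filter such as `%in%` keeps the original row order and never duplicates rows.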

Pdubbs
  • You might want to work on making a reproducible example https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/28481250#28481250 Your example is too barebones for me to understand it, anyways. – Frank Mar 28 '18 at 16:24
  • can you use `dput(head(Table_A))` to give us a look at data structure? – Pdubbs Mar 28 '18 at 16:32

1 Answer


I see two possibilities here:

  1. VectorValue is the same length as Table_A$Variable1 and you want to compare values in the same position and return a subset of the data frame where those values are equal. In this case, you may just need to add a comma like so...

Table_B <- Table_A[Table_A$Variable1 == VectorValue, ]

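...to indicate that you want all rows where the condition is TRUE and all of the columns in Table_A. The comma is essential for a plain data.frame; a data.table also accepts it, but already treats a lone logical index as a row filter.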

  2. You want to return any row of Table_A where Table_A$Variable1 has a value that matches any value within VectorValue. In this case, you'd want to use the %in% operator instead of ==, like so...

Table_B <- Table_A[Table_A$Variable1 %in% VectorValue, ]
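To make the difference between the two cases concrete, here is a small sketch on made-up data. Only `Table_A`, `Variable1` and `VectorValue` come from the question; the `value` column and the names `SameLength`, `Option1`, `Option2` and `Option2_dt` are hypothetical, chosen for illustration.

```r
library(data.table)

# Hypothetical toy data; the values are made up for illustration.
Table_A <- data.table(
  Variable1 = c("A", "B", "C", "XXZ", "A", "QQQ"),
  value     = 1:6
)

# Option 1: a vector with one entry per row, compared position by position.
SameLength <- c("A", "Z", "C", "Z", "Z", "QQQ")
Table_A$Variable1 == SameLength      # TRUE FALSE TRUE FALSE FALSE TRUE
Option1 <- Table_A[Table_A$Variable1 == SameLength, ]

# Option 2: a lookup set of any length; %in% ignores position.
VectorValue <- c("A", "B", "XXZ", "UDD")
Table_A$Variable1 %in% VectorValue   # TRUE TRUE FALSE TRUE TRUE FALSE
Option2 <- Table_A[Table_A$Variable1 %in% VectorValue, ]

# Idiomatic data.table form of option 2: columns are visible inside [ ],
# so the Table_A$ prefix and the trailing comma can be dropped.
Option2_dt <- Table_A[Variable1 %in% VectorValue]
```

With a list of lookup values such as the one in the question, option 2 is almost certainly the one you want: `==` only makes sense when the second vector lines up row for row with the table.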

Keith Mertan