I am sorry in advance for the long post which mixes several questions. If not appropriate, please edit or advise what I should do. I am practicing data.table join, here is an imaginary scenario:
"Two robots, each has 4 positions in MovementA and 4 positions in MovementB. Problem to Solve: for each robot, from MoveA to MoveB, there are 4x4 possible Position-pairs Find the 4 pairs with the shortest distance"
Data Setup
library(data.table)
set.seed(20141220)
dtMoveA = data.table(RobotID = rep(1:2, each=4), Position=sample(1:20, 8))
dtMoveB = data.table(RobotID = rep(1:2, each=4), Position=sample(1:20, 8))
# Review Data
rbind(cbind(Movement="Move-A", dtMoveA), cbind(Movement="Move-B", dtMoveB))
Movement RobotID Position
1: Move-A 1 18
2: Move-A 1 20
3: Move-A 1 15
4: Move-A 1 8
5: Move-A 2 13
6: Move-A 2 2
7: Move-A 2 9
8: Move-A 2 12
9: Move-B 1 18
10: Move-B 1 14
11: Move-B 1 13
12: Move-B 1 17
13: Move-B 2 5
14: Move-B 2 16
15: Move-B 2 20
16: Move-B 2 3
Solution 1 (using dplyr)
library(dplyr)
dtMoveA %>%
inner_join(dtMoveB, by="RobotID") %>%
mutate(AbsDistance = abs(Position.x - Position.y)) %>%
group_by(RobotID, Position.x) %>%
filter(AbsDistance == min(AbsDistance)) %>%
arrange(RobotID, Position.x)
RobotID Position.x Position.y AbsDistance
1 1 8 13 5
2 1 15 14 1
3 1 18 18 0
4 1 20 18 2
5 2 2 3 1
6 2 9 5 4
7 2 12 16 4
8 2 13 16 3
(attempt) Solution 2 (using data.table)
setkey(dtMoveA, RobotID)
setkey(dtMoveB, RobotID)
dtMoveA[dtMoveB, .(RobotID, Position, i.Position,
AbsDistance = abs(Position - i.Position)), allow.cartesian=TRUE
] [, MinDistance := min(AbsDistance), by=list(RobotID, Position)
] [ AbsDistance == MinDistance, .(Position, i.Position, AbsDistance), by=RobotID
] [ order(RobotID, Position)]
RobotID Position i.Position AbsDistance
1: 1 8 13 5
2: 1 15 14 1
3: 1 18 18 0
4: 1 20 18 2
5: 2 2 3 1
6: 2 9 5 4
7: 2 12 16 4
8: 2 13 16 3
Question 1 Can you please correct my Solution2 with the good practices from data.table art?
Question 2 without the parameter allow.cartesian=TRUE
data.table warns
"Join results in 32 rows; more than 8 = max(nrow(x),nrow(i)). Check for duplicate key values in i, each of which join to the same group in x over and over again. If that's ok, try including j
and dropping by
(by-without-by) so that j runs for each group to avoid the large allocation. If you are sure you wish to proceed, rerun with allow.cartesian=TRUE"
Is it really a cartesian product? Here the join is made only on the common key values, it is just a coincidence in the data which yields a big join results.
Question 3 dtMoveA and dtMoveB have same column names. datatable join makes the distinction by changing the name to i.Position
. Is the "i" prefix something hardcoded? And I suppose i.ColumnName always applies to Y member in the X[Y] join expression.
Thanks in advance for any help.