How to omit reciprocal pairs of words in a data.frame using R

Question

I have been searching for an answer online but cannot seem to get anywhere close.

I have a set of tickers and used expand.grid() to find combinations of them:

# TICKERS
A <- c("AIR", "AFAP", "AAL", "CECE", "ASA", "AVX")
# FIND COMBINATIONS
B <- expand.grid(A,A,stringsAsFactors=FALSE)

Now I want to omit the reciprocals. For example, rows 2 and 7 below are reciprocals, and I want to keep only one of those combinations, not both.

head(B,10)
   Var1 Var2
1   AIR  AIR
2  AFAP  AIR
3   AAL  AIR
4  CECE  AIR
5   ASA  AIR
6   AVX  AIR
7   AIR AFAP
8  AFAP AFAP
9   AAL AFAP
10 CECE AFAP

akrun · Accepted Answer · 2016-01-27T07:29:10.887

Starting from the OP's initial output, we can sort each row of 'B' using apply with MARGIN=1 and store the result as 'd1'. Calling duplicated on 'd1' gives a logical index of rows that repeat an earlier unordered pair; negating it and using it to subset 'B' keeps one row per pair.

d1 <- as.data.frame(t(apply(B, 1, sort)))
B1 <- B[!duplicated(d1),]
head(B1, 10)
#   Var1 Var2
#1   AIR  AIR
#2  AFAP  AIR
#3   AAL  AIR
#4  CECE  AIR
#5   ASA  AIR
#6   AVX  AIR
#8  AFAP AFAP
#9   AAL AFAP
#10 CECE AFAP
#11  ASA AFAP
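
The row-wise sort above can also be written without apply. The following is a minimal sketch, not part of the original answer, that builds the same kind of order-independent key with pmin and pmax, which work element-wise on character vectors. The names 'lo', 'hi' and 'B2' are made up for illustration.

```r
A <- c("AIR", "AFAP", "AAL", "CECE", "ASA", "AVX")
B <- expand.grid(A, A, stringsAsFactors = FALSE)

# Order-independent key for each row: the alphabetically smaller
# ticker goes in 'lo', the larger one in 'hi' (hypothetical names).
lo <- pmin(B$Var1, B$Var2)
hi <- pmax(B$Var1, B$Var2)

# Keep only the first row seen for each unordered pair.
B2 <- B[!duplicated(data.frame(lo, hi)), ]
nrow(B2)  # 21 rows: 6 self-pairs plus 15 distinct pairs
```

Because pmin and pmax are vectorised, this avoids looping over rows, which may matter for longer ticker lists.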

Another compact option would be using data.table

library(data.table)
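# name the columns explicitly so the filter does not depend on CJ's default column names
CJ(V1 = A, V2 = A)[V1 >= V2]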

score 3 · Answer 2 · answered Jan 27 '16 at 07:21

Use the gtools package instead. Note that with repeats disallowed, self-pairs such as AIR/AIR are not returned:

library(gtools)
A <- c("AIR", "AFAP", "AAL", "CECE", "ASA", "AVX")

# repeats.allowed is the full argument name; FALSE drops self-pairs
combinations(length(A), 2, A, repeats.allowed = FALSE)

#       [,1]   [,2]  
#  [1,] "AAL"  "AFAP"
#  [2,] "AAL"  "AIR" 
#  [3,] "AAL"  "ASA" 
#  [4,] "AAL"  "AVX" 
#  [5,] "AAL"  "CECE"
#  [6,] "AFAP" "AIR" 
#  [7,] "AFAP" "ASA" 
#  [8,] "AFAP" "AVX" 
#  [9,] "AFAP" "CECE"
# [10,] "AIR"  "ASA" 
# [11,] "AIR"  "AVX" 
# [12,] "AIR"  "CECE"
# [13,] "ASA"  "AVX" 
# [14,] "ASA"  "CECE"
# [15,] "AVX"  "CECE"
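
If adding a package is not desirable, base R's combn can produce the same kind of result. This is a sketch rather than part of the original answer. It assumes self-pairs such as AIR/AIR are not wanted, and the name 'pairs' is just illustrative. Unlike the gtools output above, the rows follow the order of A instead of being sorted alphabetically.

```r
A <- c("AIR", "AFAP", "AAL", "CECE", "ASA", "AVX")

# combn returns one combination per column, so transpose
# to get one unordered pair per row.
pairs <- t(combn(A, 2))
dim(pairs)  # 15 rows, 2 columns
```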
