Background:
I want to count how often companies move in tandem with respect to their market capitalization. For example, company A and company B move together 3 times, and I would like to divide that count by the number of rows in which both A and B have non-NA values (here, 10). I used a logical comparison: TRUE when both companies have the same letter, FALSE when they do not, and NA when either A or B is NA.
THE PROBLEM IS:
The code I used works for small sets (at most 50 companies); beyond that it takes too much time. I would like to run it on sets of 100 companies, approximately 324,000,000 data points.
Input (small subset): Dataframe "dat"
   CompA CompB CompC CompD
1      A     F  <NA>     A
2      A     F  <NA>     F
3      F     E  <NA>     A
4      A     A  <NA>     A
5      F     A  <NA>     F
6      A     D  <NA>     D
7      F     F  <NA>     B
8      A     A  <NA>     F
9      F     E  <NA>     F
10  <NA>     C  <NA>     A
11     E     F  <NA>     E
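To make the definition from the background concrete, here is a minimal sketch of the calculation for a single pair (CompA and CompB from the table above). The vector names `compA`, `compB`, `same`, `n_true`, `n_valid` and `ratio` are only illustrative and are not part of my actual code.

```r
# CompA and CompB copied from the sample data above
compA <- c("A", "A", "F", "A", "F", "A", "F", "A", "F", NA, "E")
compB <- c("F", "F", "E", "A", "A", "D", "F", "A", "E", "C", "F")

# TRUE if same letter, FALSE if different, NA if either value is NA
same <- compA == compB

n_true  <- sum(same, na.rm = TRUE)  # rows where both companies match
n_valid <- sum(!is.na(same))        # rows where neither value is NA
ratio   <- n_true / n_valid         # same value as mean(same, na.rm = TRUE)
```

For this pair the ratio is the 3 matches divided by the 10 rows without NA.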
Code used:
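v <- NULL
# 1:length(dat)-1 parses as (1:length(dat)) - 1, so i runs from 0 to length(dat) - 1
for (i in 0:(length(dat) - 1)) {
  j <- i + 1
  while (j <= length(dat) - 1) {
    str(dat)  # prints the structure of dat on every iteration
    # TRUE where columns i+1 and j+1 hold the same letter, NA if either is NA
    qone <- data.frame(qone =
      (as.character(dat[, i + 1]) == as.character(dat[, j + 1])))
    # share of TRUE among the non-NA comparisons
    count1 <- length(which(qone == TRUE)) /
      (length(which(qone == TRUE)) + length(which(qone == FALSE)))
    v <- append(v, count1)
    v <- data.frame(v)
    j <- j + 1
  }
}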
Final output:
   x1 x2  x3 x4   x5 x6
1 0.3 NA 0.5 NA 0.27 NA
Second final output (row 1: number of TRUE, row 2: number of FALSE):
  x1 x2 x3 x4 x5 x6
1  3  0  5  0  3  0
2  7  0  5  0  8  0
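For reference, the layout of both outputs (one column per pair, in the order A-B, A-C, A-D, B-C, B-D, C-D) can be reproduced with a sketch along these lines. It is only an illustration on the sample data, not a tested solution for the full data set: `pair_stats` and `res` are hypothetical names, and the columns are assumed to be character rather than factor. It uses `combn()` to visit each pair once and collects the results without growing a data frame inside the loop.

```r
# Sample data from above; stringsAsFactors = FALSE keeps the letters as character
dat <- data.frame(
  CompA = c("A", "A", "F", "A", "F", "A", "F", "A", "F", NA, "E"),
  CompB = c("F", "F", "E", "A", "A", "D", "F", "A", "E", "C", "F"),
  CompC = NA_character_,
  CompD = c("A", "F", "A", "A", "F", "D", "B", "F", "F", "A", "E"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE count, FALSE count and ratio for one pair of columns
pair_stats <- function(idx) {
  same    <- dat[[idx[1]]] == dat[[idx[2]]]  # NA where either value is NA
  n_true  <- sum(same, na.rm = TRUE)
  n_false <- sum(!same, na.rm = TRUE)
  # 0 / 0 yields NaN when a pair has no rows without NA
  c(n_true, n_false, n_true / (n_true + n_false))
}

# combn() enumerates each unordered pair of column indices exactly once:
# (1,2), (1,3), (1,4), (2,3), (2,4), (3,4)
res <- combn(ncol(dat), 2, pair_stats)
dimnames(res) <- list(c("n_true", "n_false", "ratio"),
                      paste0("x", seq_len(ncol(res))))
res
```

Pairs involving CompC come out as NaN rather than NA in the ratio row, because the division is 0 / 0.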