I have a data set with 2000 rows and 4000 columns. I want to compare each row with every other row and measure how similar they are, as the number of differing columns divided by the total number of columns.
What I have so far is:
for (i in 1:nrow(data))
{
  for (j in (i+1):nrow(data))
  {
    mycount[[i,j]] = length(which(data[i,] != data[j,]))
  }
}
There are two problems with it. First, j doesn't start from i+1 (which is probably a basic mistake on my part).
The main problem, however, is the running time: it takes ages.
Could someone please suggest a better way to get the same result, namely the percentage similarity of each row to every other row?
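On the first problem, here is a minimal sketch of two things that might be going wrong with the range of j. It assumes a hypothetical row count `n` standing in for `nrow(data)`; whether either case matches the real code is a guess.

```r
n <- 4  # hypothetical row count, standing in for nrow(data)

# Precedence: ":" binds tighter than "+", so i+1:n means i + (1:n).
i <- 2
i + 1:n      # 3 4 5 6, runs past n
(i + 1):n    # 3 4

# Edge case: when i == n, (i + 1):n counts DOWN instead of being empty.
(n + 1):n    # 5 4

# Stopping the outer loop at n - 1 keeps every j inside 1..n.
pairs <- list()
for (i in seq_len(n - 1)) {
  for (j in (i + 1):n) {
    pairs[[length(pairs) + 1]] <- c(i, j)
  }
}
length(pairs)  # n * (n - 1) / 2 pairs
```

So if the parentheses are missing anywhere, or the outer loop reaches the last row, j will take values I did not intend.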
Here's an example of data and what I want to achieve:
The output should be something like:
mycount[1,2] = 2 (S# and var3 columns are different)
mycount[1,3] = 2 (S# and var1 columns are different)
mycount[1,4] = 2 (S# and var4 columns are different)
mycount[2,3] = ...
mycount[2,4] = ...
mycount[3,4] = 3 (S#, var1 and var4 are different)
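For reference, here is a rough sketch of one way the inner loop could be vectorized, so that each row is compared against all later rows in a single step. The matrix `m` is hypothetical toy data (not my real data), and `count_row_diffs` is just an illustrative helper name. It assumes the data can be held as a matrix, for example via `as.matrix(data)`, and contains no NA values.

```r
# Hypothetical toy data: 3 rows, 4 columns.
m <- rbind(c(1, 1, 1, 1),
           c(2, 1, 9, 1),
           c(3, 7, 1, 1))

# Illustrative helper: number of differing columns for each pair i < j.
count_row_diffs <- function(m) {
  n  <- nrow(m)
  tm <- t(m)  # columns of tm are the rows of m
  out <- matrix(NA_real_, n, n)
  for (i in seq_len(n - 1)) {
    j <- (i + 1):n
    # tm[, i] is recycled down each column of tm[, j], so every later
    # row is compared with row i element by element in one call.
    out[i, j] <- colSums(tm[, j, drop = FALSE] != tm[, i])
  }
  out
}

mycount    <- count_row_diffs(m)
similarity <- 1 - mycount / ncol(m)  # share of columns that match
```

Only the upper triangle is filled, since mycount[i,j] and mycount[j,i] would be the same. This still does the same number of comparisons overall, but a single R-level loop remains instead of two, which is usually much faster than comparing one pair of rows at a time.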