
I need some help with R code.

I have a data frame, let's say it looks like this:


     c1  c2  c3  c4  c5
r1    1   2   3   4   5
r2    1   3   5   4   5
r3    4   2   1   1   2
r4    1   2   3   4   5
r5    3   3   4   2   1


I need to do a 'similarity check'. To do it, I need to run a loop that goes through every element of each row and compares it with the corresponding element of every other row. In other words, I want the loop to check the responses like this and give me a Boolean value, T(rue) or F(alse):

[r1,c1] == [r1,c1]

[r1,c1] == [r2,c1]

[r1,c1] == [r3,c1]

[r1,c1] == [r4,c1]

[r1,c1] == [r5,c1]

At this point the loop has finished checking [r1,c1] against all elements of c1 (including itself, which is not necessary). After comparing [r1,c1], I want the loop to move on to [r1,c2] and compare it with all the elements of c2, and so on, so that every element of r1 is compared with the corresponding element of every row. The console output would look like this:

T  T  T  T  T
T  F  F  T  T
F  T  F  F  F
T  T  T  T  T
F  F  F  F  F

Now, this is ONLY the comparison of [r1, ] with [r1, ], [r2, ], [r3, ], [r4, ] and [r5, ]. After comparing [r1, ], the loop should go to [r2, ] and compare it in the same manner with [r3, ], [r4, ] and [r5, ], then [r3, ] with [r4, ] and [r5, ], and so on.

In the end I would get a matrix of Trues and Falses showing the similarity of every survey with every other survey. For each pair of surveys I will then take (the number of 'T's in the row / the number of columns) * 100. This will tell me how similar one survey is to another.

TIA :)

Also, is there an easier way to insert tables to explain the question better? This is my first question here; I hope I didn't waste time typing that table out by hand.

shbshk
  • There are better ways to share data in questions. See [how to create a reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). – MrFlick Mar 02 '16 at 05:19

1 Answer


Calling apply over the columns, with a function that compares the first element of each column to the whole column, gets it done for r1:

# Rebuild the example data: replace each run of dots with a comma, then parse as CSV
df <- read.csv(textConnection(gsub("\\.+", ",",
  "1...... 2...... 3..... 4..... 5
  1...... 3...... 5..... 4..... 5
  4...... 2...... 1..... 1..... 2
  1...... 2...... 3..... 4..... 5
  3...... 3...... 4..... 2..... 1")), header = FALSE)

# For each column, compare its first element (row 1) with every element of that column
apply(df, 2, function(x) x[1] == x)
#         V1    V2    V3    V4    V5
# [1,]  TRUE  TRUE  TRUE  TRUE  TRUE
# [2,]  TRUE FALSE FALSE  TRUE  TRUE
# [3,] FALSE  TRUE FALSE FALSE FALSE
# [4,]  TRUE  TRUE  TRUE  TRUE  TRUE
# [5,] FALSE FALSE FALSE FALSE FALSE
cory
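
The answer above covers only the comparison of r1 with every row. One possible way to extend it to every pair of rows, and to get the percentage the question asks for directly, is sketched below. This is not part of the original answer. The names `m`, `pct_same` and `sim` are made up for illustration, and the data frame is retyped by hand from the question's table.

```r
# Example data from the question (rows are surveys, columns are responses).
df <- data.frame(
  c1 = c(1, 1, 4, 1, 3),
  c2 = c(2, 3, 2, 2, 3),
  c3 = c(3, 5, 1, 3, 4),
  c4 = c(4, 4, 1, 4, 2),
  c5 = c(5, 5, 2, 5, 1),
  row.names = paste0("r", 1:5)
)

# Work on a matrix so that m[i, ] is a plain vector.
m <- as.matrix(df)

# Hypothetical helper: percentage of columns in which rows i and j agree.
# mean() of a logical vector is the share of TRUE values.
pct_same <- function(i, j) 100 * mean(m[i, ] == m[j, ])

# outer() expects a vectorised function, hence the Vectorize() wrapper.
idx <- seq_len(nrow(m))
sim <- outer(idx, idx, Vectorize(pct_same))
dimnames(sim) <- list(rownames(m), rownames(m))

sim
```

Here `sim["r1", "r2"]` is the percentage of matching responses between r1 and r2. The matrix is symmetric and its diagonal is always 100, so if only the distinct pairs are wanted (r1 with r2, r1 with r3, and so on), `sim[upper.tri(sim)]` extracts them.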