I am new to R, so thank you very much in advance for any advice.

I want to make a co-occurrence matrix, and I followed the linked question below:

How to use R to create a word co-occurrence matrix

But I cannot understand why the value of A-A is 10 in the matrix below. Shouldn't it be 4, since there are four As?

dat <- read.table(text='film tag1 tag2 tag3
1 A A A
2 A C F
3 B D C', header=TRUE)

library(qdapTools)  # provides mtabulate()
crossprod(as.matrix(mtabulate(as.data.frame(t(dat[, -1])))))

   A C F B D
A 10 1 1 0 0
C  1 2 1 1 1
F  1 1 1 0 0
B  0 1 0 1 1
D  0 1 0 1 1

Charley

1 Answer

The solution you use presumes each tag appears only once per film, which jibes with the definition of a co-occurrence matrix as far as I can tell. Your first film breaks that assumption: each A on the first line gets counted as co-occurring with itself and with the other two As (3 × 3 = 9), and the A on the second line adds one more, resulting in a total of ten co-occurrences.
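
To make the arithmetic concrete, here is a minimal base-R sketch. It does not use mtabulate, and the names all_tags, counts, and binary are my own, not from the linked answer. It builds the film-by-tag count table, shows where the 10 comes from, and shows one possible fix, assuming you want each tag counted at most once per film:

```r
dat <- read.table(text = "film tag1 tag2 tag3
1 A A A
2 A C F
3 B D C", header = TRUE, stringsAsFactors = FALSE)

tags <- as.matrix(dat[, -1])

# Tags in order of first appearance, reading film by film: A C F B D
all_tags <- unique(as.vector(t(tags)))

# Film-by-tag count table: how many times each tag appears in each film.
# as.vector(tags) reads column by column, so the film ids are repeated per column.
counts <- unclass(table(
  film = rep(dat$film, times = ncol(tags)),
  tag  = factor(as.vector(tags), levels = all_tags)
))

# crossprod(counts) is t(counts) %*% counts, so the A-A cell is
# 3*3 (film 1) + 1*1 (film 2) + 0*0 (film 3) = 10.
cooc <- crossprod(counts)
print(cooc["A", "A"])         # 10

# Assumed fix: reduce counts to presence/absence before multiplying,
# so a tag repeated within one film is only counted once.
binary <- (counts > 0) * 1
cooc_binary <- crossprod(binary)
print(cooc_binary["A", "A"])  # 2, the number of films that contain A

# The raw number of As is a column sum, not a co-occurrence count.
print(colSums(counts)[["A"]]) # 4
```

With the presence/absence version the diagonal is the number of films containing each tag, so A-A becomes 2. The 4 you expected is the total count of As, which is what colSums(counts) gives; it never shows up in the cross-product.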

Haem