0

I have a CSV file that contains data in the format below:

A   a
A   b
B   f
B   g
B   e
B   h
C   d
C   e
C   f

The first column contains items, and the second column contains a feature available for that item, taken from the feature vector [a,b,c,d,e,f,g,h]. I want to convert this to an occurrence matrix that looks like this:

    a,b,c,d,e,f,g,h
A   1,1,0,0,0,0,0,0
B   0,0,0,0,1,1,1,1
C   0,0,0,1,1,1,0,0

I know how to do this in pandas, from the question "Convert Two column data frame to occurrence matrix in pandas". Can anyone tell me how to do this in R?

Lok

2 Answers

1

Read your CSV file into a data frame, say `dat`. Then run:

o <- table(dat)

Note that this gives you a contingency table. If a pair occurs more than once, i.e. the table contains values greater than 1, you need an extra post-processing step:

o <- (o > 0) + 0
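To see what this step changes, here is a minimal sketch using a made-up data frame `dup` (hypothetical, not the data from the question) in which the pair A/a appears twice:

```r
# Hypothetical data: the pair (A, a) is deliberately duplicated
dup <- data.frame(item    = c("A", "A", "A", "B"),
                  feature = c("a", "a", "b", "a"))

counts <- table(dup)          # contingency table; the cell for A/a holds 2
binary <- (counts > 0) + 0    # logical comparison, then coerced to numeric 0/1

counts
binary
```

In `counts` the A/a cell is 2, while in `binary` every cell is either 0 or 1.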

With your example data, there is no "c" in the second column. To display it in the resulting matrix, control factor levels:

dat[[2]] <- factor(dat[[2]], levels = letters[1:8])

Then run the two steps above.
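Put together, the whole approach might look like the sketch below. The data frame is typed inline from the example in the question rather than read from a file, and the column names `item` and `feature` are assumptions made for illustration:

```r
# The example data from the question, typed inline instead of read from the CSV
dat <- data.frame(
  item    = c("A", "A", "B", "B", "B", "B", "C", "C", "C"),
  feature = c("a", "b", "f", "g", "e", "h", "d", "e", "f")
)

# Fix the feature levels so that the unused "c" still gets a column
dat[[2]] <- factor(dat[[2]], levels = letters[1:8])

o <- table(dat)      # item-by-feature counts
o <- (o > 0) + 0     # collapse counts to 0/1 occurrence indicators
o
```

How the real file should be read depends on its separator and on whether it has a header row, neither of which is specified in the question.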

Zheyuan Li
  • What does that `(o > 0) + 0` post-processing do? When I try to convert a table to a data frame, without that, I get something very different from what the table looks like and I don't understand why. – vuzun Dec 01 '20 at 13:35
  • @vuzun A type conversion. `o > 0` is TRUE/FALSE logical, but becomes 1/0 numeric after `+ 0`. – Zheyuan Li Dec 01 '20 at 13:37
0

Hi, I think you are looking for a co-occurrence matrix, right?

# stringsAsFactors = TRUE is required in R >= 4.0; otherwise levels() returns NULL
my_mat <- as.data.frame(matrix(c("A","A","B","B","B","B","C","C","C","a","b","f","g","e","h","d","e","f"),ncol=2),
                        stringsAsFactors = TRUE)

row_levels <- levels(my_mat$V1)   # items
col_levels <- levels(my_mat$V2)   # features that actually occur

coc_mat <- matrix(0L, nrow = length(row_levels), ncol = length(col_levels),
                  dimnames = list(row_levels, col_levels))

for (i in seq_along(col_levels)) {
  for (j in seq_along(row_levels)) {
    # count the rows in which item j and feature i appear together
    coc_mat[j, i] <- sum(my_mat$V1 == row_levels[j] & my_mat$V2 == col_levels[i])
  }
}

coc_mat

This creates a co-occurrence matrix that looks like this:

  a b d e f g h
A 1 1 0 0 0 0 0
B 0 0 0 1 1 1 1
C 0 0 1 1 1 0 0

Each cell tells you how many times a pair, for example A and a, occurs in the same row. Note that "c" has no column because it never appears in the data. I hope this was helpful. Have a nice day!
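As a cross-check, the same counts can probably be obtained without explicit loops by cross-tabulating the two columns with `table()`. This is only a sketch; the name `coc_mat2` is introduced here for illustration:

```r
# Same example data, with the columns stored as factors
my_mat <- data.frame(
  V1 = c("A", "A", "B", "B", "B", "B", "C", "C", "C"),
  V2 = c("a", "b", "f", "g", "e", "h", "d", "e", "f"),
  stringsAsFactors = TRUE
)

# Loop-free counterpart of the nested loops: cross-tabulate the two columns
coc_tab <- table(my_mat$V1, my_mat$V2)

# unclass() drops the "table" class, leaving a plain integer matrix
coc_mat2 <- unclass(coc_tab)
coc_mat2
```

The loop version makes the counting explicit, while `table()` is shorter and avoids indexing mistakes.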

Also, check out the related question "Creating co-occurrence matrix".

Joyvalley