Data
We have many text strings that look like this (they are much longer in our real dataset):
df <- data.frame(
  id   = c('text1', 'text2', 'text3'),
  text = c('ABA', 'ABA', 'AAA')
)
> df
     id text
1 text1  ABA
2 text2  ABA
3 text3  AAA
We want to create a matrix that counts how often a letter at one position occurs together with each letter at the other positions. For this example (where, e.g., 2B means the letter B at position 2), the result would be:
3A   3  1  2  3
2B   2  0  2  2
2A   1  1  0  1
1A   3  1  2  3
    1A 2A 2B 3A
What I tried
I previously converted the data to a binary matrix, which looks like this:
structure(list(pos1_A = c(1, 1, 1), pos2_A = c(0, 0, 1),
               pos2_B = c(1, 1, 0), pos3_A = c(1, 1, 1)),
          class = "data.frame",
          row.names = c("text1", "text2", "text3"))
      pos1_A pos2_A pos2_B pos3_A
text1      1      0      1      1
text2      1      0      1      1
text3      1      1      0      1
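For reference, here is a minimal sketch of one way such a binary matrix could be built from df in base R. It is not necessarily how the matrix above was produced; the names chars, labels, levels_all and bin are made up for illustration, and it assumes all strings have the same length.

```r
# Example data from the question
df <- data.frame(
  id   = c("text1", "text2", "text3"),
  text = c("ABA", "ABA", "AAA")
)

# Split each string into single characters: one row per string,
# one column per position (assumes equal-length strings)
chars <- do.call(rbind, strsplit(as.character(df$text), ""))

# Tag every character with its position, e.g. "pos2_B"
labels <- matrix(
  paste0("pos", col(chars), "_", chars),
  nrow = nrow(chars),
  dimnames = list(df$id, NULL)
)

# One 0/1 indicator column per observed position/letter combination
levels_all <- sort(unique(as.vector(labels)))
bin <- t(apply(labels, 1, function(x) as.integer(levels_all %in% x)))
colnames(bin) <- levels_all

bin
```

Note that sort() orders the column names alphabetically, so with ten or more positions "pos10_A" would come before "pos2_A"; zero-padding the position number would be one way around that.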
I can then run functions such as cor on it to get correlations. However, instead of correlations I want the co-occurrence frequencies.
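To make the target concrete: on a 0/1 matrix like this, the frequencies I am after appear to be what a matrix cross-product (t(X) %*% X) gives, since each cell then counts the rows in which both columns are 1. A small sketch using base R's crossprod, with bin and co as illustrative names, shown as one possible approach and not a definitive solution:

```r
# Binary indicator matrix from the question (rows = texts,
# columns = position/letter combinations)
bin <- structure(list(pos1_A = c(1, 1, 1), pos2_A = c(0, 0, 1),
                      pos2_B = c(1, 1, 0), pos3_A = c(1, 1, 1)),
                 class = "data.frame",
                 row.names = c("text1", "text2", "text3"))

# crossprod(X) computes t(X) %*% X: cell [i, j] is the number of rows
# in which column i and column j are both 1
co <- crossprod(as.matrix(bin))

co
#        pos1_A pos2_A pos2_B pos3_A
# pos1_A      3      1      2      3
# pos2_A      1      1      0      1
# pos2_B      2      0      2      2
# pos3_A      3      1      2      3
```

The diagonal holds the plain frequency of each position/letter combination, and the off-diagonal cells match the desired matrix shown under Data.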
Note that this differs from questions about co-occurrences in which the variable name itself (here, the position) is ignored, for example "How to use R to create a word co-occurrence matrix".