
I've never posted on here before, but I figured I would give it a shot.

I've spent some time googling, and can't find exactly what I am looking for... I have a data frame like this:

df <- structure(list(response = c("Topic1", "Topic10", "Topic11", "Topic12", 
"Topic13", "Topic14", "Topic15", "Topic16", "Topic17", "Topic18", 
"Topic19", "Topic2", "Topic20", "Topic21", "Topic22", "Topic23", 
"Topic24", "Topic25", "Topic26", "Topic27", "Topic28", "Topic29", 
"Topic3", "Topic30", "Topic31", "Topic32", "Topic33", "Topic34", 
"Topic35", "Topic36", "Topic37", "Topic38", "Topic39", "Topic4", 
"Topic40", "Topic41", "Topic42", "Topic43", "Topic44", "Topic45", 
"Topic46", "Topic47", "Topic48", "Topic49", "Topic5", "Topic50", 
"Topic6", "Topic7", "Topic8", "Topic9"), judgement.yNTA = c(0, 
0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0), 
judgement.yYTA = c(0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)), 
class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", 
"14", "15", "16", "17", "18", "19", "20", "21", "22", "23", "24", 
"25", "26", "27", "28", "29", "30", "31", "32", "33", "34", "35", 
"36", "37", "38", "39", "40", "41", "42","43","44","45","46","47","48","49", "50"))

where I have coded 1 = statistically significant and 0 = not statistically significant for each of the 50 topics. I want to update a blank 50x50 matrix by adding 1 whenever two topics are both statistically significant. Code for the blank matrix:

mymatrix <- matrix( , nrow = 50, ncol = 50)

For example, Topic25 and Topic31 are both statistically significant for the NTA votes, so I want the matrix to reflect this by adding 1 to [25, 31] and [31, 25]. I want to add 1 to the existing value rather than overwrite it with 1, because I want to count how many times these topics show up together across different data frames. I also want the code to look at both columns when filling the matrix.

I don't really know where to start with this, and would appreciate any tips on building a command that would work! Thanks in advance!

r8gan
  • What is going to happen with `topic40`? And what about `[25,25]`? – harre Jul 08 '22 at 16:57
  • Please post some data: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – harre Jul 08 '22 at 16:59
  • Do you already know that Topic25 and Topic31 are "statistically significant", or do you want to test whether the NTA votes are significantly different for these two groups as part of your procedure? – Jonathan Vitale Jul 08 '22 at 17:14
  • @JonathanVitale I have already done the test to make sure these are significant! I am looking at many different data frames to see which topics are significant together across different sets of stories. The matrix is just to help us see and organize which topics are statistically significant and used together in the same direction. Please let me know if you have other questions! Thanks for your help! – r8gan Jul 08 '22 at 17:24
  • @harre It looks like someone added the code to get the data frame. I should have been more specific, sorry about that. For 25, 31, and 40, I would like a 1 added in [25, 31], [31, 25], [25, 40], [40, 25], [31, 40], and [40, 31]. [25, 25] will probably get a 1 as well, just based on what is possible with the code, but we will ignore the diagonal, so it doesn't really matter. Please let me know if you have other questions; I really appreciate you helping me out. – r8gan Jul 08 '22 at 17:24
  • @r8gan I'm sorry, I'm completely lost here. Did you add the sample df that's up in the original comment or did someone else? The structure [25, 31], [31, 25]... doesn't seem to correspond to anything I'm seeing in that data. Do you want something like a correlation matrix? If the data is not correct, please fix it the way you want it. – Jonathan Vitale Jul 08 '22 at 18:05
  • I am adding it to a blank 50x50 matrix where the rows and columns both represent the 50 topics. The df in the question is the data that I have, and [25, 31] is where the +1 should go in the 50-topic matrix, so that I can look at the matrix and see which topics are statistically significant together often. – r8gan Jul 08 '22 at 18:08
  • Okay, now I think I understand, you can help your original post by giving us the matrix. Here's some code that could help: ``` set.seed(1) matrix(data=sample(0:100, 50*50), nrow=50, ncol=50) ``` – Jonathan Vitale Jul 08 '22 at 18:20
  • Also, it seems that your judgement.yNTA are all zero, did you intend for some of them to be 1? – Jonathan Vitale Jul 08 '22 at 18:33
  • I updated the code to include all 50 topics. The NTA column does have one 1 value, but whether any topics are statistically significant in either direction just depends on the set of data I am working with. I am not concerned about the number of 1s in this data. – r8gan Jul 08 '22 at 18:38

2 Answers


So, what you want to do here is update a matrix (mymatrix) each time a value in another data frame (df) takes a specific value (in this case 1, representing "statistically significant").

Although loops are generally frowned upon in R, this is one of those circumstances where they are appropriate: the loop runs a limited number of times, and that number does not scale with the sample size of the data.

Here's what I would do:

  • reduce the df to just those rows representing the significant data
  • create an integer version of the "Topic"
  • create nested loops running through each row of this new df
  • add ones to indices in the matrix matching the loop values

Sample code (assuming that you use the tidyverse; df is already in my environment, but mymatrix is not):

library(tidyverse)
set.seed(1)
mymatrix <- matrix(data=sample(0:100, 50*50, replace = TRUE), nrow=50, ncol=50)

df.sig <- df %>% 
  filter(judgement.yYTA == 1) %>% 
  mutate(topic = as.numeric(gsub(".*?([0-9]+)", "\\1", response)))

# seq_len() avoids looping over c(1, 0) when no rows are significant
for (i in seq_len(nrow(df.sig))) {
  for (j in seq_len(nrow(df.sig))) {
    # only do this where i is not j
    if (i != j) {
      x <- df.sig[i, "topic"]
      y <- df.sig[j, "topic"]
      mymatrix[x, y] <- mymatrix[x, y] + 1
    }
  }
}

  • The code ran with no errors, but did not add any values to my matrix. Also, I do not want set.seed, because I want the matrix to be completely empty before I start adding to it. What is here is very helpful, but I am very unfamiliar with these loops. Any tips to actually get it to add numbers into my matrix? Thanks! – r8gan Jul 08 '22 at 19:41
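
One possible explanation for the "no values added" symptom, assuming the loop was run against the blank matrix from the question (created with matrix( , nrow = 50, ncol = 50)) rather than the sampled one: every cell of that matrix is NA, and adding 1 to NA still gives NA. A minimal sketch on a 3x3 matrix (the names blank and zeros are only illustrative):

```r
# matrix( , nrow, ncol) with no data argument fills every cell with NA
blank <- matrix( , nrow = 3, ncol = 3)
blank[1, 2] <- blank[1, 2] + 1
is.na(blank[1, 2])   # the cell is still NA after the update

# Starting from zeros lets the increments accumulate
zeros <- matrix(0, nrow = 3, ncol = 3)
zeros[1, 2] <- zeros[1, 2] + 1
zeros[1, 2]
```

If that is the cause, replacing the sampled matrix above with matrix(0, nrow = 50, ncol = 50) should let the same nested loop produce visible counts.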

Here's an approach using expand.grid to find all combinations. I've initialized the matrix with 0's rather than NA's, as NA + 1 = NA.

mymatrix <- matrix(0, nrow = 50, ncol = 50)

numbers <- as.numeric(gsub("[A-Za-z]+", "", df$response[df$judgement.yYTA == 1]))
numbers_grid <- expand.grid(numbers, numbers)

for (i in seq_len(nrow(numbers_grid))) {
  row_idx <- numbers_grid$Var1[[i]]
  col_idx <- numbers_grid$Var2[[i]]
  mymatrix[row_idx, col_idx] <- mymatrix[row_idx, col_idx] + 1
}
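
The question also asks for both columns to be counted and for the counts to accumulate across data frames. A loop-free sketch of that idea follows, using outer() on a 0/1 indicator vector. The helper name add_cooccurrence and the toy data frame are assumptions made up for illustration, not part of the original data, and the sketch assumes every response has the form "Topic<number>".

```r
# Hypothetical helper: add co-occurrence counts for one 0/1 indicator column.
add_cooccurrence <- function(mat, responses, flags) {
  # Topic numbers of the rows flagged as significant
  idx <- as.integer(sub("^Topic", "", responses[flags == 1]))
  # 0/1 indicator over all topics, positioned by topic number
  ind <- numeric(nrow(mat))
  ind[idx] <- 1
  # outer(ind, ind) is 1 exactly where both topics are flagged
  mat + outer(ind, ind)
}

# Toy data (5 topics) standing in for the real df
toy <- data.frame(
  response       = c("Topic1", "Topic2", "Topic3", "Topic4", "Topic5"),
  judgement.yNTA = c(0, 1, 0, 0, 1),
  judgement.yYTA = c(1, 0, 1, 1, 0)
)

counts <- matrix(0, nrow = 5, ncol = 5)
for (col in c("judgement.yNTA", "judgement.yYTA")) {
  counts <- add_cooccurrence(counts, toy$response, toy[[col]])
}
diag(counts) <- 0  # self-pairs such as [3, 3] are ignored
counts
```

To accumulate over several data frames, the same call could be repeated for each one while reusing counts, so that each co-occurring pair gains 1 per data frame and column in which both topics are flagged.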
harre