For several days I have been trying to get a correctly counted output from a combination of nested ifelse calls and nested loops. I suppose that either my nesting is totally wrong, or the way I try to count the output is, or maybe both.

ifelse.1 = function(input_matrix) {

  result = 1
  output = 0
  sum_output = 0

  for(i in 1:dim(input_matrix)[1]){
    for(j in 1:dim(word_list_matrix_one)[1]){
      for(k in 1:dim(word_list_matrix_two)[1]){

        ifelse(str_detect(input_matrix[i], "word") == TRUE
               & str_detect(input_matrix[i], word_list_matrix_one[j]) == TRUE
               & str_detect(input_matrix[i], word_list_matrix_two[k]) == TRUE,
               output[i] <- output[i] + result,
               ifelse(str_detect(input_matrix[i], word_list_matrix_three[j]) == TRUE
                      & str_detect(input_matrix[i], word_list_matrix_two[k]) == FALSE,
                      output[i] <- output[i] + result, NA))

        sum_output = output[i]
      } # k-loop
    } # j-loop
  } # i-loop
  return(sum_output)
}

The code is about detecting certain strings (via the str_detect function of the stringr package) in the rows of several one-column matrices. For example, in row [i] of input_matrix, the string from row [j] of word_list_matrix_one should be detected.

Whenever one of the ifelse conditions above is true, 1 should be added to the output; after all i iterations, the sum of the output should be returned.

The problem is that I either get NA as the answer or, for some variants of this code, a count larger than the number of input rows.

I know that ifelse is vectorized, which might make the loops unnecessary, but I never got that working, and besides, the matrices I have to process are not all of the same length.

I hope that I managed to deliver a good, reproducible question with enough detail. Thank you very much for your time.

Jaime Caffarel
dennis
  • Just to clarify, what you want is to detect and count the number of elements in your `input_matrix` which contain a text sequence containing words present in **all** your `word_list_matrix_N` variables. Is that correct? – Jaime Caffarel Jul 16 '16 at 12:52
  • It looks like you're trying to bump the value of `output` up by one for each `if` statement. If so, you need to refer to `output` the same way each time. By calling it with `output[i]`, you're asking R to look up a different element on each iteration, whereas it looks like you want a single counter that keeps counting. So use `output` instead of `output[i]`. – rosscova Jul 16 '16 at 13:06
  • @JaimeCR That is correct: if a word from word_list_matrix_1 is detected, a word from word_list_matrix_2 should also be searched for, and so on. – dennis Jul 16 '16 at 14:46
  • @rosscova That is a step forward: instead of simply returning NA, it now returns a number, as it should, but not the expected number. For a data sample expected to yield 3 results, the function returns 15. – dennis Jul 16 '16 at 15:01
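
An inflated count like the one described in the last comment is consistent with the counter being incremented once per matching (j, k) pair instead of once per row. The following is only a minimal sketch of that effect, using made-up data and hypothetical names (`input`, `list_one`, `list_two`, `contains_any`). It covers only the first ifelse condition from the question, and base R `grepl` stands in for `str_detect` so that it runs without extra packages.

```r
# Hypothetical sample data; none of these values come from the original post.
input    <- c("word alpha beta", "word alpha gamma beta", "nothing here", "word beta")
list_one <- c("alpha", "delta")
list_two <- c("beta", "gamma")

# TRUE if `text` contains at least one of `words` (fixed strings, no regex)
contains_any <- function(text, words) {
  any(vapply(words, function(w) grepl(w, text, fixed = TRUE), logical(1)))
}

# Counting inside the j/k loops: one increment per matching (j, k) pair
loop_count <- 0
for (i in seq_along(input)) {
  for (j in seq_along(list_one)) {
    for (k in seq_along(list_two)) {
      if (grepl("word", input[i], fixed = TRUE) &&
          grepl(list_one[j], input[i], fixed = TRUE) &&
          grepl(list_two[k], input[i], fixed = TRUE)) {
        loop_count <- loop_count + 1
      }
    }
  }
}

# Counting each row at most once, however many word pairs it matches
row_hits <- vapply(input, function(x) {
  grepl("word", x, fixed = TRUE) &&
    contains_any(x, list_one) &&
    contains_any(x, list_two)
}, logical(1))
row_count <- sum(row_hits)

print(loop_count)  # 3: the second row matches two (j, k) pairs
print(row_count)   # 2: rows 1 and 2
```

With this data the second row matches both "beta" and "gamma", so the loop version counts it twice while the per-row version counts it once.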

1 Answer

You could use this.

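library(stringr)

# TRUE where a cell of df contains at least one word from the first list
one <- as.data.frame(apply(df, 2, function(x) {
  str_detect(x, paste(word_list_matrix_one, collapse = '|'))
}))

# Same check against the second list
two <- as.data.frame(apply(df, 2, function(x) {
  str_detect(x, paste(word_list_matrix_two, collapse = '|'))
}))

# Same check against the third list
three <- as.data.frame(apply(df, 2, function(x) {
  str_detect(x, paste(word_list_matrix_three, collapse = '|'))
}))

# Row and column indices of the cells that match all three lists
which(one & two & three, arr.ind = TRUE)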

The result is the row and column indices of the elements of the original data (df) that contain at least one word from each of the three word lists. If you want to check another condition, for instance whether a cell matches (list 1 AND list 2) OR list 3, you can change the last line accordingly, e.g.

which(one & two | three, TRUE)
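
For anyone who wants to try this without the original data, here is a minimal sketch of the same idea. Everything in it is made up for illustration: the data frame `df`, the three word lists, and the helper `detect_any`. Base R `grepl` stands in for `stringr::str_detect` so that the sketch runs without extra packages.

```r
# Hypothetical data; the real df and word lists are not shown in the post.
df <- data.frame(
  text = c("red apple pie", "green apple", "red pear tart", "blue sky"),
  stringsAsFactors = FALSE
)
word_list_one   <- c("red", "green")
word_list_two   <- c("apple", "pear")
word_list_three <- c("pie", "tart")

# Logical matrix: TRUE where a cell matches at least one word from `words`.
# grepl() from base R is used here in place of stringr::str_detect().
detect_any <- function(data, words) {
  pattern <- paste(words, collapse = "|")
  hits <- vapply(data, function(col) grepl(pattern, col), logical(nrow(data)))
  matrix(hits, nrow = nrow(data), dimnames = list(NULL, names(data)))
}

one   <- detect_any(df, word_list_one)
two   <- detect_any(df, word_list_two)
three <- detect_any(df, word_list_three)

# Cells that match all three lists
all_three <- which(one & two & three, arr.ind = TRUE)
print(all_three)

# Number of distinct rows with such a match; each row is counted once
n_rows <- length(unique(all_three[, "row"]))
print(n_rows)

# Negation also works on these logical matrices: list one but not list three
one_not_three <- which(one & !three, arr.ind = TRUE)
print(one_not_three)
```

With this sample, rows 1 and 3 match all three lists and row 2 matches list one but not list three. Counting distinct rows of the `which()` result gives a total in which no row is counted twice.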
Jaime Caffarel
  • Thank you very much, this works just fine! Is it also possible to check for the condition one NOT two? I tried to do this with a ! instead of &, but it didn't work; I guess it is the wrong syntax. – dennis Jul 17 '16 at 17:54
  • I just checked my data once again. I really need NOT conditions for this to work; otherwise the code will count several rows more than once. If a NOT condition cannot be implemented, maybe it is possible to exclude the rows already found as TRUE for one & two from being searched in the | three condition in the first place. – dennis Jul 17 '16 at 19:04
  • You can use the `!` operator to set NOT conditions. For instance, `!(one)` gets the cells that don't contain any word from the first list. You can also play around with different combinations, e.g. `!(one | two) & three`. – Jaime Caffarel Jul 17 '16 at 19:52
  • Which condition regarding the three lists of words do you have to check specifically? – Jaime Caffarel Jul 17 '16 at 19:53
  • Specifically, I have to test which(one & two & three | one & four NOT three) – dennis Jul 17 '16 at 20:39
  • which( (one & two & three) | (one & four & !(three)))? – Jaime Caffarel Jul 17 '16 at 20:43
  • What do you mean by "four NOT three"? Does it mean that you have to look for any word that is in "four" and is not in "three"? – Jaime Caffarel Jul 18 '16 at 06:33
  • I am very sorry for taking so long to mark your answer as the correct one. Everything works just fine and you helped me a lot. Thank you very much, again! – dennis Jul 27 '16 at 08:07
  • I know extended discussion in comments should be avoided, but direct messaging does not seem to be an option. I ran into one more problem. Would it be possible for the str_detect call in, e.g., "two" to return not the first match found, but the match closest to the match found in "one"? I could also open a new question for that if that is preferable. I thought it would be easier this way, since you are already familiar with the function, it being your solution :-) Another question: is there any way I can repay you for helping me out on this SO MUCH? – dennis Jul 29 '16 at 09:05
  • Don't worry :-) But I don't understand what you mean by "closest to the match". What would be your "similarity" function? I can't really tell if it would be possible to do with a slight modification of the code or if it would be worth asking in another question, so that other people can benefit from the question as well. Could you give an example? – Jaime Caffarel Jul 29 '16 at 17:28
  • Please excuse me for being too brief in my explanation again. By "closest" I mean the distance between the two matched words in a text from df. Say a word from word_list_matrix_one was found in df, for example "first", and two more words from word_list_matrix_two were found in df, for example "second" and "third". Of "second" and "third", the one that is closer to "first" in df should be returned. So for a sentence from df such as "second word1 word2 word3 word4 first word1 word2 third", the result should be two <- "third" from word_list_matrix_two instead of "second". – dennis Jul 29 '16 at 18:52
  • Mmmm, I think this is a very different kind of problem from the one the code I posted solves (the results are words, not booleans). I'm thinking about applying some kind of "distanceFunction" to each cell of the data frame. This function would have to iterate over all the word lists and evaluate the distance between words. I can only think of extremely complicated nested loops, which are obviously very inefficient and hard to debug. Could you ask this in another question? I would really like to know how other people with much more R expertise than me approach this issue. – Jaime Caffarel Jul 30 '16 at 08:11
  • I already thought that this would be that kind of a problem. I created a new post, here is the link ;-) Thank you again http://stackoverflow.com/questions/38672449/minimum-distance-function-between-strings – dennis Jul 30 '16 at 09:34
  • Unfortunately, it does not seem like anyone is willing to answer my question. Did I fail again to ask a good question? How can I get better at getting people to answer? It is really important to me to get this working. I know that working with text is quite difficult, and it is much more than I can handle on my own :( – dennis Aug 01 '16 at 16:20
  • I saw the answer you got, and on Sunday I tried (unsuccessfully) to code a solution. I could give it another try next weekend (as a personal challenge, but I can't guarantee a solution). A bit of background on the "general problem" you're trying to solve could probably help you get an answer (or at least an alternative approach). Can you tell me more about that? Is this for some kind of problem related to language recognition? – Jaime Caffarel Aug 01 '16 at 16:41
  • I tried to make the example more reproducible; maybe it is clearer now what I am after. I have always been fascinated by opinion mining and wanted to give it a try. What you coded detects the words just fine, but without this minimum distance step it does not really get the opinions right. If no other users help, I would be speechless that you would give up your weekend in order to solve my problems. You are just great, mate! – dennis Aug 02 '16 at 08:25
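
The "closest match" idea from the comments above can be sketched in base R as follows. This is not the solution from the linked question, only a minimal illustration under stated assumptions: the function name `closest_match` and its parameters are made up, distance is measured in whitespace-separated words, matching is on whole words, and ties go to the first candidate found.

```r
# Hypothetical helper: returns the word from `candidates` that sits closest
# (in number of words) to any word from `anchors` within `sentence`.
closest_match <- function(sentence, anchors, candidates) {
  words <- strsplit(sentence, "\\s+")[[1]]
  anchor_pos    <- which(words %in% anchors)
  candidate_pos <- which(words %in% candidates)
  if (length(anchor_pos) == 0 || length(candidate_pos) == 0) {
    return(NA_character_)
  }
  # Distance in words between every anchor and every candidate
  distances <- abs(outer(anchor_pos, candidate_pos, "-"))
  # Column index of the first smallest distance
  best <- which(distances == min(distances), arr.ind = TRUE)[1, "col"]
  words[candidate_pos[best]]
}

# Sentence and words taken from the example in the comments
sentence <- "second word1 word2 word3 word4 first word1 word2 third"
result <- closest_match(sentence, anchors = "first",
                        candidates = c("second", "third"))
print(result)  # "third": 3 words away from "first", while "second" is 5 away
```

Applied to each cell of df with the real word lists, a helper like this would return a word (or NA) per cell instead of the TRUE/FALSE values produced by the answer's code.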