
I'm creating a matrix of 1s and 0s: a cell is 1 if the word (column) is part of the string (row), and 0 otherwise.

For example, the expected matrix would look something like this:

                           white hanging heart holder black suitcase
white hanging heart holder     1       1     1      1     0        0
black suitcase                 0       0     0      0     1        1

What I have at my disposal are two vectors:

Itemsvector = c("white hanging heart holder","black suitcase", ...)
Wordsvector = c("white","hanging","heart","holder","black", "suitcase",...)

I'm toying around with the %in% operator:

strsplit(Itemsvector[1], split = ' ')[[1]] %in% Wordsvector

I also tried grepl:

grepl(Wordsvector[1], Itemsvector)

Both give me TRUE/FALSE values, but I'm at a loss as to how to map them onto the whole matrix grid.
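
A minimal sketch of how the %in% idea above could be extended to the whole grid, assuming the words in each item are separated by single spaces and using only the two sample entries of each vector (the `...` entries are left out). Note that the test is reversed relative to the question: for each item we ask which entries of Wordsvector are among its words, so every row lines up with Wordsvector. The variable name `item_words` is only illustrative.

```r
# Sample data from the question (the "..." entries are omitted)
Itemsvector <- c("white hanging heart holder", "black suitcase")
Wordsvector <- c("white", "hanging", "heart", "holder", "black", "suitcase")

# Split each item into its words once: a list with one character vector per item
item_words <- strsplit(Itemsvector, " ", fixed = TRUE)

# For each item, flag which entries of Wordsvector occur among its words.
# vapply() returns one column per item, so transpose to get items as rows.
m <- t(vapply(item_words, function(w) Wordsvector %in% w,
              logical(length(Wordsvector))))

# Turn TRUE/FALSE into 1/0 and label rows (items) and columns (words)
m <- m * 1L
dimnames(m) <- list(Itemsvector, Wordsvector)
m
```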

Afiq Johari
  • Please give a reproducible example, one that doesn't depend upon an embedded image. – John Coleman May 12 '19 at 12:10
  • I couldn't get the data to format properly, so I ended up taking a screenshot instead. – Afiq Johari May 12 '19 at 13:33
  • See [How to make a great R reproducible example?](https://stackoverflow.com/q/5963269/4996248). A good R question should have the property that a reader can copy-paste what you provide in the question and have it exactly reproduce the issue that you are facing. You could also read about the importance of having a minimal, reproducible example on Stack Overflow. – John Coleman May 12 '19 at 13:37
  • I formatted the required data as code; it looks fine now :) – Afiq Johari May 12 '19 at 13:41

2 Answers

We can do this more easily with table: split 'Itemsvector' into a list of word vectors, stack that list into a data.frame, and then tabulate it with table.

table(stack(setNames(strsplit(Itemsvector, " "), Itemsvector))[2:1])
#                             values
#ind                          black hanging heart holder suitcase white
#  white hanging heart holder     0       1     1      1        0     1
#  black suitcase                 1       0     0      0        1     0

Or with mtabulate from the qdapTools package:

library(qdapTools)
mtabulate(setNames(strsplit(Itemsvector, " "), Itemsvector))
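
A possible refinement of the table approach, sketched under the assumption that you want the columns in Wordsvector order and a strict 0/1 indicator. table counts occurrences, so an item that repeats a word would get a 2, and a word in Wordsvector that never occurs would be missing as a column. Making `values` a factor with `levels = Wordsvector` fixes the column set (words outside Wordsvector become NA, which table drops by default), and comparing with 0 clamps the counts.

```r
# Sample data from the question
Itemsvector <- c("white hanging heart holder", "black suitcase")
Wordsvector <- c("white", "hanging", "heart", "holder", "black", "suitcase")

# Long format: one row per (word, item) pair, as in the answer above
df <- stack(setNames(strsplit(Itemsvector, " "), Itemsvector))

# Fix the column set and order to Wordsvector; words outside it become NA,
# and table() leaves NA out by default
df$values <- factor(df$values, levels = Wordsvector)

tab <- table(df[2:1])   # rows = items (ind), columns = words (values)

# Clamp counts to a 0/1 indicator and drop the "table" class
m <- (unclass(tab) > 0) * 1L
m
```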
akrun

You could try a nested sapply. Since you already have Wordsvector to search for, there is no need to split Itemsvector again. We use grepl to test whether each word is present in each element of Itemsvector, and as an extra precaution we wrap the pattern in word boundaries (\\b) so that "white" does not match "whites".

+(t(sapply(Itemsvector, function(x) sapply(Wordsvector, function(y) 
                                  grepl(paste0("\\b",y, "\\b"), x)))))

#                           white hanging heart holder black suitcase
#white hanging heart holder     1       1     1      1     0        0
#black suitcase                 0       0     0      0     1        1

data

Itemsvector = c("white hanging heart holder","black suitcase")
Wordsvector = c("white","hanging","heart","holder","black", "suitcase")
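
Because grepl is vectorized over its x argument, the inner loop can be dropped: a single vapply over Wordsvector returns one column per word, so no transpose is needed. This is a sketch using the same word-boundary pattern as above; it assumes the words contain no regex metacharacters (such words would need escaping first).

```r
# Sample data from the answer
Itemsvector <- c("white hanging heart holder", "black suitcase")
Wordsvector <- c("white", "hanging", "heart", "holder", "black", "suitcase")

# Each grepl() call checks one word against all items at once and returns
# one TRUE/FALSE per item; vapply() stacks these as columns named by word.
m <- vapply(Wordsvector,
            function(w) grepl(paste0("\\b", w, "\\b"), Itemsvector),
            logical(length(Itemsvector)))

# Items are already the rows; label them and convert to 1/0
rownames(m) <- Itemsvector
m <- m * 1L
m
```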
Ronak Shah
  • Thanks! I used grepl with a for loop over each word, but yours is much more concise :) – Afiq Johari May 12 '19 at 13:32
  • Could you please clarify the use of "\\b"? I don't quite understand why it's useful. Thanks – Afiq Johari May 12 '19 at 13:46
  • @AfiqJohari Since we are using `grepl` here, it matches the pattern anywhere in the string, but we want to match exact words. Check the difference in output between `grepl("white", c("white", "black", "whites"))` and `grepl("\\bwhite\\b", c("white", "black", "whites"))`. Notice how in the first case `white` is matched with `whites` (which we don't want), but not in the second case. Hence, we add `\\b` to avoid such unexpected pattern matches. – Ronak Shah May 12 '19 at 13:54
  • Cool, took note of this. Thanks again Ronak – Afiq Johari May 12 '19 at 14:07
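
The difference described in the comment above can be checked directly; this just runs the two grepl calls from that comment on the same sample vector.

```r
items <- c("white", "black", "whites")

plain   <- grepl("white", items)         # substring match: "whites" counts too
bounded <- grepl("\\bwhite\\b", items)   # whole-word match only

print(plain)    # TRUE FALSE TRUE
print(bounded)  # TRUE FALSE FALSE
```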