
Let's say I have a list like this:

l <- list('shoes for sell','hats for sell','suits for sell','bow ties for sell')

The common pattern is `for sell` (which I want to keep), and I want to remove shoes, hats, suits and bow ties. Is there a way to do this?

My best attempt so far involves pmatch and table, but they do not produce what I want. Any help is appreciated!

table(unlist(l)) # just counts each sentence once
pmatch('for sell', unlist(l), duplicates.ok = TRUE) # returns NA

The expected output will be:

[1] for sell 
[2] for sell 
[3] for sell 
[4] for sell

I need to place the patterns back into a column in my data frame, so the positions should not change.

Real data example:

list(c("Voetbalshirts", "Bedrukken"), c("Nieuwste", "Trainingspakken", 
"2017"), c("Nieuwste", "Trainingspakken", "2016"), c("Trainingspakken", 
"2016"), c("Nieuwe", "Trainingspakken", "2017"), c("Nieuwste", 
"Voetbalschoenen", "2017"), c("Voetbalschoenen", "2017"), c("Voetbalschoenen", 
"2016"), c("Nieuwste", "Voetbalschoenen", "2016"), c("Trainingskleding", 
"Kopen"), c("Trainingskleding", "Nodig?"), c("Keeper", "Handschoenen", 
"Nodig?"), c("Keeper", "Handschoenen", "Kopen"), c("Voetbalshirts", 
"met", "Eigen", "Naam?"), c("Trainingspakken", "2017"), c("Kunstgras", 
"Schoenen", "Nodig?"), c("Kunstgras", "Schoenen", "Kopen"), c("Zaalvoetbalschoenen", 
"Kopen"), c("Zaalvoetbalschoenen", "Nodig?"), c("Indoor", "Voetbalschoenen", 
"Nodig?"), c("Indoor", "Voetbalschoenen", "Kopen"), c("Goedkope", 
"Voetbalschoenen", "Kopen"), c("Voetbalschoenen", "Outlet"), 
    c("Voetbalschoenen", "met", "Sok", "Nodig?"), c("Voetbalschoenen", 
    "met", "Sok", "Kopen"), c("Voetbal", "Trainingspakken", "Kopen"
    ), c("Voetbal", "Trainingspakken", "Nodig?"), c("Trainingspakken", 
    "Kopen"), c("Voetbalpakken", "Nodig?"), c("Voetbalpakjes", 
    "Kopen"), c("Kids", "Keeper", "Handschoenen", "Nodig"), c("Kids", 
    "Keeper", "Handschoenen", "Kopen"), c("Voetbalschoenen", 
    "Online", "Kopen."), c("Voetbalschoenen", "Kopen"), c("Voetbalschoenen", 
    "Nodig?"), c("Trainingspakken", "Nodig?"), c("Voetbalpakken", 
    "Kopen"), c("Voetbalshirts", "Nodig?"), c("Voetbalshirts", 
    "Kopen"), c("Voetbalpakjes", "Nodig?"), c("Adidas", "Voetbalschoenen", 
    "Nodig?"), c("Adidas", "Voetbalschoenen", "Kopen"), c("Nike", 
    "Voetbalschoenen", "Kopen"), c("Nike", "Voetbalschoenen", 
    "Nodig?"))

Or

> dput(l)
list("Voetbalshirts Bedrukken", "Nieuwste Trainingspakken 2017", 
    "Nieuwste Trainingspakken 2016", "Trainingspakken 2016", 
    "Nieuwe Trainingspakken 2017", "Nieuwste Voetbalschoenen 2017", 
    "Voetbalschoenen 2017", "Voetbalschoenen 2016", "Nieuwste Voetbalschoenen 2016", 
    "Trainingskleding Kopen", "Trainingskleding Nodig?", "Keeper Handschoenen Nodig?", 
    "Keeper Handschoenen Kopen", "Voetbalshirts met Eigen Naam?", 
    "Trainingspakken 2017", "Kunstgras Schoenen Nodig?", "Kunstgras Schoenen Kopen", 
    "Zaalvoetbalschoenen Kopen", "Zaalvoetbalschoenen Nodig?", 
    "Indoor Voetbalschoenen Nodig?", "Indoor Voetbalschoenen Kopen", 
    "Goedkope Voetbalschoenen Kopen", "Voetbalschoenen Outlet", 
    "Voetbalschoenen met Sok Nodig?", "Voetbalschoenen met Sok Kopen", 
    "Voetbal Trainingspakken Kopen", "Voetbal Trainingspakken Nodig?", 
    "Trainingspakken Kopen", "Voetbalpakken Nodig?", "Voetbalpakjes Kopen", 
    "Kids Keeper Handschoenen Nodig", "Kids Keeper Handschoenen Kopen", 
    "Voetbalschoenen Online Kopen.", "Voetbalschoenen Kopen", 
    "Voetbalschoenen Nodig?", "Trainingspakken Nodig?", "Voetbalpakken Kopen", 
    "Voetbalshirts Nodig?", "Voetbalshirts Kopen", "Voetbalpakjes Nodig?", 
    "Adidas Voetbalschoenen Nodig?", "Adidas Voetbalschoenen Kopen", 
    "Nike Voetbalschoenen Kopen", "Nike Voetbalschoenen Nodig?")
Sander Van der Zeeuw

2 Answers


If I have understood you correctly, you first need to find the words common to all elements of the list (rather than hard-coding `for sell`).

l <- list('shoes for sell','hats for sell','suits for sell','bow ties for sell')

Splitting every list element into words

lst <- sapply(l, function(x) strsplit(x, " "))

Finding out common words from all the lists

Reduce(intersect, lst)

#[1] "for"  "sell"

Now, if you want `for sell` to repeat for every element in the list:

rep(paste0(Reduce(intersect, lst), collapse = " "), length(l))

#[1] "for sell" "for sell" "for sell" "for sell"

Alternatively, you can use stringr functions such as str_extract or str_match to extract the common words from each element of the list.
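Since the question asks for one result per element that can go back into a data frame column, here is a minimal base R sketch of that step. It swaps the stringr functions for `regexpr` and `regmatches`, and it assumes the common words appear next to each other and in the same order in every element. The data frame `df` and its column names are hypothetical, used only for illustration.

```r
# Example list from the question
l <- list('shoes for sell', 'hats for sell', 'suits for sell', 'bow ties for sell')

# Flatten to a character vector and split each element into words
x <- unlist(l)
words <- strsplit(x, " ", fixed = TRUE)

# Words shared by every element, joined back into one phrase
# (assumes the shared words are contiguous and in this order)
common <- paste(Reduce(intersect, words), collapse = " ")

# Extract the phrase per element; elements without a match stay NA,
# so the result keeps the same length and order as the input
m <- regexpr(common, x, fixed = TRUE)
extracted <- rep(NA_character_, length(x))
extracted[m != -1] <- regmatches(x, m)

# Hypothetical data frame, only to show that positions are preserved
df <- data.frame(text = x, stringsAsFactors = FALSE)
df$pattern <- extracted
df
```

Pre-filling the result with `NA` is what keeps the positions aligned: `regmatches` alone drops elements that do not match, which would shift the rows.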

Ronak Shah
  • For me this does not work. I get an empty vector. Maybe my list is just too complex and there are too many similarities to do this properly. [[41]] [1] "New shoes Needed?" [[42]] [1] "Buy very special shoes" [[43]] [1] " Buy special shoes" [[44]] [1] "special shoes Needed?" Here I want to find buy and needed. Any idea how to obtain that? – Sander Van der Zeeuw Nov 03 '16 at 09:12
  • It works for the example given in the question. Maybe you can share an example that represents your original data? – Ronak Shah Nov 03 '16 at 09:15
  • I just added 2 different representations of the data: one where the sentences are already split, and one which is just the sentences in a list. The idea is to find nodig, kopen, bedrukken. I know the word voetbalschoenen is also present a lot, but that one I want to remove. A count, for example, is already enough; then I will decide manually. But preferably I want to get it to work programmatically. Thanks! – Sander Van der Zeeuw Nov 03 '16 at 09:48
  • @SanderVanderZeeuw what output do you expect for those 2 lists? I could not find any common words in those lists. – Ronak Shah Nov 03 '16 at 11:20
  • For example: Nodig, Kopen. Basically I want to remove brands, years and so on, so that I can categorize the vectors. Words like trainingspakken, trainingskleding, handschoenen should be removed. – Sander Van der Zeeuw Nov 03 '16 at 11:25
  • Ohh..I see. However, that wasn't the original question. If you see in the example you provided we had `for sell` as common words in all of the elements in the list, hence it was extracted. You might want to add it as a new question with a reference to this question. – Ronak Shah Nov 03 '16 at 11:37
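
For the follow-up case discussed in these comments, where no word occurs in every element and `Reduce(intersect, ...)` therefore comes back empty, one possible sketch is to count word frequencies and then keep only words from a manually chosen set. This is not part of the original answer. The `keep` vector below is an assumption based on the words the asker names, and only a subset of the real data is used for brevity.

```r
# A few of the asker's real sentences (subset, for brevity)
l <- list("Voetbalshirts Bedrukken", "Nieuwste Trainingspakken 2017",
          "Trainingskleding Kopen", "Trainingskleding Nodig?",
          "Keeper Handschoenen Nodig?", "Keeper Handschoenen Kopen",
          "Voetbalschoenen Online Kopen.", "Kids Keeper Handschoenen Nodig")

# Lower-case a vector of words and strip punctuation, so that
# "Nodig?", "Nodig" and "Kopen." are treated as the same word
normalise <- function(w) gsub("[[:punct:]]", "", tolower(w))

# Split every sentence into normalised words
word_list <- lapply(strsplit(unlist(l), " ", fixed = TRUE), normalise)

# Frequency of each word over the whole list, most frequent first
counts <- sort(table(unlist(word_list)), decreasing = TRUE)
counts

# Manually chosen category words (an assumption for this sketch)
keep <- c("kopen", "nodig", "bedrukken")

# First matching category word per sentence, NA if none is present,
# so the result has one entry per sentence in the original order
category <- vapply(word_list, function(w) {
  hit <- w[w %in% keep]
  if (length(hit) == 0) NA_character_ else hit[1]
}, character(1))
category
```

In this subset "kopen" and "nodig" each occur three times, but so do "keeper" and "handschoenen", which is why the counts alone cannot separate category words from product names and a manual choice of `keep` is still needed.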

You can also use the function str_match_all from the stringr package for the same purpose:

library(stringr)
unlist(str_match_all(unlist(l), "for sell"))
#[1] "for sell" "for sell" "for sell" "for sell"
Arun kumar mahesh