-2

Given a search string, I'd like to break it up into a set of possible matching groups. For consecutive terms, such as "new    search", I would like to keep the original spacing.

Is there a library, such as itertools, that can give all the consecutive combinations of words? For example:

INPUT ==> "new    search words"
OUTPUT ==> ['new', 'search', 'words', 'new    search', 'new    search words', 'search words']

Note that I'm not looking to get a combination of all possible letters. For example:

>>> import itertools
>>> s = "Once upon a time in the west US"
>>> list(itertools.combinations(s, 1))
[('O',), ('n',), ('c',), ('e',), (' ',), ('u',), ('p',), ('o',), ('n',), (' ',), ('a',), (' ',), ('t',), ('i',), ('m',), ('e',), (' ',), ('i',), ('n',), (' ',), ('t',), ('h',), ('e',), (' ',), ('w',), ('e',), ('s',), ('t',), (' ',), ('U',), ('S',)]

I'm only looking for the possible consecutive word combinations, of which there are 6 for three words (n(n+1)/2 for n words).

David542
  • The same question has been asked. Check answer from [here](https://stackoverflow.com/a/5898031/2000230) – honglei Aug 18 '19 at 06:42
  • @honglei not really -- that's based more on letters (from what I can understand) – David542 Aug 18 '19 at 06:44
  • _Why_ do you want to do this? – eddiewould Aug 18 '19 at 06:47
  • @eddiewould if someone enters a search term -- whatever it may be -- I need to be able to match it exactly, and to be able to search on words themselves. – David542 Aug 18 '19 at 06:49
  • Why is the _whitespace_ significant? – eddiewould Aug 18 '19 at 06:50
  • At any rate, I guess your solution will involve splitting on a single space character, then some wacky code going over those tokens to create a new sequence of tokens (with your whitespace rules applied) finally putting the sequence of whitespace-aware tokens into a combinations function. – eddiewould Aug 18 '19 at 06:51

4 Answers

1

Use a combination of itertools.combinations and itertools.chain on a list of the words:

itertools.chain(*(itertools.combinations(words, i) for i in range(1, len(words)+1)))

In your case, you can get the words with `your_input.split()`.
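As a rough, runnable sketch of the expression above (the function name `word_combinations` is made up for illustration, and `chain.from_iterable` stands in for `chain(*...)`), this shows what it produces for the example input. Note that `split()` discards the original spacing and that the result also contains non-adjacent pairs such as `('new', 'words')`:

```python
from itertools import chain, combinations

def word_combinations(text):
    """Return every order-preserving combination of the words in text."""
    # split() with no argument collapses runs of whitespace
    words = text.split()
    sizes = range(1, len(words) + 1)
    # One combinations() iterator per size, flattened into a single list
    return list(chain.from_iterable(combinations(words, n) for n in sizes))

result = word_combinations("new    search words")
print(result)
```

For three words this yields 7 tuples (every non-empty subset in order), which is one more than the 6 consecutive runs the question asks for.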

donkopotamus
  • That gives me thousands of combinations based on the letters, not just the words. – David542 Aug 18 '19 at 06:44
  • sort of. That gives all possible combinations (not just the ones "in order") and doesn't respect spacing if using `.split()`. – David542 Aug 18 '19 at 06:50
  • @David542 Surely it respects order, it’s using combinations not permutations ... if you care about the spacing (which seems odd for your search engine context) simply split them in a way that respects whitespace (using a word boundary regex for example) – donkopotamus Aug 18 '19 at 06:57
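One way the regex suggestion in the comments might look, as a minimal sketch: it assumes that a "word" is any run of non-whitespace characters (`\S+`), and the helper name `contiguous_word_runs` is hypothetical. Instead of joining words back together, it slices the original string between word offsets, so the spacing between adjacent words is preserved and only consecutive runs are produced:

```python
import re

def contiguous_word_runs(text):
    """Return every run of adjacent words, sliced from the original text."""
    # (start, end) offsets of each whitespace-delimited word
    spans = [m.span() for m in re.finditer(r"\S+", text)]
    runs = []
    for i, (start, _) in enumerate(spans):
        # Extend the run one word at a time, keeping the original spacing
        for _, end in spans[i:]:
            runs.append(text[start:end])
    return runs

runs = contiguous_word_runs("new    search words")
print(runs)
```

The six results are the same strings as in the question's expected output, though grouped by starting word rather than in the order listed there.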
0

`itertools.permutations` seems to be what you are looking for:

  from itertools import permutations

  def f(words):
      # Join every ordering of the words into a single string (no separator)
      return [''.join(p) for p in permutations(words)]

  a = ['a', 'b', 'c']
  print(f(a))

Note that the function takes a list as its argument. Split your string at the top of the function or before passing it in.

T.Woody
0

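Try this:

from itertools import combinations

text = "new    search words"

# split() with no argument collapses runs of whitespace, so no empty strings appear
words = text.split()

# Collect every combination of 1..len(words) words, each as a list
output = [list(c) for i in range(1, len(words) + 1) for c in combinations(words, i)]
print(output)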

0

If you are not concerned about the order, you could use accumulate:

from itertools import accumulate

lst = ["new", "search", "words"]

ret = []
for start in range(len(lst)):
    n = list(accumulate(lst[start:], func=lambda x, y: f"{x} {y}"))
    ret.extend(n)
print(ret)
# ['new', 'new search', 'new search words', 'search', 'search words', 'words']
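
If the order does matter, one possible follow-up is to sort the accumulated runs by word count. This is only a sketch: the names `runs` and `by_length` are illustrative, and it joins words with a single space, so the original spacing is not kept.

```python
from itertools import accumulate

words = ["new", "search", "words"]

runs = []
for start in range(len(words)):
    # Running join of the words from position start onwards
    runs.extend(accumulate(words[start:], lambda x, y: f"{x} {y}"))

# sorted() is stable, so runs with the same word count keep their left-to-right order
by_length = sorted(runs, key=lambda run: len(run.split()))
print(by_length)
```

This groups single words first, then pairs, then the full phrase.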
hiro protagonist