
I want to build a script in Python that takes a base string and runs it through a list of other strings. The script should return a list of words or phrases which are in the strings but are not in the base string.

Example:

string = 'why kid is upset'

list_of_strings = ['why my kid is upset', 'why beautiful kid is upset',
                   'why my 15 years old kid is upset', 'why my kid is always upset']

should return

['my', 'beautiful', 'my 15 years old', 'always']

Are there any libraries you would suggest I study to solve this problem?

Mike Müller
Blueray

3 Answers


You need no special libraries. Just do this:

def get_list(string, list_of_strings):
    split_list = string.split()  # words of the base string
    return [" ".join(filter(lambda s: s not in split_list, item.split())) for item in list_of_strings]

That may be a little hard to read, so you could split it up:

def get_list(string, list_of_strings):
    split_list = string.split()
    new_list = []
    for item in list_of_strings:
        unseen_words = filter(lambda s: s not in split_list, item.split())
        unseen_sentence = " ".join(unseen_words)
        new_list.append(unseen_sentence)
    return new_list
zondo

Update

This version adds every word seen so far to the exclude set, so a word is reported only the first time it appears:

exclude = set('why kid is upset'.split())
list_of_strings = ['why my kid is upset', 
                   'why beautiful kid is upset', 
                   'why my 15 years old kid is upset',
                   'why my kid is always upset']
res = []
for item in list_of_strings:
    words = item.split()
    res.append(' '.join(word for word in words if word not in exclude))
    exclude.update(set(words))
print(res)

Result:

['my', 'beautiful', '15 years old', 'always']
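The loop above could also be wrapped in a function so it can be reused with other inputs. This is only a minimal sketch: the name `new_words_per_string` is made up for illustration, and it copies the words into a fresh set so the caller's data is not modified.

```python
def new_words_per_string(base, strings):
    """For each string, return the words not seen in base or in any earlier string."""
    seen = set(base.split())  # start from the words of the base string
    result = []
    for item in strings:
        words = item.split()
        # Keep the words that have not been seen so far, in their original order.
        result.append(' '.join(word for word in words if word not in seen))
        # Only now mark this string's words as seen, so they are skipped in later strings.
        seen.update(words)
    return result


list_of_strings = ['why my kid is upset',
                   'why beautiful kid is upset',
                   'why my 15 years old kid is upset',
                   'why my kid is always upset']
print(new_words_per_string('why kid is upset', list_of_strings))
```

Note that the result depends on the order of the strings, because each one is compared against everything that came before it.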

The original version filters only against the words of the base string:

exclude = set('why kid is upset'.split())
list_of_strings = ['why my kid is upset', 
                   'why beautiful kid is upset', 
                   'why my 15 years old kid is upset',
                   'why my kid is always upset']
res = [' '.join(word for word in item.split() if word not in exclude)
       for item in list_of_strings]
print(res)

Result:

['my', 'beautiful', 'my 15 years old', 'my always']
Mike Müller
  • Thanks Mike! Is there any way to improve it a bit? Assume we need to return '15 years old' (instead of 'my 15 years old') and 'always' (instead of 'my always'), as we have already found 'my' in earlier strings. Would I simply need to build a function that checks the newly created list and returns unique values only? – Blueray Mar 11 '16 at 12:59
  • Added a version that adds already found words to the exclude set. – Mike Müller Mar 11 '16 at 13:06
  • I wish I had 1/100th of your knowledge. Appreciate your support! – Blueray Mar 11 '16 at 13:32
  • Great that it helped. BTW, you can [accept](http://stackoverflow.com/help/accepted-answer) an answer if it solves your problem. – Mike Müller Mar 11 '16 at 13:36

I'm not sure what format you need when the list of strings contains something like 'why my 15 years old kid is upset now'.

Anyway, I have no library to point to, but this little piece of code seems to solve your problem:

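def stringNOTinbase(base, los):
    # Words of the base string, stored in a set for fast membership tests.
    basewords = set(base.split(" "))
    res = []
    for string in los:
        # Keep only the words that do not occur in the base string.
        res.append(" ".join([word for word in string.split(" ") if word not in basewords]))
    return res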

If you define the variables and call it like this:

string = 'why kid is upset'

list_of_strings = ['why my kid is upset', 'why beautiful kid is upset', 'why my 15 years old kid is upset', 'why my kid is always upset','why my 15 years old kid is upset now']

print(stringNOTinbase(string, list_of_strings))

The call will output this:

['my', 'beautiful', 'my 15 years old', 'my always', 'my 15 years old now']

Explanation: I split the base string into words and store them in a set. Each string in the list is then split into words, the words that are not in the set are collected into a new list, and that list is joined back into one string with single spaces.
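For the case raised at the top of this answer, where the extra words are not next to each other, one option might be to return each contiguous run of new words as its own phrase. The following is a rough sketch using `difflib.SequenceMatcher` from the standard library on the two word lists. It is a different technique from the set-based filter above, and the function name `added_phrases` is made up for illustration.

```python
from difflib import SequenceMatcher


def added_phrases(base, candidate):
    """Return contiguous runs of words in candidate that do not align with base."""
    base_words = base.split()
    cand_words = candidate.split()
    # Compare the two word lists; autojunk is turned off so no word is ignored.
    matcher = SequenceMatcher(None, base_words, cand_words, autojunk=False)
    phrases = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        # 'insert' and 'replace' mark stretches of candidate with no match in base.
        if tag in ('insert', 'replace'):
            phrases.append(' '.join(cand_words[j1:j2]))
    return phrases


print(added_phrases('why kid is upset', 'why my 15 years old kid is upset now'))
print(added_phrases('why kid is upset', 'why my kid is always upset'))
```

Because this aligns the two word sequences instead of testing membership, a word that occurs in the base string but in a different position can still be reported, so it is not a drop-in replacement for the function above.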

I hope it helps

pdepmcp