-3

I want to check, word by word, whether each word of a sentence exists in the list item at the same index.

items=["michael jackson","nintendo", "michael jackson"]
aa = ["i think michael jackson is cool","i love nintendo","i miss jackson nintendo"]


for i, a in zip(items, aa):
    token=a.split()

    for x in token:

        if x in i:
            print "X: " + x

Output:

X: i
X: michael
X: jackson
X: i
X: nintendo
X: i
X: jackson

Expected output:

X: michael X: jackson #from "i think michael jackson is cool"
X: nintendo #from "i love nintendo"
X: jackson #from "i miss jackson nintendo"

As you can see, "i" is also printed because the letter i occurs inside "michael" and "nintendo", but I do not want that. Note that I want to compare items and aa word by word, comparing only the entries at the same index.
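The unwanted "i" comes down to what `in` is tested against. A minimal sketch, assuming words are separated by whitespace (the variable names are illustrative only): against a string, `in` is a substring test; against the list returned by `split()`, it only matches whole words.

```python
item = "michael jackson"

# `in` on a string is a substring test, so a single letter can match.
substring_hit = "i" in item                # True: "i" occurs inside "michael"

# `in` on a list of words only matches whole words.
item_words = item.split()                  # ['michael', 'jackson']
word_hit = "i" in item_words               # False: no word is exactly "i"
whole_word_hit = "jackson" in item_words   # True
```

Splitting the item once and testing against the resulting list is usually enough to drop the stray single-letter matches.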

For the 3rd item in items, it should still print jackson, even though only "jackson" (and not "michael") from "michael jackson" appears in the sentence.

Note that the result for the 3rd item in aa should only be "jackson", not "jackson" and "nintendo", because the lists should be compared at the same index. The 3rd item in items is "michael jackson"; it contains no "nintendo", hence the result should only be "jackson".

The first item matches "michael" and "jackson", so I would like them printed on one line so that the indexes in the results correspond to the indexes of items. If I kept my original expected results, the order of the items in "aa" would be lost. That is, I would not know that "michael jackson" was extracted from the first item in "aa".

Lily
  • Why do you want to compare "michael" and "jackson" separately? – cs95 Sep 14 '18 at 17:58
  • I believe [this](https://stackoverflow.com/questions/3437059/does-python-have-a-string-contains-substring-method) is what you're looking for. – Woody1193 Sep 14 '18 at 17:59
  • I'll be applying this with the Stanford dependency parser, which gives me its output one word at a time. Hence I have to compare the words individually – Lily Sep 14 '18 at 18:00
  • @Lily In that case, please modify your question to make this clear – Woody1193 Sep 14 '18 at 18:01

3 Answers

2

It's very simple. Try:

items=["michael jackson","nintendo", "michael jackson"]
aa = ["i think michael jackson is cool","i love nintendo","i miss jackson nintendo"]

output = []
for xx, yy in zip(aa, items):
    item = yy.split(" ")  # words of the item at the same index
    string = ""
    for x in xx.split(" "):
        if x in item:  # whole-word match against a list, not a substring test
            string += "X: " + x + " "
    if string != "":
        output.append(string)
# printing data
for item in output:
    print(item)

Output (Expected):

X: michael X: jackson 
X: nintendo 
X: jackson 
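If the results must stay aligned with the indexes of `items`, one option is to keep an entry for every index, even when nothing matched. The sketch below is only an illustration under that assumption; `matches_per_index` is a hypothetical helper name, not part of the code above.

```python
def matches_per_index(sentences, items):
    """Return one list of matched words per index (empty if nothing matched)."""
    result = []
    for sentence, item in zip(sentences, items):
        item_words = item.split()      # words of the item at the same index
        matched = []
        for word in sentence.split():  # keep the sentence's word order
            if word in item_words:     # whole-word comparison
                matched.append(word)
        result.append(matched)         # appended even when empty, so indexes line up
    return result


items = ["michael jackson", "nintendo", "michael jackson"]
aa = ["i think michael jackson is cool", "i love nintendo", "i miss jackson nintendo"]

for found in matches_per_index(aa, items):
    print(" ".join("X: " + word for word in found))
```

Because an empty list is stored for a sentence with no matches, the n-th result always belongs to the n-th entry of `aa`.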
Nouman
  • This code doesn't even run... – G_M Sep 14 '18 at 18:01
  • @G_M It does - in Python 2.7. To the OP: you advise not to use `split`, but use it yourself nonetheless? – DYZ Sep 14 '18 at 18:04
  • @Black Thunder I get this error when running your code: 'list' object has no attribute 'split'. However, I must zip the lists together because my real data is much larger than this, and there will be duplicates in the "items" list. If I do not zip the lists, they won't be compared at the same index – Lily Sep 14 '18 at 18:04
  • @DYZ The answer was edited about three times after my comment. – G_M Sep 14 '18 at 18:04
  • @BlackThunder Initially, I thought this was the solution, but it does not compare the items within the same index. When I changed the 3rd item in aa to "i love jackson nintendo", the result also printed "nintendo", though nintendo is not in the 3rd item in items – Lily Sep 14 '18 at 18:29
  • @BlackThunder I've just updated my question too. Your answer allowed me to add more explanation to my question. Thank you – Lily Sep 14 '18 at 18:33
  • @BlackThunder I've tried it, and for the small data in the examples, it worked. I will try it on my bigger data together with the Stanford parser, which makes it more complicated, with more for loops to compare. But I'll see how it goes. Thank you – Lily Sep 14 '18 at 18:46
  • @BlackThunder I'm so sorry, it was careless of me, but is it possible to get results something like X: michael X: jackson X: nintendo X: jackson, meaning that if two words are found in the item, they should print on the same line instead of on separate lines? I've just edited my question again. Should I leave this question as it was before and open a new question instead? – Lily Sep 14 '18 at 18:58
  • @Lily I would recommend editing the same question and adding this condition. Please also add some more explanation to the question. – Nouman Sep 14 '18 at 19:00
  • @BlackThunder Alright, I've just added more explanation to it. – Lily Sep 14 '18 at 19:06
  • @Lily Please check now. – Nouman Sep 14 '18 at 19:11
  • @BlackThunder Thank you. I'll try to apply this in my code :) – Lily Sep 14 '18 at 19:27
1

Is the order of output words within each pair of phrases important? In other words, should the first 'michael' and 'jackson' appear in this order? If it is important, this solution works, though it is not very efficient:

from itertools import chain
list(chain.from_iterable([x for x in s1.split() for y in s2.split() if x==y] 
                         for s1,s2 in zip(aa, items)))
#['michael', 'jackson', 'nintendo', 'jackson']

If the order is not important, you can calculate set intersections:

list(chain.from_iterable(set(s1.split()) & set(s2.split()) 
                         for s1,s2 in zip(aa, items)))
#['jackson', 'michael', 'nintendo', 'jackson']

The second solution is about 20% faster.
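For readers less comfortable with comprehensions, both expressions above can be unrolled into plain loops. This is a sketch only; the function names `common_words_in_order` and `common_words_unordered` are made up for illustration.

```python
def common_words_in_order(sentences, items):
    """Plain-loop equivalent of the first chain.from_iterable expression."""
    found = []
    for s1, s2 in zip(sentences, items):  # pair entries at the same index
        for x in s1.split():              # each word of the sentence
            for y in s2.split():          # each word of the item
                if x == y:
                    found.append(x)
    return found


def common_words_unordered(sentences, items):
    """Plain-loop equivalent of the set-intersection expression."""
    found = []
    for s1, s2 in zip(sentences, items):
        # Order within one pair is not guaranteed, because sets are unordered.
        found.extend(set(s1.split()) & set(s2.split()))
    return found


items = ["michael jackson", "nintendo", "michael jackson"]
aa = ["i think michael jackson is cool", "i love nintendo", "i miss jackson nintendo"]

print(common_words_in_order(aa, items))
# ['michael', 'jackson', 'nintendo', 'jackson']
```

The nested loops make the word-by-word comparison explicit, which may be easier to extend when more loops are wrapped around it.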

DYZ
  • The order is not important. I've tried the first solution, and it seems to give me the results that I wanted. But can you please explain it with normal for-loops instead of a list comprehension? I will be dealing with more for-loops in my actual data (Stanford parser), so I wish to understand your solution better. I'm weak in list comprehensions. @DYZ – Lily Sep 14 '18 at 18:37
0

Even easier. Iterate both lists, then iterate the iteration variable :)

l1=["michael jackson","nintendo", "michael jackson"] 
l2 = ["i think michael jackson is cool","i love nintendo","i miss jackson nintendo"] 
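for i in range(len(l1)):
    item_words = l1[i].split()  # words of the item at index i
    for y in l2[i].split():     # words of the sentence at the same index
        if y in item_words:     # whole-word match, not a substring test
            print('X: ' + y)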
Swift
  • How can I compare the items in the lists by using this? – Lily Sep 14 '18 at 18:11
  • Compare the y values. Y is each list item in both lists iteratively. – Swift Sep 14 '18 at 18:35
  • @Lily I have amended my answer to be more precise for you. – Swift Sep 14 '18 at 18:37
  • I've tried the new code, but it is still not what I wanted. Using "in" is the problem in my original solution too. When I tested your code with "if 't' in y", my expected result was " " (nothing), because there is no word "t" in your lists. But your solution returns "test", "testing", "tester", "tested", because "t" is simply present in all of them – Lily Sep 14 '18 at 18:41
  • Hmm. I know another way but it's not pretty lol. I will double check my method. Sorry to waste your time like that. – Swift Sep 14 '18 at 18:42
  • No worries. Instead, Thank you for your time to try to help me solve my question :) – Lily Sep 14 '18 at 18:43
  • I just tested the code like you said with 't' and it works fine? – Swift Sep 14 '18 at 18:46
  • What do you get for your output? The output i'm expecting from your given example is supposed to be " " – Lily Sep 14 '18 at 18:51
  • Last time editing. Here it is without zip. Take my brain power, FOC ;) – Swift Sep 14 '18 at 18:57