
I have two inputs: Text, a string, and L1, a list of strings to be excluded.

I split 'Text' into a list of words using the following code:

Text=list(Text.split())

Now I have to remove the words present in the L1 list from this 'Text' list. To do so, I used the following code:

for x in Text:
        if(x in L1):
            Text.remove(x)
print(Text)

Inputs:

Text = "jack and jill went to the market to buy bread and cheese cheese is jack favorite food"

L1 = ["and","he","the","to","is"]

Desired Output:

['jack', 'jill', 'went', 'market', 'buy', 'bread', 'cheese', 'cheese', 'jack', 'favorite', 'food']

Actual Output:

['jack', 'jill', 'went', 'the', 'market', 'buy', 'bread', 'cheese', 'cheese', 'jack', 'favorite', 'food']

Why is 'the' still present in 'Text'?

What did I do wrong? What should I do to get my desired result?

lAaravl

5 Answers

4

You can simply use a list comprehension like this to get the desired output:

Text = "jack and jill went to the market to buy bread and cheese cheese is jack favorite food"

L1 = ["and","he","the","to","is"]

Text= Text.split()

removed = [x for x in Text if x not in L1]

print(removed)

# Output : ['jack', 'jill', 'went', 'market', 'buy', 'bread', 'cheese', 'cheese', 'jack', 'favorite', 'food']

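The reason your code is not working as intended is that you are iterating over the list while altering it at the same time. Each removal shifts the remaining elements one position to the left, so the loop skips the element that follows every removed word. Here 'the' comes right after a removed 'to', so it is never checked.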
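To see the skipping in action, here is a minimal sketch that reuses the loop from the question and records which words the loop actually reaches. The `visited` list is added purely for illustration and is not part of the original code.

```python
Text = "jack and jill went to the market to buy bread and cheese cheese is jack favorite food".split()
L1 = ["and", "he", "the", "to", "is"]

visited = []  # hypothetical helper: every word the loop actually looks at
for x in Text:
    visited.append(x)
    if x in L1:
        # Removing shifts later items left, so the next item is skipped
        Text.remove(x)

print(visited)
print("the" in visited)  # 'the' is never examined by the loop
print("the" in Text)     # ...so it survives in the list
```

The word that follows each removed item never shows up in `visited`, which is why 'the' (right after a removed 'to') stays in the result.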

As @blubberdiblub mentioned in the comments, this code has a time complexity of O(n*m), because every membership check scans the list L1. This can be improved to O(n+m) on average by converting L1 to a set first, since membership tests on a set take constant time on average.
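A set-based variant of the comprehension might look like the sketch below. The name `excluded` is introduced here only for illustration. For a list as short as L1 the difference is negligible, and it only starts to matter with large inputs.

```python
Text = "jack and jill went to the market to buy bread and cheese cheese is jack favorite food"
L1 = ["and", "he", "the", "to", "is"]

# Build the set once; each later 'in' check is O(1) on average
excluded = set(L1)

removed = [x for x in Text.split() if x not in excluded]
print(removed)
```

The order of the remaining words is preserved because the comprehension still walks the word list from left to right. Only the lookup structure changes.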

Sreeram TP
Note that for large numbers of items in both `x` and `L1` (speaking of - say - thousands of items or so) it can be better performance-wise to get a `set()` representation of `L1` before repeatedly doing `in` checks on it. That will reduce time complexity from O(n * m) to O(n + m). – blubberdiblub Apr 15 '19 at 11:13

1

The reason this isn't working is that you're modifying the list while iterating over it, which makes the loop skip elements, as you can see. One option would be to iterate over a copy of the list, but I think Sreeram TP's answer is the best approach.
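As a rough sketch of the copy-based option, assuming the same inputs as in the question, the loop can walk over a shallow copy made with `Text[:]` while removing from the original list:

```python
Text = "jack and jill went to the market to buy bread and cheese cheese is jack favorite food".split()
L1 = ["and", "he", "the", "to", "is"]

# Text[:] is a shallow copy, so removals from Text do not disturb the iteration
for x in Text[:]:
    if x in L1:
        Text.remove(x)

print(Text)
```

Each `remove` call still scans the list from the start, so this is slower than the list comprehension for long inputs.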

brunns
1

You should not modify a list while you are iterating over it. Here:

for x in Text:
        if(x in L1):
            Text.remove(x)
print(Text)

When you remove x from your list, every later element shifts one position to the left, but the for loop still advances its internal index, so the element that moved into the freed slot is skipped. As mentioned in another answer, you can use a list comprehension, or you can collect the items to remove and remove them afterwards:

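# First pass: collect the words to remove without touching Text
toRemove = []
for x in Text:
    if x in L1:
        toRemove.append(x)

# Second pass: remove each collected word from Text
for x in toRemove:
    Text.remove(x)
print(Text)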

But the list comprehension way is much nicer.

Hoog
1

The reason that your code is not working is that you are iterating over the list and at the same time making changes in the list.

Zain Arshad
S M Vaidhyanathan
0

Split_text = Text.split()
matched = [x for x in Split_text if x not in L1]
print(matched)

Yash Shukla