
I have 2 lists. The list named "keyword" is one I created manually, and the nested list named "mylist" (a list of tuples) is the output of a function in my script. This is what they look like:

keyword = ["Physics", "Spanish", ...]

mylist = [("Jack","Math and Physics"), 
          ("Bob","English"), 
          ("Emily","Physics"), 
          ("Mark","Gym and Spanish"),
          ("Brian", "Math and Gym"),
          ...]

What I am trying to do is delete each item (each tuple in parentheses) from the nested list if it contains any of the keywords in the "keyword" list.

For example, in this case, any items in "mylist" that contain the words "Physics" or "Spanish" should be deleted from "mylist". Then, when I print "mylist", this should be the output:

[("Bob","English"), ("Brian", "Math and Gym")]

I searched the internet and many different SO posts to learn how to do this (such as this), but when I adapt the code to my nested list (instead of a flat list) and run it, I get the following error:

Traceback (most recent call last):
  File "namelist.py", line 165, in <module>
    asyncio.get_event_loop().run_until_complete(request1())
  File "C:\Users\XXXX\AppData\Local\Programs\Python\Python37\lib\asyncio\base_events.py", line 576, in run_until_complete
    return future.result()
  File "namelist.py", line 154, in request1
    mylist.remove(a)
ValueError: list.remove(x): x not in list

Does anyone know how to fix this error, and could you share your code?

EDIT: By the way, the real "mylist" in my script is much longer than what I wrote here, and I have about 15 keywords. When I run it on a small scale like this, the code works well, but as soon as I have more than 5 keywords, I keep getting this error for some reason.
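The loop from the script isn't shown, so this is only a guess at the cause: if a loop calls `mylist.remove(...)` once for every keyword a tuple matches, a tuple that matches twice is removed the first time and the second call raises exactly this `ValueError`. More keywords make a double match more likely, which would fit the EDIT. A minimal reproduction with hypothetical data and a hypothetical loop:

```python
# Hypothetical data: "Brian" matches both keywords
keyword = ["Math", "Gym"]
mylist = [("Jack", "Math and Physics"), ("Brian", "Math and Gym")]

error = None
try:
    # Hypothetical buggy loop: calls remove() once for EVERY keyword a tuple matches
    for tup in list(mylist):  # a copy, so skipped items are not the issue here
        for key in keyword:
            if key in tup[1]:
                mylist.remove(tup)  # the second match for "Brian" fails
except ValueError as exc:
    error = exc

print(type(error).__name__)  # ValueError
```

Breaking out of the inner loop after the first `remove`, or building a new list with a comprehension instead of removing in place, avoids the double removal.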

eyllanesc
F16Falcon
  • I assume you only care if any of the words in the second item in a tuple are a keyword, but you should probably be explicit about that. – Turksarama Jul 30 '19 at 01:15

4 Answers


You could join each of the tuples into a string and then check if any keyword is in the string to filter your list.

newlist = [m for m in mylist if not any(k in ' '.join(m) for k in keyword)]

print(newlist)
# [('Bob', 'English'), ('Brian', 'Math and Gym')]
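As Turksarama's comment on the question points out, joining the whole tuple also checks the names. A minimal variant that only looks at the second element (the subjects string), assuming every tuple is a `(name, subjects)` pair:

```python
keyword = ["Physics", "Spanish"]
mylist = [("Jack", "Math and Physics"),
          ("Bob", "English"),
          ("Emily", "Physics"),
          ("Mark", "Gym and Spanish"),
          ("Brian", "Math and Gym")]

# Keep a tuple only if none of the keywords occurs in its subjects string
newlist = [(name, subjects) for name, subjects in mylist
           if not any(k in subjects for k in keyword)]

print(newlist)
# [('Bob', 'English'), ('Brian', 'Math and Gym')]
```

This is still a substring test, so a keyword like "Math" would also match "Mathematics".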
benvc
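for key in keyword:
    for tup in mylist[:]:  # iterate over a copy so removals don't skip items
        # remove the tuple once if any of its fields contains the keyword
        if any(key in field for field in tup):
            mylist.remove(tup)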
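Something to keep in mind for in-place approaches like this one: removing items from a list while a `for` loop iterates over that same list makes the loop skip the element right after each removal. A small demonstration with a subset of the question's data (the single-keyword loop is a simplified illustration):

```python
mylist = [("Jack", "Math and Physics"),
          ("Emily", "Physics"),
          ("Bob", "English")]

# Remove every tuple mentioning "Physics" while iterating over the list itself
for tup in mylist:
    if "Physics" in tup[1]:
        mylist.remove(tup)

naive_result = mylist
print(naive_result)
# [('Emily', 'Physics'), ('Bob', 'English')]  <- Emily was skipped

# Iterating over a copy (mylist[:]) visits every original element
mylist = [("Jack", "Math and Physics"),
          ("Emily", "Physics"),
          ("Bob", "English")]
for tup in mylist[:]:
    if "Physics" in tup[1]:
        mylist.remove(tup)

print(mylist)
# [('Bob', 'English')]
```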

You can start by splitting the fields on " and " and looking at the intersection between the keywords and each person's fields. For instance, you could do something like this:

new_list = []

for name,fields in mylist:
    # Convert the string into a set of strings for intersection
    field_set = set(fields.split(" and "))
    field_in_keys = field_set.intersection(keyword)

    # Add in the new list if no intersection is found
    if len(field_in_keys) == 0:
        new_list.append((name,fields))

You get:

[('Bob', 'English'), ('Brian', 'Math and Gym')]

If speed matters, pandas might do the work more efficiently.

Nakor
  • This works great for my example, however, I think benvc's answer is a little bit more robust since it doesn't specifically require the word "and" to be in the list. Thanks for the answer though, upvoted! – F16Falcon Jul 30 '19 at 01:24
  • I prefer `if not field_in_keys` instead of `if len(field_in_keys) == 0` here. – RoadRunner Jul 30 '19 at 02:13
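Along the lines of RoadRunner's comment, the same set logic can be written as a comprehension with `set.isdisjoint`, which answers "no common element?" directly without building the intersection. A sketch assuming, like the answer, that subjects are always separated by " and ":

```python
keyword = ["Physics", "Spanish"]
mylist = [("Jack", "Math and Physics"),
          ("Bob", "English"),
          ("Emily", "Physics"),
          ("Mark", "Gym and Spanish"),
          ("Brian", "Math and Gym")]

keyword_set = set(keyword)  # build the keyword set once, outside the loop

# Keep a person only if their subjects share no element with the keywords
new_list = [(name, fields) for name, fields in mylist
            if keyword_set.isdisjoint(fields.split(" and "))]

print(new_list)
# [('Bob', 'English'), ('Brian', 'Math and Gym')]
```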
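for x in keyword:
    for i in mylist[:]:  # iterate over a copy so removals don't skip items
        for w in i[1].split(' '):
            if w == x:
                mylist.remove(i)
                break  # i is gone; don't try to remove it again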

If you just loop through each word, I think that will work as well.
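Splitting on a single space only finds a keyword when it is delimited by spaces, so a subject written as "Physics," would be missed. A sketch that swaps in a different technique, regular expressions with `\b` word boundaries, and builds a new list instead of removing items in place. It assumes `keyword` is non-empty (an empty alternation would match at every word boundary):

```python
import re

keyword = ["Physics", "Spanish"]
mylist = [("Jack", "Math and Physics"),
          ("Bob", "English"),
          ("Emily", "Physics"),
          ("Mark", "Gym and Spanish"),
          ("Brian", "Math and Gym")]

# One pattern matching any keyword as a whole word; re.escape guards special characters
pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, keyword)) + r")\b")

# Keep tuples whose subjects string contains none of the keywords as a whole word
new_list = [t for t in mylist if not pattern.search(t[1])]

print(new_list)
# [('Bob', 'English'), ('Brian', 'Math and Gym')]
```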

Senrab