
I currently have the following code in Python 3.x:

lst_exclusion_terms = ['bob', 'jenny', 'michael']
file_list = ['1.txt', '2.txt', '3.txt']

for f in file_list:
    with open(f, "r", encoding="utf-8") as file:
        content = file.read()
        if any(entry in content for entry in lst_exclusion_terms):
            print(content)

What I am aiming to do is review the content of each file in the list file_list. While reviewing the content, I want to check whether any of the entries in the list lst_exclusion_terms appear in it. If an entry does appear, I want to remove that entry from the list.

For example, if 'bob' appears in the content of 2.txt, 'bob' should be removed (popped) from lst_exclusion_terms.

I am unsure how to replace my print(content) with the command to identify the current index number for the item being examined and then remove it.

Any suggestions? Thanks

  • I had a giggle that you tagged your question `[enumerate]`. The [`enumerate()` function](https://realpython.com/python-enumerate/) is what you need. Keep in mind that you don't want to remove items from a list as you're iterating over the list though (errors happen). https://stackoverflow.com/questions/1207406/how-to-remove-items-from-a-list-while-iterating – Pranav Hosangadi May 12 '21 at 14:40
  • Thanks @PranavHosangadi. I had Googled this a bit and found enumerate and thought it was related but couldn't work out the code. With the comment you mention about errors happening, what would your suggested approach be? to add the index numbers to a temporary list and then work through this for removal? – user14637930 May 12 '21 at 14:44
  • _"would your suggested approach be?"_ See the link to the other SO question in my previous comment. – Pranav Hosangadi May 12 '21 at 14:46
  • What about `enumerate()` did you have trouble with? It works like this: `for index, value in enumerate(my_collection)`. Unsurprisingly, `index` gives you the index of the element of `my_collection` under consideration in the current iteration, and `value` gives you its value. – Pranav Hosangadi May 12 '21 at 14:48
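
To illustrate the two points made in the comments above, here is a minimal sketch. The first part shows why popping by index inside an `enumerate()` loop skips elements; the second part shows one possible way to drop terms that were found, by rebuilding the list instead of mutating it. The helper name `drop_found_terms` and the `sample_contents` strings are hypothetical stand-ins for the real file contents, not part of the original question.

```python
# Pitfall: popping while iterating shifts later items left,
# so the item right after each removal is never examined.
terms = ['bob', 'jenny', 'michael']
for index, term in enumerate(terms):
    if term in ('bob', 'jenny'):
        terms.pop(index)
print(terms)  # ['jenny', 'michael'] -- 'jenny' was skipped


def drop_found_terms(terms, content):
    """Return a new list with the terms that do NOT occur in content."""
    return [term for term in terms if term not in content]


# Safer: rebuild the list after looking at each piece of content.
lst_exclusion_terms = ['bob', 'jenny', 'michael']
sample_contents = ["hello bob", "5/12/2021", "jenny was here"]  # assumed data
for content in sample_contents:
    lst_exclusion_terms = drop_found_terms(lst_exclusion_terms, content)
print(lst_exclusion_terms)  # ['michael']
```

In real use, each `content` string would come from `file.read()` inside the existing `with open(...)` block.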

2 Answers


You want to filter a list of files based on whether they contain some piece(s) of text.

There is a Python built-in function filter which can do that. filter takes a function that returns a boolean, and an iterable (e.g. a list), and returns an iterator over the elements from the original iterable for which the function returns True.

So first you can write that function:

def contains_terms(filepath, terms):
    # Read the whole file, then report whether any term occurs in it.
    with open(filepath, "r", encoding="utf-8") as f:
        content = f.read()
    return any(term in content for term in terms)

Then use it in filter, and construct a list from the result:

file_list = list(filter(lambda f: not contains_terms(f, lst_exclusion_terms), file_list))

The lambda is needed for two reasons: contains_terms takes two arguments, whereas filter calls its function with one, and contains_terms returns True when a term is in the file, which is the opposite of what you want to keep (although it is the more natural thing for the function itself to return). You could specialise the function to your use case and remove the need for the lambda.

def is_included(filepath):
    # True only if none of the exclusion terms occur in the file.
    with open(filepath, "r", encoding="utf-8") as f:
        content = f.read()
    return all(term not in content for term in lst_exclusion_terms)

With this function defined, the call to filter is more concise:

file_list = list(filter(is_included, file_list))
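
For comparison, the same filtering can also be written as a list comprehension, which some readers find easier to scan than filter with a lambda. The sketch below is only an illustration: the helper name `file_contains_any`, the temporary directory, and the sample file contents are assumptions made so the example runs on its own.

```python
import os
import tempfile


def file_contains_any(filepath, terms):
    """Return True if any of the terms occurs in the file's text."""
    with open(filepath, "r", encoding="utf-8") as f:
        content = f.read()
    return any(term in content for term in terms)


# Create three small sample files (assumed contents) in a temp directory.
tmp_dir = tempfile.mkdtemp()
samples = {"1.txt": "abcd", "2.txt": "5/12/2021", "3.txt": "bob\njenny\nmichael"}
file_list = []
for name, text in samples.items():
    path = os.path.join(tmp_dir, name)
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
    file_list.append(path)

lst_exclusion_terms = ['bob', 'jenny', 'michael']

# Keep only the files that contain none of the exclusion terms.
kept = [f for f in file_list if not file_contains_any(f, lst_exclusion_terms)]
print([os.path.basename(f) for f in kept])  # ['1.txt', '2.txt']
```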
L.Grozinger

I've needed this before: deleting a list item while iterating over the list. The usual suggestion is to build a new list containing only the items you want to keep.

However, here is a quick and dirty approach that can remove the file from the list:

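lst_exclusion_terms = ['bob', 'jenny', 'michael']
file_list = ['1.txt', '2.txt', '3.txt']
print("Before removing item:")
print(file_list)

# Restart the scan after every removal so we never keep iterating over
# a list that was just modified. Stop once a full pass removes nothing
# (this also prevents an endless loop when no file matches).
removed = True
while removed:
    removed = False
    for i, f in enumerate(file_list):
        with open(f, "r", encoding="utf-8") as file:
            content = file.read()
        if any(entry in content for entry in lst_exclusion_terms):
            file_list.pop(i)
            removed = True
            break

print("After removing item:")
print(file_list)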

In this case, 3.txt was removed from the list because its content matched entries in lst_exclusion_terms.

The following were the contents used in each file:

#1.txt
abcd
#2.txt
5/12/2021
#3.txt
bob
jenny
michael
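
As a possible single-pass alternative, you can iterate over a shallow copy of the list and remove from the original, so the loop never walks a list that is changing underneath it. This is only a sketch: the `file_contents` dictionary is an assumed in-memory stand-in for reading the three sample files above, used so the example runs without touching the disk.

```python
# Assumed stand-in for the contents of the three sample files.
file_contents = {
    "1.txt": "abcd",
    "2.txt": "5/12/2021",
    "3.txt": "bob\njenny\nmichael",
}
lst_exclusion_terms = ['bob', 'jenny', 'michael']
file_list = ['1.txt', '2.txt', '3.txt']

# file_list[:] is a copy, so removing from file_list is safe here.
for f in file_list[:]:
    content = file_contents[f]  # in real code: open(f).read()
    if any(entry in content for entry in lst_exclusion_terms):
        file_list.remove(f)

print(file_list)  # ['1.txt', '2.txt']
```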
kyrlon