Please excuse my noobness. I have a list of lists:
print(tokens)
[['What', "'s", 'my', 'name', '?'], ['My', 'name', 'is', 'Aditya', '.'], ['My', 'name', 'is', 'Glen'],
['My', 'name', 'is', 'Kenta', '.'], ['My', 'name', 'is', 'Keita'], ['My', 'name', 'is', 'Ganchan'],
['My', 'name', 'is', 'Anna', '.'], ['My', 'name', 'is', 'Tho'], ['My', 'name', 'is', 'Joe', '.']]
What I am trying to do is remove all of the stop words given in the default stop-words corpus of the Python NLTK library, which I have downloaded and imported:
stop_words = set(stopwords.words('english'))
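As a side note on the casing: as far as I know, the NLTK English stop-word list is all lowercase, which is why I compare uppercased forms on both sides. A quick sanity check (with a hardcoded stand-in set, so it runs without the corpus download):

```python
# Hardcoded stand-in for a few entries from NLTK's English stop-word list,
# so this check runs without the corpus download.
sample_stop_words = {'what', 'is', 'my', 'the'}

# Uppercasing both sides makes the membership test case-insensitive.
print('My'.upper() in (w.upper() for w in sample_stop_words))      # True
print('Aditya'.upper() in (w.upper() for w in sample_stop_words))  # False
```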
For this, I used a nested for loop to iterate over the inner lists and match each token against the stop words. However, when I try to wrap the results back up into a nested list, it only keeps the last list.
The code:
filtered_tokens = []
filtered_tokens_list = []
for token in tokens:
    filtered_tokens.clear()
    for t in token:
        if t.upper() not in (name.upper() for name in stop_words):
            filtered_tokens.append(t)
    filtered_tokens_list.append(filtered_tokens)
filtered_tokens_list
The output:
[['name', 'Joe', '.'],
['name', 'Joe', '.'],
['name', 'Joe', '.'],
['name', 'Joe', '.'],
['name', 'Joe', '.'],
['name', 'Joe', '.'],
['name', 'Joe', '.'],
['name', 'Joe', '.'],
['name', 'Joe', '.']]
I tried to see how filtered_tokens_list looks at each iteration by printing it out:
for token in tokens:
    filtered_tokens.clear()
    for t in token:
        if t.upper() not in (name.upper() for name in stop_words):
            filtered_tokens.append(t)
    filtered_tokens_list.append(filtered_tokens)
    print(filtered_tokens_list)
And the output is:
[["'s", 'name', '?']]
[['name', 'Aditya', '.'], ['name', 'Aditya', '.']]
[['name', 'Glen'], ['name', 'Glen'], ['name', 'Glen']]
[['name', 'Kenta', '.'], ['name', 'Kenta', '.'], ['name', 'Kenta', '.'], ['name', 'Kenta', '.']]
[['name', 'Keita'], ['name', 'Keita'], ['name', 'Keita'], ['name', 'Keita'], ['name', 'Keita']]
[['name', 'Ganchan'], ['name', 'Ganchan'], ['name', 'Ganchan'], ['name', 'Ganchan'], ['name', 'Ganchan'], ['name', 'Ganchan']]
[['name', 'Anna', '.'], ['name', 'Anna', '.'], ['name', 'Anna', '.'], ['name', 'Anna', '.'], ['name', 'Anna', '.'], ['name', 'Anna', '.'], ['name', 'Anna', '.']]
[['name', 'Tho'], ['name', 'Tho'], ['name', 'Tho'], ['name', 'Tho'], ['name', 'Tho'], ['name', 'Tho'], ['name', 'Tho'], ['name', 'Tho']]
[['name', 'Joe', '.'], ['name', 'Joe', '.'], ['name', 'Joe', '.'], ['name', 'Joe', '.'], ['name', 'Joe', '.'], ['name', 'Joe', '.'], ['name', 'Joe', '.'], ['name', 'Joe', '.'], ['name', 'Joe', '.']]
For some reason, every entry in the list is getting overwritten by the latest contents of filtered_tokens.
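I boiled it down to a tiny example with plain lists (no NLTK involved), and the same thing happens, so I suspect it has something to do with appending and clearing:

```python
# Appending the same list object twice: both entries in `outer`
# end up pointing at the same underlying list.
inner = [1, 2]
outer = []
outer.append(inner)
inner.clear()        # this also empties outer[0]
inner.append(3)
outer.append(inner)
print(outer)                 # [[3], [3]]
print(outer[0] is outer[1])  # True
```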
The output I am looking for is:
[["'s", 'name', '?'],['name', 'Aditya', '.'],['name', 'Glen'],['name', 'Kenta', '.'],['name', 'Keita'],
['name', 'Ganchan'],['name', 'Anna', '.'],['name', 'Tho'],['name', 'Joe', '.']]
It's quite baffling and I haven't seen anything like this online. Would really appreciate the help!