Nested List Iteration

Question

I was attempting some preprocessing on a nested list before training a small word2vec model and encountered the following issue:

corpus = ['he is a brave king', 'she is a kind queen', 'he is a young boy', 'she is a gentle girl']

corpus = [_.split(' ') for _ in corpus]

[['he', 'is', 'a', 'brave', 'king'], ['she', 'is', 'a', 'kind', 'queen'], ['he', 'is', 'a', 'young', 'boy'], ['she', 'is', 'a', 'gentle', 'girl']]

The output above is a nested list, and I intended to remove the stopwords, e.g. 'is' and 'a'.

for _ in range(0, len(corpus)):
     for x in corpus[_]:
         if x == 'is' or x == 'a':
             corpus[_].remove(x)

[['he', 'a', 'brave', 'king'], ['she', 'a', 'kind', 'queen'], ['he', 'a', 'young', 'boy'], ['she', 'a', 'gentle', 'girl']]

The output seems to indicate that the loop skipped to the next sub-list after removing 'is' from each sub-list instead of iterating over it entirely.

What is the reason for this? Is it the index? If so, how can I resolve it, assuming I'd like to retain the nested structure?

Sheldore · Accepted Answer · 2019-01-04T15:26:08.210

All your code is correct except for one minor change: use [:] to iterate over a copy of each sub-list, so that removing items from the original does not disturb the iteration. Specifically, you create a shallow copy of a list with lst_copy = lst[:]. This is one of several ways to copy a list (see here for comprehensive ways). When you iterate over the original list and remove items from it at the same time, the loop's internal index keeps advancing while the remaining items shift one position to the left, so the item immediately after each removed one is skipped. That is why 'a' survives once 'is' has been removed.

for _ in range(0, len(corpus)):
     for x in corpus[_][:]: # <--- create a copy of the list using [:]
         if x == 'is' or x == 'a':
             corpus[_].remove(x)

OUTPUT

[['he', 'brave', 'king'],
 ['she', 'kind', 'queen'],
 ['he', 'young', 'boy'],
 ['she', 'gentle', 'girl']]
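
To make the skipping visible, here is a minimal sketch that records which items the loop actually visits while the list is being modified. The names visited, cleaned and the stopwords set are only illustrative and are not part of the original code. It also shows one possible alternative that rebuilds each sub-list instead of removing items in place, which keeps the nested structure as well.

```python
# Trace which elements a for-loop visits while the list shrinks underneath it.
words = ['he', 'is', 'a', 'brave', 'king']
visited = []  # illustrative helper, not in the original code
for x in words:
    visited.append(x)
    if x == 'is' or x == 'a':
        words.remove(x)  # every later item shifts left by one position

# 'a' slid into the slot the loop had just finished with, so it was never visited.
print(visited)  # ['he', 'is', 'brave', 'king']
print(words)    # ['he', 'a', 'brave', 'king']

# Alternative: build new sub-lists rather than mutating during iteration.
corpus = [['he', 'is', 'a', 'brave', 'king'], ['she', 'is', 'a', 'kind', 'queen']]
stopwords = {'is', 'a'}  # a set gives fast membership tests
cleaned = [[w for w in sentence if w not in stopwords] for sentence in corpus]
print(cleaned)  # [['he', 'brave', 'king'], ['she', 'kind', 'queen']]
```

The comprehension creates new lists, so any other reference to the old sub-lists would still see the unfiltered words. If the sub-lists must be changed in place, the [:] copy shown above does that.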

iGian · Answer 2 · 2019-01-04T22:36:09.837

Maybe you can define a custom function that rejects elements matching a certain condition, in the style of the itertools helpers. (The closest standard equivalent is itertools.filterfalse; itertools.dropwhile only drops leading elements.)

def reject_if(predicate, iterable):
  for element in iterable:
    if not predicate(element):
      yield element

Once you have the function in place, you can use it this way:

stopwords = ['is', 'and', 'a']
[ list(reject_if(lambda x: x in stopwords, ary)) for ary in corpus ]
#=> [['he', 'brave', 'king'], ['she', 'kind', 'queen'], ['he', 'young', 'boy'], ['she', 'gentle', 'girl']]
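
For comparison, the standard library's itertools.filterfalse does the same job as reject_if: it yields only the items for which the predicate is false. The sketch below assumes a shortened two-sentence corpus and uses a set for the stopwords. Passing the set's membership test directly as the predicate is just one possible choice.

```python
from itertools import filterfalse

# Shortened example corpus (assumption: same shape as the one in the question).
corpus = [['he', 'is', 'a', 'brave', 'king'], ['she', 'is', 'a', 'kind', 'queen']]
stopwords = {'is', 'and', 'a'}

# stopwords.__contains__(word) is True for stopwords; filterfalse keeps the rest.
filtered = [list(filterfalse(stopwords.__contains__, sentence)) for sentence in corpus]
print(filtered)  # [['he', 'brave', 'king'], ['she', 'kind', 'queen']]
```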

Mohit Singh · Answer 3 · 2019-01-04T15:29

nested = [input()]

nested = [i.split() for i in nested]

While this code snippet may solve the question, [including an explanation](//s.tk/meta/questions/114762/explaining-entirely-code-based-answers) really helps to improve the quality of your post. Remember that you are answering the question for readers in the future, and those people might not know the reasons for your code suggestion. Please also try not to crowd your code with explanatory comments, this reduces the readability of both the code and the explanations! – Filnor Jan 04 '19 at 15:39
