I have extracted a list of words from a paragraph of text, but I want to remove the repeated words from that list. How can I do it?

My output:

['Arise', 'But', 'It', 'Juliet', 'Who', 'already', 'and', 'and', 'and', 'breaks', 'east', 'envious', 'fair', 'grief', 'is', 'is', 'is', 'kill', 'light', 'moon', 'pale', 'sick', 'soft', 'sun', 'sun', 'the', 'the', 'the', 'through', 'what', 'window', 'with', 'yonder']

Desired output:

['Arise', 'But', 'It', 'Juliet', 'Who', 'already', 'and', 'breaks', 'east', 'envious', 'fair', 'grief', 'is', 'kill', 'light', 'moon', 'pale', 'sick', 'soft', 'sun', 'the', 'through', 'what', 'window', 'with', 'yonder']

  • `set()` is good for this purpose – Jussi Nurminen Oct 12 '21 at 16:24
  • Jussi's answer is correct. easiest way – LizardKingLK Oct 12 '21 at 16:31
  • Please read [How much research effort is expected of Stack Overflow users?](//meta.stackoverflow.com/a/261593/843953) At the very least, you're expected to do a quick web search before asking. Please also take the [tour], read [what's on-topic here](/help/on-topic), [ask], and the [question checklist](//meta.stackoverflow.com/q/260648/843953), and provide a [mre] showing your attempt. Welcome to Stack Overflow! – Pranav Hosangadi Oct 12 '21 at 16:42

1 Answer

One way is to use the dict class method `fromkeys`, which builds a dictionary whose keys are the list items. Dictionary keys are unique, so duplicates are dropped, and because dicts preserve insertion order (Python 3.7+), the first occurrence of each word keeps its position:

test_list = ['Arise', 'But', 'It', 'Juliet', 'Who', 'already', 'and', 'and', 'and', 'breaks', 'east', 'envious', 'fair', 'grief', 'is', 'is', 'is', 'kill', 'light', 'moon', 'pale', 'sick', 'soft', 'sun', 'sun', 'the', 'the', 'the', 'through', 'what', 'window', 'with', 'yonder']
test_list = list(dict.fromkeys(test_list))
print(test_list)

Another way is to iterate over the list with a for loop, appending each item only if it is not already in the result:

res = list()
for item in test_list:
    if item not in res:
        res.append(item)

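A more concise version of the loop approach uses a list comprehension purely for its side effect (the comprehension itself only produces a throwaway list of `None` values):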

res = list()
[res.append(item) for item in test_list if item not in res]

Each of the approaches above produces this output:

['Arise', 'But', 'It', 'Juliet', 'Who', 'already', 'and', 'breaks', 'east', 'envious', 'fair', 'grief', 'is', 'kill', 'light', 'moon', 'pale', 'sick', 'soft', 'sun', 'the', 'through', 'what', 'window', 'with', 'yonder']

Jussi's suggestion is also correct: a set removes duplicates too, although it does not preserve the original order of the list:

test_list = list(set(test_list))
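
Because a set is unordered, `list(set(test_list))` may return the words in a different order from the desired output. A minimal sketch of one workaround, assuming the desired order is Python's default string sort (which the list in the question appears to follow, with capitalised words first); the variable names `words` and `unique_sorted` are made up for illustration:

```python
# Shortened sample in the spirit of the question's list.
words = ['Arise', 'But', 'and', 'and', 'and', 'is', 'is', 'sun', 'sun', 'the']

# set() drops duplicates but makes no promise about order,
# so sort the result to get a stable, predictable list.
unique_sorted = sorted(set(words))
print(unique_sorted)  # ['Arise', 'But', 'and', 'is', 'sun', 'the']
```

If the original order matters and the list is not sorted, `dict.fromkeys` is the safer choice.
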
  • thanks, i prefer the iteration one. It is easier to understand for me :) – Ayman Mostafa Oct 12 '21 at 17:44
  • @AymanMostafa the iteration one is the worst of all three because it is O(n^2) and the others are all O(n) – Pranav Hosangadi Oct 12 '21 at 19:09
  • @PranavHosangadi could you elaborate why it is the worst? – Ayman Mostafa Oct 13 '21 at 13:25
  • @AymanMostafa Looking for an item in a list is O(n), where n is the number of items in the list. Doing that n times makes that line O(n^2). For sets and dicts, lookup is O(1). Doing that n times makes it O(n). If you double the size of your input, your selected approach will take ~four times longer, but the others will only be ~2x. Look up time complexity – Pranav Hosangadi Oct 13 '21 at 15:36
  • Despite the application complexity, what @PranavHosangadi is saying is correct: execution time should always be considered. Here is code that demonstrates the time difference between the approaches: https://github.com/ramymagdy-rm/py_playground/blob/main/remove_repeating_elements_from_list_timeit.py The measured times are 0.015302300000000005, 0.0850022, 0.09027770000000002 and 0.011084099999999986 sec respectively. The first and last approaches are the quickest. Please note that the one-line for loop approach has been modified to use correct syntax. – Ramy Ezzat Oct 25 '21 at 00:29
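
The complexity point raised in the comments can be addressed without giving up the explicit loop. A minimal sketch, not taken from the answer: it keeps the readable for loop but tracks already-seen words in a set, so each membership test is O(1) on average instead of scanning the result list. The function name `dedupe_keep_order` is hypothetical.

```python
def dedupe_keep_order(items):
    """Return items without duplicates, keeping the first occurrence of each."""
    seen = set()    # membership tests on a set are O(1) on average
    result = []     # preserves the order in which items first appear
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


print(dedupe_keep_order(['sun', 'the', 'sun', 'and', 'the']))  # ['sun', 'the', 'and']
```

This should scale roughly linearly with the input size, like the `dict.fromkeys` and `set` approaches, while still reading like the loop version.
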