How to remove duplicate words from a list

Question

There is a list that can contain words like these:

list = ["word","textword","randomword"]

The repeated word can appear inside another word, and it can sit anywhere in the list: at the start, in the middle, or at the end.

import re
list = ["word","textword","randomword"]
lst = []
for i in list:
    control_word = str(i)
    list.remove(control_word)
    for text in list:
        result = re.sub(fr"[^{control_word}]", '', text)
        lst.append(result)
print(lst)

output: ['word', 'rdoword', 'word']

I need the output: ["word","text","random"]

My idea was to check for the word with `in`, then turn the repeated word into a regular expression pattern and remove it from the string, but so far it does not work.

It's not clear to me what you want the output to be. Why would the output be "word", "rdoword" and "word"? — Simon Lundberg, Apr 25 '23 at 11:57
@SimonLundberg In the output I want a list with no repeated word. The code in my example does not work as it should. I was asked in the last comment to add it, so I did, but it fails at least because the pattern should be different: it should delete the whole word, not individual characters. — Дмитрий Насонов, Apr 25 '23 at 12:00
But `["word","textword","randomword"]` contains no repetitions. — Simon Lundberg, Apr 25 '23 at 12:09
@SimonLundberg the repetition of the word is inside the line — Дмитрий Насонов, Apr 25 '23 at 12:10
@ДмитрийНасонов You need to define more clearly what you want the output to be for different cases. For example, if list contained `"a"`, would you want it removed from all the other strings? — matszwecja, Apr 25 '23 at 12:12
Yeah, I really don't understand what you're after. Also I think your regex might not do what you think it does. You're removing all instances of any character in `text` that does not exist in `control_word`. But why? If `control_word` is `"word"` and `text` is `"random"`, you get `"rdo"`, because none of `a`, `n`, or `m` exists in `"word"`. — Simon Lundberg, Apr 25 '23 at 12:16
Does this answer your question? [Removing duplicates in lists](https://stackoverflow.com/questions/7961363/removing-duplicates-in-lists) — sweenish, Apr 25 '23 at 12:20
@SimonLundberg I corrected the description of the problem. From the list ["word","textword","randomword"] I need to get ["word","text","random"]. — Дмитрий Насонов, Apr 25 '23 at 12:21
@sweenish thank you, I have already studied all the suggested solutions on the topic — Дмитрий Насонов, Apr 25 '23 at 12:24
Okay, a few more questions. Your code currently doesn't produce an output of the same length as the input. Is that intentional? It's sheer happenstance that your example has both input and output of three values. — Simon Lundberg, Apr 25 '23 at 12:34
@SimonLundberg The headlines are nearly identical. It's a good starting point. Finding the substrings seems like it'd require a second pass anyway. OP could refine their question. — sweenish, Apr 25 '23 at 12:39
@ДмитрийНасонов what should the output of `["word", "text","textword","randomword", "wordtext", "randomtext"]` be? Because at the moment, if we just fix your code so it does ... sort of the right thing, the output becomes `['text', 'text', 'random', 'text', 'randomtext', 'text', 'randomword', 'wordtext', 'randomtext', 'text', 'randomword', 'randomtext']` because it's looping through the list inside the loop. — Simon Lundberg, Apr 25 '23 at 13:25
@SimonLundberg in my problem only one word in the list "sticks to the rest", so Francis Bessette's answer is correct — Дмитрий Насонов, Apr 27 '23 at 14:08
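
To illustrate the regex point raised in the comments, here is a minimal sketch. It compares the question's character-class pattern with a literal pattern built with `re.escape`. The variable names `kept_chars` and `without_word` are only for illustration and do not come from the original post.

```python
import re

control_word = "word"
text = "randomword"

# [^word] is a negated character class: it deletes every character
# that is NOT one of 'w', 'o', 'r', 'd', one character at a time.
kept_chars = re.sub(fr"[^{control_word}]", "", text)
print(kept_chars)  # rdoword

# re.escape turns the word into a literal pattern, so only the whole
# substring "word" is deleted.
without_word = re.sub(re.escape(control_word), "", text)
print(without_word)  # random
```

For a plain substring like this, `text.replace(control_word, "")` gives the same result without a regex.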

Francis Bessette · Accepted Answer · 2023-04-25T13:23:24.403

-1

Here is a simple double loop that seems to work with the dataset you gave. It also works when the control word isn't at index 0.

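words = ["word", "textword", "randomword"]  # renamed so the built-in list is not shadowed

# Compare every pair of entries; if words[i] occurs inside a different
# entry words[k], cut it out of words[k] in place.
for i in range(len(words)):
    for k in range(len(words)):
        if words[i] in words[k] and words[i] != words[k]:
            words[k] = words[k].replace(words[i], "")

print(words)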

Output: ['word', 'text', 'random']
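
The same idea can also be sketched as a function that leaves the input list untouched. This is a variation, not the code from the answer: the double loop above edits the list while it runs, so later comparisons see the already-shortened strings, whereas this version always compares against the original entries. The function name `strip_contained_words` is made up for this example, and trying longer words first is an assumption about the desired behaviour.

```python
def strip_contained_words(words):
    """Return a new list in which every entry has the other entries
    that occur inside it removed. The input list is not modified."""
    # Assumption: try longer words first so a short word does not
    # break up a longer match.
    candidates = sorted(words, key=len, reverse=True)
    result = []
    for word in words:
        stripped = word
        for other in candidates:
            # Skip empty strings and the entry itself.
            if other and other != word and other in stripped:
                stripped = stripped.replace(other, "")
        result.append(stripped)
    return result


print(strip_contained_words(["word", "textword", "randomword"]))
# ['word', 'text', 'random']
print(strip_contained_words(["textword", "randomword", "word"]))
# ['text', 'random', 'word']
```

The position of the shared word in the list does not matter here, which covers the case where it is moved to the end.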

edited Apr 25 '23 at 13:23

answered Apr 25 '23 at 13:01


This code does not work if you move the word to the end, for example: ["textword", "randomword", "word"] – Дмитрий Насонов Apr 25 '23 at 13:17
I modified it. It will work now. – Francis Bessette Apr 25 '23 at 13:24
