
I have a list in a particular format as follows:

my_list = ['apple', 'apple', 'boy', 'cat', 'cat', 'apple', 'apple',
           'apple', 'boy', 'cat', 'cat', 'dog', 'dog']

My expected output is:

res = ['apple', 'boy', 'cat', 'apple', 'boy', 'cat', 'dog']

Each run of consecutive identical words should be collapsed to a single occurrence, regardless of whether the same word already appeared in an earlier run.

The code I tried is below, followed by the output it produces.

test_list = ['apple', 'apple', 'boy', 'cat', 'cat', 'apple', 'apple', 
         'apple', 'boy', 'cat', 'cat', 'dog', 'dog'] 
res = []
[res.append(x) for x in test_list if x not in res] 
print ("The list after removing duplicates : " + str(res))

Output: ['apple', 'boy', 'cat', 'dog'], which contains only the distinct words. How can I proceed from here to get what I actually need? Thanks in advance.

Epsi95
BiSu

3 Answers


Use itertools.groupby, which groups consecutive equal elements. Taking the key of each group gives one entry per run:

from itertools import groupby

[key for key, _ in groupby(my_list)]
['apple', 'boy', 'cat', 'apple', 'boy', 'cat', 'dog']
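
For readers who want to see what that one-liner is doing, here is a minimal sketch of the same idea as a plain loop. The function name collapse_runs is made up for illustration and is not part of itertools; it keeps an element only when it differs from the last one kept.

```python
def collapse_runs(items):
    """Keep one element from each run of consecutive equal items."""
    result = []
    for item in items:
        # Append only when this item starts a new run.
        if not result or item != result[-1]:
            result.append(item)
    return result


my_list = ['apple', 'apple', 'boy', 'cat', 'cat', 'apple', 'apple',
           'apple', 'boy', 'cat', 'cat', 'dog', 'dog']
print(collapse_runs(my_list))
# ['apple', 'boy', 'cat', 'apple', 'boy', 'cat', 'dog']
```

Unlike the attempt in the question, this compares only against the previous kept element rather than against everything seen so far, so a word can reappear in a later run.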
Karl Knechtel
Epsi95
    I simplified your code - the first element of the returned tuples, which you initially ignored, is already exactly what you want (so there's no need to parse the second element). – Karl Knechtel Feb 05 '21 at 10:26

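Use set(), which discards duplicate values. Note that this removes every duplicate, not only consecutive repeats, and that a set is unordered.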

test_list = ['apple', 'apple', 'boy', 'cat', 'cat', 'apple', 'apple', 
         'apple', 'boy', 'cat', 'cat', 'dog', 'dog'] 
         
t = set(test_list)

Output:

{'apple', 'boy', 'cat', 'dog'}

If needed, you can convert the set back into a list:

list(t)

Output (the element order of a set is arbitrary, so it may differ between runs):

['dog', 'boy', 'apple', 'cat']
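
If a stable order matters for this kind of global de-duplication, one possible sketch uses dict.fromkeys, relying on dictionaries preserving insertion order (Python 3.7 and later). The variable name unique_in_order is only illustrative. This still removes every repeat, so it gives the distinct words, not the run-collapsed list the question asks for.

```python
test_list = ['apple', 'apple', 'boy', 'cat', 'cat', 'apple', 'apple',
             'apple', 'boy', 'cat', 'cat', 'dog', 'dog']

# dict keys are unique and keep first-seen order, so this drops
# all later repeats while preserving the original ordering.
unique_in_order = list(dict.fromkeys(test_list))
print(unique_in_order)
# ['apple', 'boy', 'cat', 'dog']
```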
pfabri
blaze

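Try this. It appends a sentinel value, then keeps each element that differs from the one after it:

# The trailing "" is a sentinel; this assumes the real list does not end with "".
my_list = ['apple', 'apple', 'boy', 'cat', 'cat', 'apple', 'apple',
           'apple', 'boy', 'cat', 'cat', 'dog', 'dog'] + [""]
# An element is kept only if its successor is different (the last of each run).
res = [my_list[i] for i in range(len(my_list) - 1) if my_list[i + 1] != my_list[i]]
print(res)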
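
The same neighbour comparison can also be written without a sentinel. The following is a sketch rather than the answerer's code: it pairs each element with its successor using zip and then adds the final element back by slicing, which also handles an empty list.

```python
my_list = ['apple', 'apple', 'boy', 'cat', 'cat', 'apple', 'apple',
           'apple', 'boy', 'cat', 'cat', 'dog', 'dog']

# zip stops one element short, so the last element (which always ends
# a run) is appended separately; my_list[-1:] is [] for an empty list.
res = [a for a, b in zip(my_list, my_list[1:]) if a != b] + my_list[-1:]
print(res)
# ['apple', 'boy', 'cat', 'apple', 'boy', 'cat', 'dog']
```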
dimay