A pythonic way to delete successive duplicates of only one element in a list

Question

I know many similar questions have been posted, but mine differs in a way that their answers don't cover.

I have several lists of characters that may have multiple consecutive spaces, of which I need to keep only one. Repetitions of any other character should remain. I did it in the following way:

myList = ['o', 'e', 'i', ' ', ' ', ' ', 'l', 'k', ' ', ' ', ' ', ' ', ' ', 'j', 'u']
myList_copy = [myList[0]]

# Keep each character unless it is a space directly after another space
for i in range(1, len(myList)):
    if not (myList[i] == ' ' and myList[i-1] == ' '):
        myList_copy.append(myList[i])

which successfully gives me

['o', 'e', 'i', ' ', 'l', 'k', ' ', 'j', 'u']

I don't think this is a particularly clean or fast way to do it.

I have seen posts like this one (and others) that ask similar questions. However, note that I need to remove only repeated spaces. Maybe what I need is help using groupby for this, which is why I made a new post.

Thanks in advance.

You’re probably going to get awkward answers because your inputs are lists of characters. Try operating on strings instead. `re.sub` and such are much more obvious solutions then. — roippi, Apr 05 '20 at 03:31
@AMC Well, I'm actually decoding closed captions, each character has a hexadecimal representation. I originally used a list because of other operations I had to do, but maybe this could be a good place to reconsider the convenience of doing all operations on strings. — TheSprinter, Apr 05 '20 at 04:27
@roippi Thank you very much. Regular expressions have helped me a lot in the past. I'm considering doing all operations on strings now. See previous comment. — TheSprinter, Apr 05 '20 at 04:28
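Following roippi's suggestion to operate on strings, here is a minimal sketch with `re.sub`, assuming the characters can be joined into a single string. The pattern ` {2,}` matches two or more consecutive spaces, so only runs of spaces are touched and every other repeated character is left alone.

```python
import re

myList = ['o', 'e', 'i', ' ', ' ', ' ', 'l', 'k', ' ', ' ', ' ', ' ', ' ', 'j', 'u']

# Work on a string instead of a list of characters.
text = ''.join(myList)

# Replace each run of 2+ spaces with a single space.
# Tabs, newlines and repeated non-space characters are not affected.
collapsed = re.sub(r' {2,}', ' ', text)
print(collapsed)  # oei lk ju

# Convert back to a list only if later steps still need one.
result = list(collapsed)
```

Using ` {2,}` instead of ` +` simply avoids rewriting single spaces with themselves; both give the same result here.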

jizhihaoSAMA · Answer 1 · 2020-04-05T04:45:57.630


Yes, using groupby is a good idea:

import itertools

myList = ['o', 'e', 'i', ' ', ' ', ' ', 'l', 'k', ' ', ' ', ' ', ' ', ' ', 'j', 'u']
# Keep one key per run of equal consecutive items
result = [key for key, group in itertools.groupby(myList)]

# ['o', 'e', 'i', ' ', 'l', 'k', ' ', 'j', 'u']
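To see why this collapses every run (not just spaces), it can help to look at what groupby actually yields: one `(key, group)` pair per run of equal consecutive items, where `group` is an iterator over that run. A small sketch that prints each key with its run length:

```python
import itertools

myList = ['o', 'e', 'i', ' ', ' ', ' ', 'l', 'k', ' ', ' ', ' ', ' ', ' ', 'j', 'u']

# Each pair is (value, iterator over the run); len(list(...)) gives the run length.
runs = [(key, len(list(group))) for key, group in itertools.groupby(myList)]
print(runs)
# [('o', 1), ('e', 1), ('i', 1), (' ', 3), ('l', 1), ('k', 1), (' ', 5), ('j', 1), ('u', 1)]
```

Keeping only the keys therefore keeps one item per run, for every character.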

If you want repetitions of other characters to be kept, you can use this instead:

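import itertools

myList = ['o', 'e', 'i', 'i', ' ', ' ', ' ', 'l', 'k', ' ', ' ', ' ', ' ', ' ', 'j', 'u']
result = []
for key, group in itertools.groupby(myList):
    if key != ' ':
        # Not a space: append every repetition in this run
        for j in group:
            result.append(j)
    else:
        # A run of spaces: append a single space
        result.append(key)
print(result)
# ['o', 'e', 'i', 'i', ' ', 'l', 'k', ' ', 'j', 'u']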

edited Apr 05 '20 at 04:45

answered Apr 05 '20 at 03:38

jizhihaoSAMA


The thing is that this method deletes repetitions of other characters, which should actually remain. – TheSprinter Apr 05 '20 at 04:22
@TheSprinter I overlooked your requirements. I've edited my post. – jizhihaoSAMA Apr 05 '20 at 04:46
Okay, I have been reading about groupby since I started trying to find an answer to this. I'll try to put this in my own words to see if I understand correctly. The inner for-loop is actually going through the repetitions of each character that's not a space and appending them. When a space is found, it appends the key, which is unique. Please correct me if I'm wrong. – TheSprinter Apr 05 '20 at 04:58
@TheSprinter Yes, you got it. – jizhihaoSAMA Apr 05 '20 at 04:59
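Following that understanding, the same logic can be generalized to collapse runs of any single chosen element, which is what the title asks for. This is a minimal sketch; `collapse_runs` and its `target` parameter are hypothetical names, not part of any library. It is written as a nested comprehension: for a run of `target` it emits the key once, and for any other run it emits the whole group.

```python
from itertools import groupby

def collapse_runs(items, target=' '):
    # For a run of `target`, emit the key once; for any other run,
    # emit every item in the run (so other repetitions survive).
    return [x
            for key, group in groupby(items)
            for x in ([key] if key == target else group)]

chars = ['o', 'e', 'i', 'i', ' ', ' ', ' ', 'l', 'k', ' ', ' ', ' ', ' ', ' ', 'j', 'u']
print(collapse_runs(chars))
# ['o', 'e', 'i', 'i', ' ', 'l', 'k', ' ', 'j', 'u']
print(collapse_runs([1, 1, 0, 0, 0, 2, 2], target=0))
# [1, 1, 0, 2, 2]
```

Because groupby accepts any iterable, this also works directly on a string (it still returns a list).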

score 3 · Accepted Answer · answered Apr 05 '20 at 03:38

Another, arguably simpler, way to do it:

1. Join the items of myList into a string
2. Split the string on whitespace (a run of whitespace counts as one separator)
3. Join the pieces with a single space
4. Convert the string back into a list

myList = ['o', 'e', 'i', ' ', ' ', ' ', 'l', 'k', ' ', ' ', ' ', ' ', ' ', 'j', 'u']

new = list(' '.join(''.join(myList).split()))
print(new)

['o', 'e', 'i', ' ', 'l', 'k', ' ', 'j', 'u']
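One thing worth checking with this approach: `str.split()` with no argument splits on any whitespace (spaces, tabs, newlines) and discards leading and trailing whitespace. A quick sketch with a made-up input that starts and ends with a space and contains a tab:

```python
myList = [' ', 'o', 'e', '\t', 'i', ' ', ' ', 'l', ' ']

# split() treats the tab as a separator and drops the leading/trailing spaces.
new = list(' '.join(''.join(myList).split()))
print(new)
# ['o', 'e', ' ', 'i', ' ', 'l']
```

If leading or trailing spaces (or tabs) in the captions matter, one of the groupby, zip, or regex approaches keeps them.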

score 1 · Answer 3 · answered Apr 05 '20 at 03:03


This is the same as yours, but in one line:

myList_copy = [myList[x] for x in range(len(myList)) if not (x > 0 and myList[x] == ' ' and myList[x-1] == ' ')]
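Written with `enumerate`, the same filter might look like the sketch below. The `i > 0` guard matters because Python's negative indexing makes `myList[-1]` the last element, so without it a leading space would be dropped whenever the list also ends with a space. The input here is a made-up example chosen to exercise that edge case.

```python
myList = [' ', 'o', ' ', ' ', 'k', ' ']

# Drop a space only when the previous item (which exists only for i > 0) is also a space.
myList_copy = [c for i, c in enumerate(myList)
               if not (i > 0 and c == ' ' and myList[i - 1] == ' ')]
print(myList_copy)
# [' ', 'o', ' ', 'k', ' ']
```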


The Big Kahuna


Well, yes, that is indeed more pythonic. Thank you very much. – TheSprinter Apr 05 '20 at 04:48

score 1 · Answer 4 · answered Apr 05 '20 at 03:31


How about using numpy? Try this code.

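import numpy as np

myList = ['o', 'e', 'i', ' ', ' ', ' ', 'l', 'k', ' ', ' ', ' ', ' ', ' ', 'j', 'u']
arr = np.array(myList)
# Mark positions (from index 1 on) holding a space that repeats the previous space
repeated_space = (arr[1:] == arr[:-1]) & (arr[1:] == ' ')
# Always keep the first element, then everything that is not a repeated space
myList = [myList[0]] + arr[1:][~repeated_space].tolist()
print(myList)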


Gilseung Ahn


Thank you very much. Please excuse my ignorance. I haven't used numpy yet. Can you explain the ~ and the & operators? – TheSprinter Apr 05 '20 at 04:44
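On the `~` and `&` question: for NumPy boolean arrays, `&` is element-wise logical AND and `~` is element-wise logical NOT. Python's `and`/`not` can't be used here, because they try to reduce the whole array to a single True/False and raise a ValueError for arrays with more than one element. The parentheses around each comparison are required because `&` binds more tightly than `==`. A small sketch on a made-up array:

```python
import numpy as np

arr = np.array(['o', ' ', ' ', 'k', 'k'])

# Element-wise comparisons produce boolean arrays of the same length.
same_as_prev = arr[1:] == arr[:-1]
is_space = arr[1:] == ' '

# & is element-wise AND, ~ is element-wise NOT (for boolean arrays).
drop = same_as_prev & is_space
keep = ~drop

print(same_as_prev)  # [False  True False  True]
print(keep)          # [ True False  True  True]

# Boolean-mask indexing keeps the positions where `keep` is True.
result = np.concatenate((arr[:1], arr[1:][keep])).tolist()
print(result)  # ['o', ' ', 'k', 'k']
```

Note that the repeated `'k'` survives because `is_space` is False there, so only the repeated space is dropped.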

Alain T. · Answer 5 · 2020-04-05T05:39:50.693

You can use zip in a list comprehension to compare each character with the previous one and exclude spaces that are preceded by another space:

myList = [c for p, c in zip([""] + myList, myList) if (p, c) != (' ', ' ')]

The same approach can be used on a string (the result is still a list of characters):

myList = [c for p, c in zip("." + myString, myString) if (p, c) != (' ', ' ')]

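but split() would probably be more concise if you have a string and want a string as output (note that split() with no arguments also removes leading and trailing whitespace and treats tabs and newlines as separators):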

myString = " ".join(myString.split())
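If you want a string back but also want to keep a single leading or trailing space (which split() would drop), the zip approach can be joined into a string directly. A minimal sketch; `collapse_spaces` is a hypothetical helper name:

```python
def collapse_spaces(s):
    # Pair each character with the one before it; '.' is a placeholder
    # "previous character" for the first position (any non-space works).
    return ''.join(c for p, c in zip('.' + s, s) if (p, c) != (' ', ' '))

print(repr(collapse_spaces('oei   lk     ju')))  # 'oei lk ju'
print(repr(collapse_spaces(' a  b ')))           # ' a b '
print(repr(' '.join(' a  b '.split())))          # 'a b'
```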

score 0 · Answer 6 · answered Apr 05 '20 at 03:04


What about using a pandas Series and shifting the results?

import pandas as pd

serie = pd.Series(['o', 'e', 'i', ' ', ' ', ' ', 'l', 'k', ' ', ' ', ' ', ' ', ' ', 'j', 'u'])
# True wherever an element differs from the one before it
index = ~(serie == serie.shift(1))
serie = serie[index]
print(serie.tolist())


jcaliz


Thank you very much. I tried it with other lists and realized it also deleted repetitions of other characters, but those should remain. Nevertheless, this would be a good moment to see how pandas can help me in other operations I'm doing. – TheSprinter Apr 05 '20 at 04:47
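To keep repetitions of other characters with the Series approach, the shift comparison can be combined with a "this element is a space" condition, so only repeated spaces are dropped. A minimal sketch of that adjustment, using a made-up input that includes a repeated `'i'`:

```python
import pandas as pd

serie = pd.Series(['o', 'e', 'i', 'i', ' ', ' ', ' ', 'l', 'k', ' ', ' ', 'j', 'u'])

# Drop an element only if it equals the previous one AND is a space.
# The parentheses are needed because & binds more tightly than ==.
drop = (serie == serie.shift(1)) & (serie == ' ')
result = serie[~drop].tolist()
print(result)
# ['o', 'e', 'i', 'i', ' ', 'l', 'k', ' ', 'j', 'u']
```

The first element is compared against the missing value introduced by `shift(1)`, which is never equal, so it is always kept.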
