5

I know there are many other similar questions posted, but there is a difference in mine that makes it unsolvable with their answers.

I have several lists of characters that may have multiple consecutive spaces, of which I need to keep only one. Repetitions of any other character should remain. I did it in the following way:

myList = ['o', 'e', 'i', ' ', ' ', ' ', 'l', 'k', ' ', ' ', ' ', ' ', ' ', 'j', 'u']
myList_copy = [myList[0]]

for i in range(1, len(myList)):
    if not(myList[i] == ' ' and myList[i-1] == ' '):
        myList_copy.append(myList[i])

which successfully gives me

['o', 'e', 'i', ' ', 'l', 'k', ' ', 'j', 'u']

I don't think this is a particularly good or fast way to do it.

I have seen posts like this one (and others) which ask similar questions. However, note that I need to remove only repeated spaces. Maybe what I need help with is using groupby to do this, which is why I am posting a new question.

Thanks in advance.

AMC
TheSprinter
  • Your data are strings, or are they really lists of strings? – AMC Apr 05 '20 at 03:23
  • You’re probably going to get awkward answers because your inputs are lists of characters. Try operating on strings instead. `re.sub` and such are much more obvious solutions then. – roippi Apr 05 '20 at 03:31
  • @AMC Well, I'm actually decoding closed captions, each character has a hexadecimal representation. I originally used a list because of other operations I had to do, but maybe this could be a good place to reconsider the convenience of doing all operations on strings. – TheSprinter Apr 05 '20 at 04:27
  • @roippi Thank you very much. Regular expressions have helped me a lot in the past. I'm considering doing all operations on strings now. See previous comment. – TheSprinter Apr 05 '20 at 04:28
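As a rough sketch of the string-based route suggested in the comments, assuming the characters can be joined into one string first: `re.sub` with the pattern ` {2,}` replaces every run of two or more spaces with a single space and leaves other repeated characters alone. The names `chars`, `text`, `collapsed` and `result` are only illustrative.

```python
import re

chars = ['o', 'e', 'i', ' ', ' ', ' ', 'l', 'l', 'k', ' ', ' ', 'j', 'u']

# Join the characters, collapse each run of 2+ spaces, then split back into characters.
text = ''.join(chars)
collapsed = re.sub(r' {2,}', ' ', text)
result = list(collapsed)

print(result)  # ['o', 'e', 'i', ' ', 'l', 'l', 'k', ' ', 'j', 'u']
```

Unlike a `split()`-based approach, this keeps one space at the start or end of the string if the input had spaces there.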

6 Answers

3

Yes, using groupby is a good idea:

import itertools

myList = ['o', 'e', 'i', ' ', ' ', ' ', 'l', 'k', ' ', ' ', ' ', ' ', ' ', 'j', 'u']
result = [key for key, group in itertools.groupby(myList)]

# ['o', 'e', 'i', ' ', 'l', 'k', ' ', 'j', 'u']

If you want to keep repetitions of the other characters and collapse only the spaces, you can use this:

myList = ['o', 'e', 'i', 'i', ' ', ' ', ' ', 'l', 'k', ' ', ' ', ' ', ' ', ' ', 'j', 'u']
result = []
for key, group in itertools.groupby(myList):
    if key != ' ':
        # Not a space: keep every item of the run.
        for j in group:
            result.append(j)
    else:
        # A run of spaces: keep only one.
        result.append(key)
print(result)
# ['o', 'e', 'i', 'i', ' ', 'l', 'k', ' ', 'j', 'u']
jizhihaoSAMA
  • The thing is that this method deletes repetitions of other characters, which should actually remain. – TheSprinter Apr 05 '20 at 04:22
  • @TheSprinter Sorry, I overlooked your requirements. I have edited my post. – jizhihaoSAMA Apr 05 '20 at 04:46
  • Okay, I have been reading about groupby since I started trying to find an answer to this. I'll try to put this in my own words to see if I understand correctly. The inner for-loop is actually going through the repetitions of each character that's not a space and appending them. When a space is found, it appends the key, which is unique. Please correct me if I'm wrong. – TheSprinter Apr 05 '20 at 04:58
  • @TheSprinter Yes, you got it. – jizhihaoSAMA Apr 05 '20 at 04:59
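To make the grouping concrete, here is a small sketch (with an illustrative `chars` list, not the data from the question) that prints what `itertools.groupby` yields and then applies the "whole run for non-spaces, one item for spaces" rule. It uses `extend` instead of an inner loop, which should be equivalent.

```python
import itertools

chars = ['a', 'a', ' ', ' ', ' ', 'b']

# Each run of equal consecutive items becomes one (key, group) pair.
runs = [(key, list(group)) for key, group in itertools.groupby(chars)]
print(runs)  # [('a', ['a', 'a']), (' ', [' ', ' ', ' ']), ('b', ['b'])]

# Keep whole runs for non-space keys, but only one item for a run of spaces.
result = []
for key, group in itertools.groupby(chars):
    result.extend([key] if key == ' ' else group)
print(result)  # ['a', 'a', ' ', 'b']
```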
3

Another simple? way to do it:

  1. Join the items of myList into a single string
  2. Split the string on whitespace
  3. Join the pieces with a single space
  4. Convert the resulting string back into a list
myList = ['o', 'e', 'i', ' ', ' ', ' ', 'l', 'k', ' ', ' ', ' ', ' ', ' ', 'j', 'u']

new = list(' '.join(''.join(myList).split()))
print(new)
['o', 'e', 'i', ' ', 'l', 'k', ' ', 'j', 'u']
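One caveat worth checking before relying on this: `str.split()` with no arguments also discards leading and trailing whitespace, so the result can differ from a pairwise loop when the list starts or ends with spaces. A minimal sketch with a made-up `chars` list:

```python
chars = [' ', ' ', 'a', ' ', ' ', 'b', ' ']

# str.split() with no arguments drops leading and trailing whitespace entirely.
via_split = list(' '.join(''.join(chars).split()))
print(via_split)  # ['a', ' ', 'b']

# Comparing each character with its predecessor keeps one space at each end.
via_loop = [c for i, c in enumerate(chars)
            if i == 0 or not (c == ' ' and chars[i - 1] == ' ')]
print(via_loop)  # [' ', 'a', ' ', 'b', ' ']
```

Whether that difference matters depends on whether leading and trailing spaces are meaningful in the data.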
ywbaek
1

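This is the same as yours, but in one line (the `x == 0` check always keeps the first element, so it is never compared against `myList[-1]`):

myList_copy = [myList[x] for x in range(len(myList)) if x == 0 or not (myList[x] == ' ' and myList[x-1] == ' ')]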
The Big Kahuna
1

How about using numpy? Try this code.

import numpy as np
myList = ['o', 'e', 'i', ' ', ' ', ' ', 'l', 'k', ' ', ' ', ' ', ' ', ' ', 'j', 'u']
myList = np.array(myList)
myList = [myList[0]] + list(myList[1:][~((myList[1:] == myList[:-1]) & (myList[1:] == ' '))])
print(myList)
Gilseung Ahn
  • Thank you very much. Please excuse my ignorance. I haven't used numpy yet. Can you explain the ~ and the & operators? – TheSprinter Apr 05 '20 at 04:44
1

You can use zip in a list comprehension to compare each character with the previous one and exclude spaces that are preceded by another space:

myList = [ c for p,c in zip([""]+myList,myList) if (p,c) != (' ',' ') ]

The same approach can be used on a string:

myList = [ c for p,c in zip("."+myString, myString) if (p,c) != (' ',' ') ]

But split() would probably be more concise if you have a string and want a string as output (note that it also strips leading and trailing whitespace):

myString = " ".join(myString.split())
Alain T.
0

What about using a pandas Series and shifting the results?

import pandas as pd
serie = pd.Series(['o', 'e', 'i', ' ', ' ', ' ', 'l', 'k', ' ', ' ', ' ', ' ', ' ', 'j', 'u'])
index = ~(serie == serie.shift(1))
serie = serie[index]
jcaliz
  • Thank you very much. I tried it with other lists and realized it also deleted repetitions of other characters, but those should remain. Nevertheless, this would be a good moment to see how pandas can help me in other operations I'm doing. – TheSprinter Apr 05 '20 at 04:47