
I'm trying to replace all identical elements in a list with a new string, and also trying to move away from using loops for everything.

# My aim is to turn:
list = ["A", "", "", "D"]
# into:
list = ["A", "???", "???", "D"]
# but without using a for-loop

I started off with variations of comprehensions:

# e.g. 1
['' = "???"(i) for i in list]
# e.g. 2
list = [list[i] .replace '???' if ''(i) for i in range(len(lst))]

Then I tried to employ Python's map function as seen here:

list[:] = map(lambda i: "???", list)
# I couldn't work out where to add the '""' to be replaced.

Finally I butchered a third solution:

list[:] = ["???" if ''(i) else i for i in list]

I feel like I'm moving further from a sensible line of attack; I just want a tidy way to complete a simple task.

Solebay Sharp
  • 1
    Does this answer your question? [In-place replacement of all occurrences of an element in a list in python](https://stackoverflow.com/questions/24201926/in-place-replacement-of-all-occurrences-of-an-element-in-a-list-in-python) – Julien Sorin Aug 20 '21 at 13:30
  • Yes, thank you, however I also got ample novel solutions to my answer, including one which used python's map function correctly. – Solebay Sharp Aug 20 '21 at 13:35
  • 2
    note: a list-comprehension _is_ in fact a for loop... – Pierre D Aug 20 '21 at 13:36
  • @PierreD is it faster or just more concise for a human to read? – Solebay Sharp Aug 20 '21 at 13:37
  • also: please don't redefine `list` as a variable. – Pierre D Aug 20 '21 at 13:37
  • "faster"? Faster than what? A list comprehension like any of the ones given in the various answers is going to be roughly the same in terms of speed, and one of the fastest (if not _the fastest_) ways of doing this operation. – Pierre D Aug 20 '21 at 13:40
  • I thought verbose 'for i in l: do thingy' loops were more tardy than comprehensions. – Solebay Sharp Aug 20 '21 at 13:42
  • commenters correctly noticed that your question is about replacing _identical_ elements in a list, but the rest of the question focuses on empty strings. Which is it? – Pierre D Aug 20 '21 at 13:42
  • Both, in this instance. I'm replacing identical elements that just so happen to empty. Might not be in future. It's not something I considered when creating the Q. Selection of correct answer is arguably subjective because of the ambiguity. – Solebay Sharp Aug 20 '21 at 13:45
  • I modified my answer to cover all cases... – Pierre D Aug 20 '21 at 13:51
  • @SolebaySharp `np.where` is the fastest solution. – Rm4n Aug 20 '21 at 15:00
  • @slamaksafari : this is simply not true. For such a short list, `%timeit [e or '???' for e in l]` gives 263 ns ± 0.373 ns per loop. Even assuming that the list is already in an `np.array` (i.e., discounting the creation of `a = np.array(l)`), `%timeit np.where(a=='', a, '???')` gives 3.03 µs ± 9.59 ns per loop (more than 10x slower). For (much) longer lists, the timings become equal to each other within a couple of percents. – Pierre D Aug 20 '21 at 17:57
  • @PierreD I've updated my post; for short lists like OP it's true, list comprehension is faster. But for long arrays this doesn't hold. – Rm4n Aug 21 '21 at 04:23
  • @siamaksafari: if you use the correct list comprehension (the one I proposed), then you'll be able to measure the 1.9x speedup against `np.array` and `np.where` that I reported. Tested up to 100 million random elements. In summary, `[e or '???' for e in data]` is between 28x faster (for short lists) to 1.9x faster (for very long lists). If you assume the list already comes in an `np.array`, then the two are equivalent for long lists within a couple of percents. – Pierre D Aug 21 '21 at 16:52
  • @PierreD This only works for empty elements, not duplicates. – Rm4n Aug 21 '21 at 19:07

5 Answers

3

You can try this:

list1 = ["A", "", "", "D"]

list2 = list(map(lambda x: "???" if not x else x, list1))

print(list2)

Here is a longer version of the above one:

list1 = ["A", "", "", "D"]

def check_string(string):
    if not string:
        return "???"
    return string

list2 = list(map(check_string, list1))
print(list2)

Both versions take advantage of the fact that an empty string "" is falsy, so you can rely on its implicit truth value to decide whether to return the replacement or the original element. Output:

['A', '???', '???', 'D']
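
One caveat with the truthiness check: `not x` is true for every falsy value, not only `""`. A minimal sketch of the difference, assuming a hypothetical list `mixed` that also holds `0` and `None` (this data is not from the question):

```python
# Hypothetical data (not from the question): contains falsy values other than "".
mixed = ["A", "", 0, None, "D"]

# Truthiness check: replaces every falsy element.
by_truthiness = list(map(lambda x: "???" if not x else x, mixed))

# Equality check: replaces only empty strings.
by_equality = list(map(lambda x: "???" if x == "" else x, mixed))

print(by_truthiness)  # ['A', '???', '???', '???', 'D']
print(by_equality)    # ['A', '???', 0, None, 'D']
```

For a list that only ever contains strings, the two checks give the same result.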
2

For concision, you can use a list comprehension (which is itself a form of loop). As @ComteHerappait correctly noted, this replaces empty strings with '???', consistent with the examples in the question.

>>> l = ["A", "", "", "D"]
>>> [e or '???' for e in l]
['A', '???', '???', 'D']

If instead we focus on replacing duplicate elements, keeping the first occurrence of each value, then:

>>> seen = set()
>>> newl = ['???' if e in seen or seen.add(e) else e for e in l]
>>> newl
['A', '', '???', 'D']

Finally, the following replaces every occurrence of any value that appears more than once in the list:

>>> from collections import Counter
>>> c = Counter(l)
>>> newl = [e if c[e] < 2 else '???' for e in l]
>>> newl
['A', '???', '???', 'D']
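
To make the difference between the two duplicate-handling approaches easier to see, here is a sketch that wraps them in functions and runs them on a list whose duplicates are not empty strings. The names `replace_repeats` and `replace_all_duplicates` are illustrative only, and both assume the elements are hashable:

```python
from collections import Counter


def replace_repeats(items, replacement="???"):
    """Keep the first occurrence of each value; replace later repeats."""
    seen = set()
    result = []
    for item in items:
        if item in seen:
            result.append(replacement)
        else:
            seen.add(item)
            result.append(item)
    return result


def replace_all_duplicates(items, replacement="???"):
    """Replace every element whose value occurs more than once."""
    counts = Counter(items)
    return [replacement if counts[item] > 1 else item for item in items]


sample = ["a", "b", "c", "b", "a"]
print(replace_repeats(sample))         # ['a', 'b', 'c', '???', '???']
print(replace_all_duplicates(sample))  # ['???', '???', 'c', '???', '???']
```

The explicit loop in `replace_repeats` does the same job as a comprehension using `e in seen or seen.add(e)`; that one-liner works because `set.add` returns `None`, which is falsy.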
Pierre D
  • 1
    this works very well for removing empty strings, but I think the question is about *duplicates*. – ComteHerappait Aug 20 '21 at 13:38
  • you are correct; the question is ambiguous, see my comment. – Pierre D Aug 20 '21 at 13:42
  • Just FWIW, this updated answer responds to all the cases of the OP's question: replacement of empty strings, replacement of duplicates (starting from the first dupe), or replacement of _all_ duplicates. The list comprehension (first code snippet) is also the fastest solution so far, both for short lists and long lists. – Pierre D Aug 21 '21 at 16:48
1

You could use a list comprehension: compare each element with the value you are looking for, and if it's a match, replace it with a different string; otherwise, keep the original element.

>>> data = ["A", "", "", "D"]
>>> ['???' if i == '' else i for i in data]
['A', '???', '???', 'D']
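
If the value to replace might not be the empty string in future (as mentioned in the comments on the question), the same comprehension can be parameterised. A small sketch; `replace_value` is a hypothetical helper name, not a built-in:

```python
def replace_value(items, target, replacement):
    """Return a new list with every element equal to `target` replaced."""
    return [replacement if item == target else item for item in items]


data = ["A", "", "", "D"]
print(replace_value(data, "", "???"))   # ['A', '???', '???', 'D']
print(replace_value(data, "A", "???"))  # ['???', '', '', 'D']
```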
Cory Kramer
  • That works but contains an explicit 'for' loop which is what the OP wanted to avoid –  Aug 20 '21 at 13:38
  • 1
    @DarkKnight What do you think `map` does under the hood ;) there is no solution to this problem that does *not* involve explicit or implicit looping – Cory Kramer Aug 20 '21 at 14:00
1

How about this:-

myList = ['A', '', '', 'D']
myMap = map(lambda i: '???' if i == '' else i, myList)
print(list(myMap))

...will result in:-

['A', '???', '???', 'D']
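
If the list needs to be updated in place, which is what the slice assignment in the question was attempting, the same `map` call can be assigned to a full slice. A sketch; `alias` is only there to illustrate that the list object itself is unchanged:

```python
myList = ['A', '', '', 'D']
alias = myList  # second reference to the same list object

# Slice assignment replaces the contents without creating a new list object.
myList[:] = map(lambda i: '???' if i == '' else i, myList)

print(myList)           # ['A', '???', '???', 'D']
print(alias is myList)  # True
```

Any other variable that refers to the same list sees the updated contents, which would not be the case with `myList = list(map(...))`.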

-1

If you want to avoid loops altogether, as the title suggests, you can use np.where instead of a list comprehension; in the speed test below it is also faster for large arrays:

import numpy as np

data = np.array(["A", "", "", "D"], dtype='object')
index = np.where(data == '')[0]  # positions of the empty strings
data[index] = "???"
data.tolist()

and the result:

['A', '???', '???', 'D']

Speed test

for rep in [1, 10, 100, 1000, 10000]:
    data = ["A", "", "", "D"] * rep
    print(f'array of length {4 * rep}')
    print('np.where:')
    %timeit data2 = np.array(data, dtype='object'); index = np.where(data2 == '')[0]; data2[index] = "???"; data2.tolist()
    print('list-comprehension:')
    %timeit ['???' if i == '' else i for i in data]

and the result:

array of length 4
np.where:
The slowest run took 11.79 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 5: 10.7 µs per loop
list-comprehension:
The slowest run took 5.75 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 5: 487 ns per loop
array of length 40
np.where:
The slowest run took 7.08 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 5: 13 µs per loop
list-comprehension:
100000 loops, best of 5: 2.99 µs per loop
array of length 400
np.where:
The slowest run took 4.83 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 5: 31 µs per loop
list-comprehension:
10000 loops, best of 5: 26 µs per loop
array of length 4000
np.where:
1000 loops, best of 5: 225 µs per loop
list-comprehension:
1000 loops, best of 5: 244 µs per loop
array of length 40000
np.where:
100 loops, best of 5: 2.27 ms per loop
list-comprehension:
100 loops, best of 5: 2.63 ms per loop

In this test, np.where is faster for lists of 4000 elements or more.
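
Outside IPython the `%timeit` magic is not available. A rough standard-library equivalent is sketched below for the two list comprehensions discussed in the comments. NumPy is left out so the snippet has no third-party dependency, the function names are illustrative, and absolute numbers will vary by machine:

```python
import timeit

data = ["A", "", "", "D"] * 1000  # 4000 elements, as in one of the runs above


def with_or(items):
    return [e or "???" for e in items]


def with_equality(items):
    return ["???" if i == "" else i for i in items]


# Both comprehensions give the same result for a list of strings.
assert with_or(data) == with_equality(data)

for func in (with_or, with_equality):
    # Best of 5 repeats of 100 calls each, reported per call.
    best = min(timeit.repeat(lambda: func(data), number=100, repeat=5)) / 100
    print(f"{func.__name__}: {best * 1e6:.1f} µs per call")
```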

Rm4n
  • this is one of the slowest methods for short lists; For the four-element list of the OP question, it takes 7.89 µs ± 237 ns per loop, which is 23.8x slower than a simple list comprehension. For large lists (that are not yet as `np.array`), the relative difference decreases; it asymptotically stabilizes to around 1.9x slower. – Pierre D Aug 20 '21 at 18:12
  • @PierreD check out the updated post; for large arrays this method is faster – Rm4n Aug 21 '21 at 04:19
  • you used the wrong list comprehension. The one I proposed is `[e or '???' for e in data]`. That ends up at 1.9x faster than `np.where` in your loop of `%timing`: `np.where: 1.83 ms ± 1.43 µs`; `list comprehension: 959 µs ± 735 ns`. Before writing my comment, I had tested up to 100 million random elements. That's why I asserted 1.9x asymptotic speedup against `np.where`. – Pierre D Aug 21 '21 at 16:46
  • what do you mean by wrong? The list-comprehension I compared with is the solution to identical elements as the title of OP suggests (and as can be seen in other answers). Yours just works for empty elements. – Rm4n Aug 21 '21 at 19:06
  • You used `%timeit ['???' if i == '' else i for i in data]`. That replaces only empty elements, just like most of the answers here. For the case of empty elements, I suggested `[e or '' for e in data]`, which is between 28x and 1.9x faster than `np.array` and `np.where`. That's why I say you used the wrong list comprehension. As far as removing duplicates, the other parts of my answer address that. I note that it seems to be the only answer so far that does it. – Pierre D Aug 21 '21 at 19:14
  • `['???' if i == '' else i for i in data]` could be used for duplicates simply by changing `''` to desired element. Yes, your method for replacing empty elements is faster (and is nice by the way). No, your method for replacing duplicates is not the only working answer. Actually `np.where` is the only method not using loops (as the title suggests). I think my point is clear. – Rm4n Aug 22 '21 at 07:44
  • in `['a', 'b', 'c', 'b', 'a']`, how exactly do you use `np.where` to detect and remove the two duplicates? – Pierre D Aug 22 '21 at 12:26
  • That's a different problem. 1: `['', 'b', 'c', '', '']` the empty element method is the fastest. 2: `['a', 'b', 'c', 'a', 'a']` `np.where` is the fastest. 3. `['a', 'b', 'c', 'b', 'a']` this divides into two; one where all duplicates are replaced by a single element (yours does that) and one where they're replaced with distinct elements (this one hasn't been answered). I've been talking about case 2 if this wasn't obvious already. – Rm4n Aug 22 '21 at 12:54