Duplicate strings in a list and add integer suffixes to newly added ones

Question

Suppose I have a list:

l = ['a', 'b', 'c']

And its suffix list:

l2 = ['a_1', 'b_2', 'c_3']

I'd like the desired output to be:

out_l = ['a', 'a_1', 'b', 'b_2', 'c', 'c_3']

The result is the interleaved version of the two lists above.

I can write a regular for loop to get this done, but I'm wondering if there's a more Pythonic way (e.g., using a list comprehension or lambda) to do it.

I've tried something like this:

list(map(lambda x: x[1]+'_'+str(x[0]+1), enumerate(l)))
# this only returns ['a_1', 'b_2', 'c_3']

Furthermore, what changes would need to be made for the general case i.e., for 2 or more lists where l2 is not necessarily a derivative of l?

related: [Interleaving two lists in Python](https://stackoverflow.com/q/7946798/4279) and [Most pythonic way to interleave two strings](https://stackoverflow.com/q/34756145/4279) — jfs, May 24 '18 at 20:34

cs95 · Accepted Answer · 2019-06-27T15:43:26.347

`yield`

You can use a generator for an elegant solution. At each iteration, yield twice—once with the original element, and once with the element with the added suffix.

The generator will need to be exhausted; that can be done by tacking on a list call at the end.

def transform(l):
    for i, x in enumerate(l, 1):
        yield x
        yield f'{x}_{i}'  # or: '{}_{}'.format(x, i)

You can also re-write this using the yield from syntax for generator delegation:

def transform(l):
    for i, x in enumerate(l, 1):
        yield from (x, f'{x}_{i}')  # or: (x, '{}_{}'.format(x, i))

out_l = list(transform(l))
print(out_l)
['a', 'a_1', 'b', 'b_2', 'c', 'c_3']

If you're on versions older than python-3.6, replace f'{x}_{i}' with '{}_{}'.format(x, i).
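For interpreters older than 3.6, the same generator can be sketched with `str.format`. The name `transform_compat` is made up here for illustration and is not part of the original answer.

```python
def transform_compat(items):
    """Yield each item, then the same item with a 1-based index suffix."""
    for i, x in enumerate(items, 1):
        yield x
        yield '{}_{}'.format(x, i)  # str.format instead of an f-string


out_l = list(transform_compat(['a', 'b', 'c']))
print(out_l)  # ['a', 'a_1', 'b', 'b_2', 'c', 'c_3']
```

The output matches the f-string version; only the formatting call differs.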

Generalising
Consider a general scenario where you have N lists of the form:

l1 = [v11, v12, ...]
l2 = [v21, v22, ...]
l3 = [v31, v32, ...]
...

Which you would like to interleave. These lists are not necessarily derived from each other.

To interleave these N lists, iterate over tuples of corresponding elements, which zip builds for you:

def transformN(*args):
    for vals in zip(*args):
        yield from vals

out_l = list(transformN(l1, l2, l3, ...))
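As a concrete sketch of the generalised generator, assuming three small sample lists (the values are invented for illustration). Because `zip` stops at the shortest input, the extra element of the longer list is dropped.

```python
def transformN(*args):
    # zip groups the i-th element of every list into one tuple
    for vals in zip(*args):
        yield from vals


l1 = ['a', 'b', 'c']
l2 = [1, 2, 3]
l3 = ['x', 'y', 'z', 'extra']  # one element longer than the others

out_l = list(transformN(l1, l2, l3))
print(out_l)  # ['a', 1, 'x', 'b', 2, 'y', 'c', 3, 'z']
```

If the trailing elements matter, `itertools.zip_longest` is one option, at the cost of having to deal with its fill value.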

Sliced `list.__setitem__`

I'd recommend this from the perspective of performance. First allocate a list of placeholders of the final length, and then assign list items to their appropriate positions using sliced list assignment. l goes into the even indexes, and l' (l modified) goes into the odd indexes.

out_l = [None] * (len(l) * 2)
out_l[::2] = l
out_l[1::2] = [f'{x}_{i}' for i, x in enumerate(l, 1)]  # or: ['{}_{}'.format(x, i) ...]

print(out_l)
['a', 'a_1', 'b', 'b_2', 'c', 'c_3']

This is consistently the fastest from my timings (below).

Generalising
To handle N lists, iteratively assign to slices.

list_of_lists = [l1, l2, ...]

n = len(list_of_lists)
out_l = [None] * (len(list_of_lists[0]) * n)
for i, l in enumerate(list_of_lists):
    out_l[i::n] = l  # step by the number of lists, not 2
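A worked sketch of the N-list slice assignment, assuming three equal-length sample lists (the third list is invented for illustration). The slice step has to equal the number of lists so that list i lands on positions i, i + n, i + 2n, and so on.

```python
list_of_lists = [['a', 'b', 'c'], ['a_1', 'b_2', 'c_3'], ['A', 'B', 'C']]

n = len(list_of_lists)        # number of lists, used as the slice step
size = len(list_of_lists[0])  # assumes every list has this length

out_l = [None] * (size * n)
for i, lst in enumerate(list_of_lists):
    out_l[i::n] = lst         # list i fills positions i, i + n, i + 2n, ...

print(out_l)
# ['a', 'a_1', 'A', 'b', 'b_2', 'B', 'c', 'c_3', 'C']
```

Extended slice assignment raises `ValueError` when the lengths do not match, so lists of unequal length fail loudly here rather than being silently truncated.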

`zip` + `chain.from_iterable`

A functional approach, similar to @chrisz' solution. Construct pairs using zip and then flatten it using itertools.chain.

from itertools import chain
# or: ['{}_{}'.format(x, i) for i, x in enumerate(l, 1)]
out_l = list(chain.from_iterable(zip(l, [f'{x}_{i}' for i, x in enumerate(l, 1)])))

print(out_l)
['a', 'a_1', 'b', 'b_2', 'c', 'c_3']

itertools.chain is widely regarded as the Pythonic approach to flattening a list.

Generalising
This is the simplest solution to generalise, and I suspect the most efficient for multiple lists when N is large.

list_of_lists = [l1, l2, ...]
out_l = list(chain.from_iterable(zip(*list_of_lists)))

Performance

Let's take a look at some perf-tests for the simple case of two lists (one list with its suffix). The general case is not tested, since the results vary widely with the data.

Benchmarking code, for reference.

Functions

from itertools import chain  # needed by cs3

def cs1(l):
    def _cs1(l):
        for i, x in enumerate(l, 1):
            yield x
            yield f'{x}_{i}'

    return list(_cs1(l))

def cs2(l):
    out_l = [None] * (len(l) * 2)
    out_l[::2] = l
    out_l[1::2] = [f'{x}_{i}' for i, x in enumerate(l, 1)]

    return out_l

def cs3(l):
    return list(chain.from_iterable(
        zip(l, [f'{x}_{i}' for i, x in enumerate(l, 1)])))

def ajax(l):
    return [
        i for b in [[a, '{}_{}'.format(a, i)] 
        for i, a in enumerate(l, start=1)] 
        for i in b
    ]

def ajax_cs0(l):
    # suggested improvement to ajax solution
    return [j for i, a in enumerate(l, 1) for j in [a, '{}_{}'.format(a, i)]]

def chrisz(l):
    return [
        val 
        for pair in zip(l, [f'{k}_{j+1}' for j, k in enumerate(l)]) 
        for val in pair
    ]
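The timing harness itself is not shown above. A minimal sketch of one, using the standard-library `timeit` and only two of the candidates, could look like the following; the input size and repeat count are assumptions, not the settings behind the original timings.

```python
from timeit import timeit


def cs1(l):
    def _cs1(l):
        for i, x in enumerate(l, 1):
            yield x
            yield f'{x}_{i}'

    return list(_cs1(l))


def cs2(l):
    out_l = [None] * (len(l) * 2)
    out_l[::2] = l
    out_l[1::2] = [f'{x}_{i}' for i, x in enumerate(l, 1)]
    return out_l


# Hypothetical input: 1,000 short strings.
data = [str(n) for n in range(1000)]

# Check that both candidates agree before timing them.
assert cs1(data) == cs2(data)

timings = {}
for func in (cs1, cs2):
    # Total seconds for 200 calls; absolute numbers depend on the machine.
    timings[func.__name__] = timeit(lambda: func(data), number=200)
    print(f'{func.__name__}: {timings[func.__name__]:.4f} s')
```

The remaining functions can be added to the tuple in the loop; comparing several input sizes gives a fairer picture than a single run.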

I'd recommend the `yield` from the perspective of readability, simplicity, and maintenance, as it's unlikely this is going to be a major bottleneck. (Probably not high enough volume of data, probably not a performance critical app.) The generator is *extraordinarily* straightforward to understand. OP can go back and optimize if it turns out to be a problem. +1 — jpmc26, May 13 '18 at 11:45
@user1717828 I'm happy you learned something from this! They are called f-strings and are introduced for python-3.6+. Do take a look at [this section of the docs](https://docs.python.org/3/reference/lexical_analysis.html#formatted-string-literals) for more info. Happy learning! — cs95, May 13 '18 at 17:03
I don't understand the reason for `yield from`. Could you add more explanation for that please? — Tjorriemorrie, May 22 '18 at 23:22
`yield from` provides a slightly more compact syntax to do the same thing that two `yield` statements do - it _delegates_ the yield process, so you don't need to write a loop over an iterable (or two yield statements as in this case). — cs95, May 23 '18 at 01:59
@cs95 The performance comparison is biased since `ajax1234` and `cs0` use `str.format` while other functions use f-strings which are considerably faster (`sruthiV` even uses `+`). So effectively the performance of these functions is degraded by using a less performant formatting option. In order to provide a meaningful comparison the functions need to be updated to use the same formatting option. Also `sruthiV` should use `i//2` instead of `int(i/2)` as it is much more efficient (hence avoiding additional bias). — a_guest, Jun 03 '19 at 22:27
@a_guest The answers have been timed as presented. The graph here captures the performance difference between the original answers, formatting choices and all. — cs95, Jun 03 '19 at 22:36

Ajax1234 · Answer 2 · 2018-05-13T03:00:52.170

You can use a list comprehension like so:

l=['a','b','c']
new_l = [i for b in [[a, '{}_{}'.format(a, i)] for i, a in enumerate(l, start=1)] for i in b]

Output:

['a', 'a_1', 'b', 'b_2', 'c', 'c_3']

Optional, shorter method:

[j for i, a in enumerate(l, 1) for j in [a, '{}_{}'.format(a, i)]]

score 5 · Answer 3 · edited May 13 '18 at 03:51

You could use zip:

[val for pair in zip(l, [f'{k}_{j+1}' for j, k in enumerate(l)]) for val in pair]

Output:

['a', 'a_1', 'b', 'b_2', 'c', 'c_3']

answered May 13 '18 at 02:49 · user3483203 (edited May 13 '18 at 03:51 · cs95)

You could use a list comprehension instead of zip. Not sure which is faster though... – agtoever May 13 '18 at 08:35
If you look at the timings, this is faster than using a list comprehension. Much faster. – user3483203 May 13 '18 at 16:20

score 2 · Answer 4 · answered May 14 '18 at 15:44

Here's my simple implementation

l=['a','b','c']
# generate new list with the indices of the original list
new_list=l + ['{0}_{1}'.format(i, (l.index(i) + 1)) for i in l]
# sort the new list in ascending order
new_list.sort()
print(new_list)
# Should display ['a', 'a_1', 'b', 'b_2', 'c', 'c_3']

score 0 · Answer 5 · answered May 13 '18 at 10:37

If you wanted to return [["a","a_1"],["b","b_2"],["c","c_3"]] you could write

new_l=[[x,"{}_{}".format(x,i+1)] for i,x in enumerate(l)]

This isn't what you want; instead you want ["a","a_1"]+["b","b_2"]+["c","c_3"]. This can be made from the result of the operation above using sum(); since you're summing lists, you need to pass the empty list as the start value to avoid an error. So that gives

new_l=sum(([x,"{}_{}".format(x,i+1)] for i,x in enumerate(l)),[])

I don't know how this compares speed-wise (probably not well), but I find it easier to understand what's going on than the other list-comprehension based answers.
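To illustrate why the empty-list argument is needed, here is a small sketch (the variable name `pairs` is introduced only for this example). Without a start value, `sum()` begins from the integer 0, and adding a list to an integer fails.

```python
l = ['a', 'b', 'c']
pairs = [[x, '{}_{}'.format(x, i)] for i, x in enumerate(l, 1)]

# Without a start value, sum() begins at the integer 0 and 0 + list fails.
try:
    sum(pairs)
except TypeError as exc:
    print('TypeError:', exc)

# With [] as the start value, the lists are concatenated left to right.
new_l = sum(pairs, [])
print(new_l)  # ['a', 'a_1', 'b', 'b_2', 'c', 'c_3']
```

Each `+` builds a new list, so the total work grows roughly quadratically with the number of pairs. That is unlikely to matter for three elements but becomes noticeable on long inputs.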

answered May 13 '18 at 10:37 · Especially Lime

@cᴏʟᴅsᴘᴇᴇᴅ How is it not what was asked? If `l==['a','b','c']` the result is `['a', 'a_1', 'b', 'b_2', 'c', 'c_3']` as required, and it avoids the use of a `for` loop. – Especially Lime May 13 '18 at 13:56
Eh sorry, didn't read past the first line. HOWEVER, calling sum() on a list is generally frowned upon, it's worse than a loop. – cs95 May 13 '18 at 14:17

score 0 · Answer 6 · answered May 20 '18 at 16:10

A very simple solution:

out_l=[]
for i,x in enumerate(l,1):
    out_l.extend([x,f"{x}_{i}"])

answered May 20 '18 at 16:10 · kantal

Error - Syntactical Remorse · Answer 7 · 2019-06-13T16:07:46.213

Here is an easier list comprehension for this problem as well:

l = ['a', 'b', 'c']
print([ele for index, val in enumerate(l) for ele in (val, val + f'_{index + 1}')])

Output:

['a', 'a_1', 'b', 'b_2', 'c', 'c_3']

Note that this is just a simpler way to interleave the two lists; it is not a solution for multiple lists. I use two for clauses because, at the time of writing, list comprehensions do not support iterable unpacking with `*` (for example, `[*pair for pair in pairs]` is a syntax error).
