Python create new lists within a list based on the index within a list

Question

If I have the list

a = ['1 2 3 4 5', '1 2 3 4 etc', '1 etc etc', '2 5 6 8', '2 7 3 9', '2 etc etc']

I want to be able to sort this based on what each element starts on. So the output I want is:

a = [['1 2 3 4 5', '1 2 3 4 etc', '1 etc etc'], ['2 5 6 8', '2 7 3 9', '2 etc etc']]

But the thing is, in my real code I won't know how many strings start with a '1' or with a '2', so I can't split the list at a fixed position. Is there a way of comparing the elements and combining them if they start with the same value?

Please post your attempt to work it out, so we can provide feedback. — entreprenerds, Nov 28 '19 at 02:27
_I want to be able to sort this based on what each element starts on._ Wouldn't it be more accurate to say you want to **group** the values based on the first character, not sort? — AMC, Nov 28 '19 at 03:13

score 4 · Answer 1 · answered Nov 28 '19 at 01:15

You can use itertools.groupby() combined with a list comprehension:

>>> import itertools
>>> a = ['1 2 3 4 5', '1 2 3 4 etc', '1 etc etc', '2 5 6 8', '2 7 3 9', '2 etc etc']
>>> [list(x[1]) for x in itertools.groupby(a, lambda i: i.split(" ")[0])]
[['1 2 3 4 5', '1 2 3 4 etc', '1 etc etc'], ['2 5 6 8', '2 7 3 9', '2 etc etc']]

Note that groupby() only groups consecutive items that share the same key, so the iterable (i.e. a) has to be sorted by that key already. If your real data is not ordered like this, sort it with the same key function first.
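To illustrate the sorting caveat, here is a minimal sketch. The list `mixed` and the helper `first_word` are made up for this example and are not part of the question. The sort compares the keys as strings, so a key such as '10' would be placed before '2'.

```python
import itertools

def first_word(s):
    # Grouping key: the text before the first space (hypothetical helper)
    return s.split(" ")[0]

# Hypothetical input where equal keys are not next to each other
mixed = ['2 5 6 8', '1 2 3 4 5', '2 7 3 9', '1 etc etc']

# Without sorting, every run of equal keys becomes its own group
unsorted_groups = [list(g) for _, g in itertools.groupby(mixed, first_word)]
print(unsorted_groups)
# [['2 5 6 8'], ['1 2 3 4 5'], ['2 7 3 9'], ['1 etc etc']]

# Sorting by the same key first brings equal keys together
sorted_groups = [
    list(g)
    for _, g in itertools.groupby(sorted(mixed, key=first_word), first_word)
]
print(sorted_groups)
# [['1 2 3 4 5', '1 etc etc'], ['2 5 6 8', '2 7 3 9']]
```

Because `sorted()` is stable, items with the same key keep their original relative order inside each group.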

Selcuk

This would have been my answer as well, although I understood the question to be about the first character, not the first word, in which case the solution would be: `[list(v) for k, v in groupby(a, lambda x: x[0])]` – Grismar Nov 28 '19 at 01:18
@Grismar Yes, that part is open to interpretation. I assumed that there might also be sublists that start with `10`, for example. – Selcuk Nov 28 '19 at 01:20
A matter of style: `[list(x[1]) for x in ...]` vs. `[list(x) for _, x in ..]` - I think the second is more Pythonic, as it makes clear you're taking the second half of a 2-tuple. – Grismar Nov 28 '19 at 01:23
pretty nice solution – Han.Oliver Nov 28 '19 at 02:09
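
The difference between the two interpretations discussed in the comments (first character vs. first word) only shows up once a prefix has more than one character. A small sketch, using a made-up list `b` that is not from the question:

```python
from itertools import groupby

# Hypothetical data containing a two-character prefix ('10')
b = ['1 a', '1 b', '10 c', '10 d', '2 e']

# Key on the first character: '1 ...' and '10 ...' end up in the same group
by_char = [list(g) for _, g in groupby(b, lambda s: s[0])]
print(by_char)
# [['1 a', '1 b', '10 c', '10 d'], ['2 e']]

# Key on the first word: '1' and '10' are kept apart
by_word = [list(g) for _, g in groupby(b, lambda s: s.split(" ")[0])]
print(by_word)
# [['1 a', '1 b'], ['10 c', '10 d'], ['2 e']]
```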

Luismi98 · Accepted Answer · 2021-10-04T01:00:09.110

-1

This is not an efficient algorithm, but you could do:

a = ['1 2 3 4 5', '1 2 3 4 etc', '1 etc etc', '2 5 6 8', '2 7 3 9', '2 etc etc']

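already_sorted = []  # indices that have already been placed in a group
new_a = []

for i in range(len(a)):
    if i in already_sorted:
        continue
    # Start a new group keyed on the first word of a[i]
    key = a[i].split(' ')[0]
    tmp = []
    # Only items from position i onwards can still be ungrouped
    for j in range(i, len(a)):
        if j not in already_sorted and a[j].split(' ')[0] == key:
            tmp.append(a[j])
            already_sorted.append(j)
    new_a.append(tmp)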

print(new_a)

Output:

[['1 2 3 4 5', '1 2 3 4 etc', '1 etc etc'], ['2 5 6 8', '2 7 3 9', '2 etc etc']]
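
For comparison, a single-pass alternative is sketched below. It is not the method from this answer: it swaps the nested loops for a dictionary keyed on the first word, and the function name `group_by_first_word` is made up for illustration. It assumes Python 3.7+, where dictionaries keep insertion order, so groups come out in the order their key was first seen and the input does not need to be sorted.

```python
def group_by_first_word(items):
    """Group strings by their first space-separated word (hypothetical helper)."""
    groups = {}
    for item in items:
        key = item.split(" ")[0]
        # Create the list for a new key on first sight, then append to it
        groups.setdefault(key, []).append(item)
    return list(groups.values())

a = ['1 2 3 4 5', '1 2 3 4 etc', '1 etc etc', '2 5 6 8', '2 7 3 9', '2 etc etc']
print(group_by_first_word(a))
# [['1 2 3 4 5', '1 2 3 4 etc', '1 etc etc'], ['2 5 6 8', '2 7 3 9', '2 etc etc']]

# Also works when equal prefixes are not adjacent
print(group_by_first_word(['2 x', '1 y', '2 z']))
# [['2 x', '2 z'], ['1 y']]
```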

edited Oct 04 '21 at 01:00

answered Nov 28 '19 at 01:54

Luismi98

Note that this algorithm has a best-case time complexity of `O(n^2)`. You could at least change the second loop to check the rest of the items, instead of iterating them all. Also `is not 0` is not the correct way to do it (it only works because of a CPython optimisation. See https://stackoverflow.com/questions/306313/is-operator-behaves-unexpectedly-with-integers for details). – Selcuk Nov 28 '19 at 02:20
Thank you for the detail about the use of `is not 0`. I have optimised it slightly to reduce the computation time in case the dataset is very large. – Luismi98 Nov 28 '19 at 02:38
thank you so much! this was exactly what I was looking for!! – Daniel Bouvin Nov 28 '19 at 08:25
