
Suppose I have the list:

a = ['1 2 3 4 5', '1 2 3 4 etc', '1 etc etc', '2 5 6 8', '2 7 3 9', '2 etc etc']

I want to be able to sort this based on what each element starts on. So the output I want is:

a = [['1 2 3 4 5', '1 2 3 4 etc', '1 etc etc'], ['2 5 6 8', '2 7 3 9', '2 etc etc']]

In my real code, however, I won't know how many strings start with '1' or with '2', so I can't split the list at a fixed index. Is there a way to compare the elements and combine those that start with the same value?

Carcigenicate
  • Please post your attempt to work it out, so we can provide feedback. – entreprenerds Nov 28 '19 at 02:27
  • _I want to be able to sort this based on what each element starts on._ Wouldn't it be more accurate to say you want to **group** the values based on the first character, not sort? – AMC Nov 28 '19 at 03:13

2 Answers


You can use itertools.groupby() combined with a list comprehension:

>>> import itertools
>>> a = ['1 2 3 4 5', '1 2 3 4 etc', '1 etc etc', '2 5 6 8', '2 7 3 9', '2 etc etc']
>>> [list(x[1]) for x in itertools.groupby(a, lambda i: i.split(" ")[0])]
[['1 2 3 4 5', '1 2 3 4 etc', '1 etc etc'], ['2 5 6 8', '2 7 3 9', '2 etc etc']]

Note that .groupby() only groups consecutive items that share the same key, so the iterable (i.e. a) must already be sorted by that key. You may have to sort it first if your real data looks different.
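As a minimal sketch of that caveat, assume the real data arrives with the prefixes interleaved. The list `unsorted` and the helper `first_word` are made up for illustration and are not part of the question. Sorting with the same key function that is used for grouping puts equal keys next to each other, and because Python's sort is stable, the items inside each group keep their original relative order.

```python
import itertools


def first_word(s):
    # Grouping key: the text before the first space (hypothetical helper).
    return s.split(" ")[0]


# Hypothetical input in which the '1' and '2' items are interleaved.
unsorted = ['2 5 6 8', '1 2 3 4 5', '2 7 3 9', '1 etc etc']

# Without sorting, groupby() starts a new group every time the key changes.
naive = [list(g) for _, g in itertools.groupby(unsorted, first_word)]

# Sorting by the same key first makes equal keys adjacent.
ordered = sorted(unsorted, key=first_word)
grouped = [list(g) for _, g in itertools.groupby(ordered, first_word)]

print(naive)    # [['2 5 6 8'], ['1 2 3 4 5'], ['2 7 3 9'], ['1 etc etc']]
print(grouped)  # [['1 2 3 4 5', '1 etc etc'], ['2 5 6 8', '2 7 3 9']]
```

The keys here are strings, so they sort lexicographically: a key of '10' would come before '2'. If numeric order matters, the sort key would need to convert the first word to a number, which assumes every first word is numeric.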

Selcuk
  • This would have been my answer as well, although I understood the question to be about the first character, not the first word, in which case the solution would be: `[list(v) for k, v in groupby(a, lambda x: x[0])]` – Grismar Nov 28 '19 at 01:18
  • @Grismar Yes, that part is open to interpretation. I assumed that there might also be sublists that start with `10`, for example. – Selcuk Nov 28 '19 at 01:20
  • A matter of style: `[list(x[1]) for x in ...]` vs. `[list(x) for _, x in ..]` - I think the second is more Pythonic, as it makes clear you're taking the second half of a 2-tuple. – Grismar Nov 28 '19 at 01:23
  • pretty nice solution – Han.Oliver Nov 28 '19 at 02:09
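
To illustrate the two readings discussed in these comments, here is a small sketch that also uses tuple unpacking in the comprehension. The list `b`, which contains an item starting with `10`, is a made-up example and does not come from the question.

```python
from itertools import groupby

# Hypothetical data: one item starts with '10' rather than '1'.
b = ['1 2 3', '10 4 5', '2 6 7']

# Key on the first character: '1 2 3' and '10 4 5' land in the same group.
by_char = [list(group) for _, group in groupby(b, lambda s: s[0])]

# Key on the first word: '1' and '10' are different keys.
by_word = [list(group) for _, group in groupby(b, lambda s: s.split(" ")[0])]

print(by_char)  # [['1 2 3', '10 4 5'], ['2 6 7']]
print(by_word)  # [['1 2 3'], ['10 4 5'], ['2 6 7']]
```

Which key is right depends on the real data: the first-character key is enough when every prefix is a single character, while the first-word key keeps multi-character prefixes apart.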

This is not an efficient algorithm but you could do:

a = ['1 2 3 4 5', '1 2 3 4 etc', '1 etc etc', '2 5 6 8', '2 7 3 9', '2 etc etc']

already_sorted = []
new_a = []

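for i in range(len(a)):
    # Skip items that were already placed in an earlier group.
    if i in already_sorted:
        continue
    # Collect every remaining item whose first word matches that of a[i].
    tmp = []
    for j in range(i, len(a)):
        if j not in already_sorted and a[i].split(' ')[0] == a[j].split(' ')[0]:
            tmp.append(a[j])
            already_sorted.append(j)
    new_a.append(tmp)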

print(new_a)

Output:

[['1 2 3 4 5', '1 2 3 4 etc', '1 etc etc'], ['2 5 6 8', '2 7 3 9', '2 etc etc']]

Luismi98
  • Note that this algorithm has a best-case time complexity of `O(n^2)`. You could at least change the second loop to check the rest of the items, instead of iterating them all. Also, `is not 0` is not the correct way to do it (it only works because of a CPython optimisation; see https://stackoverflow.com/questions/306313/is-operator-behaves-unexpectedly-with-integers for details). – Selcuk Nov 28 '19 at 02:20
  • Thank you for the detail about the use of `if not 0`. I have optimised it slightly to reduce computational time if the dataset was very large. – Luismi98 Nov 28 '19 at 02:38
  • thank you so much! this was exactly what I was looking for!! – Daniel Bouvin Nov 28 '19 at 08:25
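
For reference, the quadratic cost mentioned in the comments can be avoided with a single pass over the list. This is a sketch of a different technique from the answer above, grouping into a dictionary keyed by the first word. It relies on dictionaries preserving insertion order, which is guaranteed from Python 3.7 onwards. The name `groups` is chosen here for illustration.

```python
a = ['1 2 3 4 5', '1 2 3 4 etc', '1 etc etc', '2 5 6 8', '2 7 3 9', '2 etc etc']

# Map each first word to the list of items that start with it.
groups = {}
for item in a:
    key = item.split(' ')[0]
    # setdefault() creates the list the first time a key is seen.
    groups.setdefault(key, []).append(item)

# Groups come out in the order their keys first appeared in a.
new_a = list(groups.values())
print(new_a)
```

Unlike `itertools.groupby()`, this does not need the input to be sorted, since items with the same key are collected wherever they occur in the list.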