
So, I have a list like the following:

potential_labels = ['foo', 'foo::bar', 'foo::bar::baz', "abc", "abc::cde::def", "bleh"]

The desired output is:

desired_output = ['foo::bar::baz', "abc::cde::def", "bleh"]

This is because, for the root "foo", 'foo::bar::baz' is the longest sequence; for "abc", it is "abc::cde::def"; and for "bleh", it is "bleh".

Is there any built-in Python function that does this? I feel like there's something in itertools that does this, but I can't seem to figure it out.

frazman

2 Answers


Option 1
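max + groupby should do it, provided that labels sharing a root are adjacent in the list (groupby only groups consecutive items with the same key).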

import itertools

data = potential_labels  # the list from the question
r = [max(g, key=len) for _, g in
     itertools.groupby(data, key=lambda x: x.split('::')[0])]

r
['foo::bar::baz', 'abc::cde::def', 'bleh']
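Because groupby only merges adjacent items, Option 1 relies on each root's labels already sitting next to each other, as they do in the question. A minimal sketch for input where they may be scattered: sort by the same key first. It assumes it's fine for the groups to come out in alphabetical order of root; the `shuffled` list and the `root` helper are made up for illustration.

```python
import itertools

def root(label):
    # Hypothetical helper: the root is everything before the first '::'
    return label.split('::')[0]

# Made-up input where labels with the same root are not adjacent
shuffled = ["abc", "foo::bar", "bleh", "foo", "abc::cde::def", "foo::bar::baz"]

# Sorting by root makes equal roots consecutive, so groupby sees one group per root.
# sorted() is stable, so labels keep their relative order within a group.
r = [max(g, key=len) for _, g in itertools.groupby(sorted(shuffled, key=root), key=root)]
print(r)  # ['abc::cde::def', 'bleh', 'foo::bar::baz']
```

The sort adds an O(n log n) step; if the original order of roots matters, the OrderedDict approach avoids both the sort and the reordering.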

Option 2
Another option uses collections.OrderedDict; unlike groupby, it doesn't require labels with the same root to be adjacent:

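from collections import OrderedDict

data = potential_labels  # the list from the question

o = OrderedDict()
for x in data:
    # Group labels by their root (the text before the first '::')
    o.setdefault(x.split('::')[0], []).append(x)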

r = [sorted(o[k], key=len)[-1] for k in o]

r
['foo::bar::baz', 'abc::cde::def', 'bleh']

Not exactly a one-liner, but what is Pythonic is subjective, after all.

cs95
  • It is only inefficient by a log(n) factor; you need to traverse the list anyway, probably twice: (1) to get the max length and (2) to extract the values. – Reblochon Masque Oct 25 '17 at 09:47
  • @ReblochonMasque Thanks, that was informative. I can think of doing this with a loop and dict, that would probably speed things up a bit. – cs95 Oct 25 '17 at 09:48
  • The OP sort of asked for a pythonic way, & your answer delivers. – Reblochon Masque Oct 25 '17 at 09:51
  • @cᴏʟᴅsᴘᴇᴇᴅ I think you can use `max` instead of `sorted`, e.g. `[max(list(g), key=len) for ...]` – pylang Oct 25 '17 at 12:19
  • @pylang Yes, absolutely. Wonder why I didn’t see that. Thank you. – cs95 Oct 25 '17 at 12:48
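Following up on the loop-and-dict idea from the comments, a single-pass sketch could look like this. It assumes Python 3.7+, where plain dicts preserve insertion order, so no OrderedDict is needed; `deepest_per_root` is a hypothetical name.

```python
def deepest_per_root(labels):
    # Maps each root to the longest label seen so far for that root.
    # Plain dicts keep insertion order (Python 3.7+), so roots come out
    # in the order they first appear, like the OrderedDict version.
    best = {}
    for label in labels:
        root = label.split('::')[0]
        # Replace only if strictly longer, so ties keep the earlier label
        if root not in best or len(label) > len(best[root]):
            best[root] = label
    return list(best.values())

potential_labels = ['foo', 'foo::bar', 'foo::bar::baz', "abc", "abc::cde::def", "bleh"]
print(deepest_per_root(potential_labels))  # ['foo::bar::baz', 'abc::cde::def', 'bleh']
```

Like both options above, this compares character length. If "longest" should instead mean the most `::`-separated segments, compare `label.count('::')` rather than `len(label)`.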

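You can do a simple list comprehension taking advantage of a condition: join all labels with "\0" separators, then keep a label only if exactly one label in the list starts with it (that is, only the label itself):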

>>> joined = "\0" + "\0".join(potential_labels)  # leading "\0" so the first label is counted too
>>> [label for label in potential_labels if joined.count("\0" + label) == 1]
['foo::bar::baz', 'abc::cde::def', 'bleh']
Ivan De Paz Centeno
  • This doesn't work if `potential_labels=[u'Reggae', u'Reggae::Dancehall', u'Reggae::Reggae-Pop', u'Reggae::Contemporary Reggae', u'Reggae::Ragga', u'Reggae', u'Reggae::Dancehall', u'Reggae::Reggae-Pop', u'Reggae::Contemporary Reggae', u'Reggae::Ragga']` – frazman Oct 29 '17 at 08:58
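One way to make the prefix idea robust to duplicates like those in the comment is to drop duplicates first and then compare whole `::`-separated ancestors instead of raw substrings (raw substring counting also treats a root like "ab" as a prefix of "abc"). A minimal sketch; `leaf_labels` is a hypothetical helper. Note that it returns every leaf label rather than one label per root, so for the Reggae list it keeps all four subgenres.

```python
def leaf_labels(labels):
    # Drop duplicates while keeping first-seen order
    unique = list(dict.fromkeys(labels))
    # Collect every proper ancestor, e.g. 'foo::bar::baz' -> 'foo', 'foo::bar'
    ancestors = set()
    for label in unique:
        parts = label.split('::')
        for i in range(1, len(parts)):
            ancestors.add('::'.join(parts[:i]))
    # A label is a leaf if no other label extends it with '::'
    return [label for label in unique if label not in ancestors]

print(leaf_labels(['foo', 'foo::bar', 'foo::bar::baz', "abc", "abc::cde::def", "bleh"]))
# ['foo::bar::baz', 'abc::cde::def', 'bleh']
```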