More than 1 longest string in list?

Question

I have a list of all US state names.

states = ['Oklahoma', 'Kansas', 'North Carolina', 'Georgia', 'Oregon',
      'Mississippi', 'Minnesota', 'Colorado', 'Alabama',
      'Massachusetts', 'Arizona', 'Connecticut', 'Montana',
      'West Virginia', 'Nebraska', 'New York', 'Nevada', 'Idaho',
      'New Jersey', 'Missouri', 'South Carolina', 'Pennsylvania',
      'Rhode Island', 'New Mexico', 'Alaska', 'New Hampshire',
      'Tennessee', 'Washington', 'Indiana', 'Hawaii', 'Kentucky',
      'Virginia', 'Ohio', 'Wisconsin', 'Maryland', 'Florida',
      'Utah', 'Maine', 'California', 'Vermont', 'Arkansas', 'Wyoming',
      'Louisiana', 'North Dakota', 'South Dakota', 'Texas',
      'Illinois', 'Iowa', 'Michigan', 'Delaware']

I want to find the longest string in this list of items, which is easy enough with the following:

def longest_state(data):
    return(max(states,key=len))
print(longest_state(states)

This returns "North Carolina", which has a length of 14. However, "South Carolina" is also 14, but is not returned.

I tried to use the following stackoverflow thread which has an example to find multiple longest strings using a list comprehension but I was unable to make it work... Python's most efficient way to choose longest string in list?

I also tried using if/else statements to append the list item to another variable if it equaled the length of the current longest item but was unsuccessful

Can anyone help?

"but I was unable to make it work" How? How did you try? What error/output did you get? — DeepSpace, Sep 10 '18 at 19:20
`longest = [state from states if len(state) = 14]` should eleminate all that are shorter. Do not recompute the longest-length inside the list comp, it will recompute the longest one for each element. — Patrick Artner, Sep 10 '18 at 19:24

score 3 · Accepted Answer · answered Sep 10 '18 at 19:21

3

You could store all of the longest names in an array

def longest_state(data):
    cur_longest = []
    cur_longest_num = 0
    for state in data:
        if len(state) == cur_longest_num:
            cur_longest.append(state)
        elif len(state) > cur_longest_num:
            cur_longest = [state]
            cur_longest_num = len(state)
    return cur_longest

answered Sep 10 '18 at 19:21

Spencer Bard

1,015
6
10

@DeepSpace This is hardly "complicated" and it has an advantage over the other solutions so far in that it requires only a single scan over the input. – Patrick Haugh Sep 10 '18 at 19:27
@PatrickHaugh There are other approaches here that only iterate the input once. – wim Sep 10 '18 at 19:30
Yeah I was looking to only iterate over the list once. I also thought the verbosity of this answer would help the person asking the question understand how it works. – Spencer Bard Sep 10 '18 at 19:33
I like this solution @SpencerBard. Can you explain a little more, why does cur_longest = [state] overwrite the value(s) currently stores in the cur_longest variable? – Erich Purpur Sep 13 '18 at 14:08
2

@ErichPurpur It doesn't. `cur_longest[:] = [state]` would do that. This just rebinds the name (which, in this context, has the same effect) – wim Sep 13 '18 at 14:41

score 2 · Answer 2 · answered Sep 10 '18 at 19:29

Hope this helps. Two pass approach, may not be the best. But certainly is O(n).

def longest_state(states):
    max_len = len(max(states, key=len))
    return [state for state in states if len(state) == max_len]

1 pass would be best, but this looks shorter.

score 1 · Answer 3 · edited Sep 10 '18 at 19:28

1

s = len(max(states, key=len))
[i for i in states if len(i) == s]

edited Sep 10 '18 at 19:28

ShadowRanger

143,180
12
188
271

answered Sep 10 '18 at 19:23

Jonas Wolff

2,034
1
12
17

score 1 · Answer 4 · answered Sep 10 '18 at 19:23

Key a dict off of the lengths:

>>> from collections import defaultdict
>>> len2states = defaultdict(list)
>>> for state in states:
...     len2states[len(state)].append(state)
...     
>>> len2states[max(len2states)]
['North Carolina', 'South Carolina']

wiesion · Answer 5 · 2018-09-11T08:53:37.680

This question got me wondering which of all the possible solutions would have the best performance, so i made a comparison between all of which came to my mind and were not posted already, and compared them to mine.

The `groupby` approach:

sorted_states = sorted(states, key=len, reverse=True)
grouped_states = next(groupby(sorted_states, key=len))
list(grouped_states[1])

groupby needs a sorted list to work properly, so there is the "overhead" of sorting the list beforehand, but most Python interpreters have heavily optimized sorting algorhythms. We stop the generator at the first group occurrence with next, so it does not continue fetching the remaining items.

The `takewhile` approach:

sorted_states = sorted(states, key=len, reverse=True)
max_length = len(sorted_states[0])
list(takewhile(lambda x: max_length == len(x), sorted_states))

This also needs a sorted list, as well the length of the first item, but it stops the gathering of the new list as soon as the lambda expectation is not met anymore.

The `reduce` approach:

def _keep_longest(a, v):
  if len(a) == 0 or len(v) >= len(a[-1]):
    a.append(v)
  return a

sorted_states = sorted(states, key=len, reverse=True)
reduce(_keep_longest, sorted_states, [])

This one needs a method to handle the comparison between the previous length as well a sorted list. All of its length comparisons and moving list from lambda to lambda make this a non-efficient method.

Results

Of course the results vary a lot but doing them over and over again (sample size 50_000 on repl.it) i can say that they are representative (let them also run a few times on my cPython 3.5):

max and len 50_000 times: 1.3888958770003228
sort and groupby 50_000 times: 1.405984859000455
sort and takewhile 50_000 times: 1.4154430249991492
spencer 50_000 times: 1.607105290000618
wim 50_000 times: 1.9011182049998752
sort and reduce 50_000 times: 4.612968634999561
comprehension 50_000 times: 27.802522705999763

Conclusion

The max and len approach posted multiple times here takes the cake, and is probably the most pythonic way as it is self-explanatory without resorting to list sorting or the usage of the itertools, functools or collections libraries.

Online demo here

score 0 · Answer 6 · answered Sep 10 '18 at 19:46

0

longest_state = max(states, key=len)

for i in states:
    if len(i) == len(longest_state):
        print(i)

Alternate format

longest_state = max(states, key=len)

[[print(i)] for i in states if len(i) == len(longest_state)]

answered Sep 10 '18 at 19:46

vash_the_stampede

4,590
1
8
20

score -1 · Answer 7 · answered Sep 10 '18 at 20:20

-1

Another potential solution. Pretty short and sweet

def longest_state(data):
    return [state for state in data if len(state) == len(max(data, key=len))]

answered Sep 10 '18 at 20:20

Murman

1
2

It might be looking clean and shorter but you do realize it will calculate max at each iteration – mad_ Sep 10 '18 at 20:59
Fair. Thanks for the feedback – Murman Sep 12 '18 at 03:34

More than 1 longest string in list?

7 Answers7

The `groupby` approach:

The `takewhile` approach:

The `reduce` approach:

Other answers from this question

Results

Conclusion

More than 1 longest string in list?

7 Answers7

The groupby approach:

The takewhile approach:

The reduce approach:

Other answers from this question

Results

Conclusion

The `groupby` approach:

The `takewhile` approach:

The `reduce` approach: