2

I have a list of all US state names.

states = ['Oklahoma', 'Kansas', 'North Carolina', 'Georgia', 'Oregon',
      'Mississippi', 'Minnesota', 'Colorado', 'Alabama',
      'Massachusetts', 'Arizona', 'Connecticut', 'Montana',
      'West Virginia', 'Nebraska', 'New York', 'Nevada', 'Idaho',
      'New Jersey', 'Missouri', 'South Carolina', 'Pennsylvania',
      'Rhode Island', 'New Mexico', 'Alaska', 'New Hampshire',
      'Tennessee', 'Washington', 'Indiana', 'Hawaii', 'Kentucky',
      'Virginia', 'Ohio', 'Wisconsin', 'Maryland', 'Florida',
      'Utah', 'Maine', 'California', 'Vermont', 'Arkansas', 'Wyoming',
      'Louisiana', 'North Dakota', 'South Dakota', 'Texas',
      'Illinois', 'Iowa', 'Michigan', 'Delaware']

I want to find the longest string in this list of items, which is easy enough with the following:

def longest_state(data):
    return(max(states,key=len))
print(longest_state(states)

This returns "North Carolina", which has a length of 14. However, "South Carolina" is also 14, but is not returned.

I tried to use the following stackoverflow thread which has an example to find multiple longest strings using a list comprehension but I was unable to make it work... Python's most efficient way to choose longest string in list?

I also tried using if/else statements to append the list item to another variable if it equaled the length of the current longest item but was unsuccessful

Can anyone help?

Erich Purpur
  • 1,337
  • 3
  • 14
  • 34
  • "but I was unable to make it work" How? How did you try? What error/output did you get? – DeepSpace Sep 10 '18 at 19:20
  • `longest = [state from states if len(state) = 14]` should eleminate all that are shorter. Do not recompute the longest-length inside the list comp, it will recompute the longest one for each element. – Patrick Artner Sep 10 '18 at 19:24

7 Answers7

3

You could store all of the longest names in an array

def longest_state(data):
    cur_longest = []
    cur_longest_num = 0
    for state in data:
        if len(state) == cur_longest_num:
            cur_longest.append(state)
        elif len(state) > cur_longest_num:
            cur_longest = [state]
            cur_longest_num = len(state)
    return cur_longest
Spencer Bard
  • 1,015
  • 6
  • 10
  • @DeepSpace This is hardly "complicated" and it has an advantage over the other solutions so far in that it requires only a single scan over the input. – Patrick Haugh Sep 10 '18 at 19:27
  • @PatrickHaugh There are other approaches here that only iterate the input once. – wim Sep 10 '18 at 19:30
  • Yeah I was looking to only iterate over the list once. I also thought the verbosity of this answer would help the person asking the question understand how it works. – Spencer Bard Sep 10 '18 at 19:33
  • I like this solution @SpencerBard. Can you explain a little more, why does cur_longest = [state] overwrite the value(s) currently stores in the cur_longest variable? – Erich Purpur Sep 13 '18 at 14:08
  • 2
    @ErichPurpur It doesn't. `cur_longest[:] = [state]` would do that. This just rebinds the name (which, in this context, has the same effect) – wim Sep 13 '18 at 14:41
2

Hope this helps. Two pass approach, may not be the best. But certainly is O(n).

def longest_state(states):
    max_len = len(max(states, key=len))
    return [state for state in states if len(state) == max_len]

1 pass would be best, but this looks shorter.

1
s = len(max(states, key=len))
[i for i in states if len(i) == s]
ShadowRanger
  • 143,180
  • 12
  • 188
  • 271
Jonas Wolff
  • 2,034
  • 1
  • 12
  • 17
1

Key a dict off of the lengths:

>>> from collections import defaultdict
>>> len2states = defaultdict(list)
>>> for state in states:
...     len2states[len(state)].append(state)
...     
>>> len2states[max(len2states)]
['North Carolina', 'South Carolina']
wim
  • 338,267
  • 99
  • 616
  • 750
1

This question got me wondering which of all the possible solutions would have the best performance, so i made a comparison between all of which came to my mind and were not posted already, and compared them to mine.

The groupby approach:

sorted_states = sorted(states, key=len, reverse=True)
grouped_states = next(groupby(sorted_states, key=len))
list(grouped_states[1])

groupby needs a sorted list to work properly, so there is the "overhead" of sorting the list beforehand, but most Python interpreters have heavily optimized sorting algorhythms. We stop the generator at the first group occurrence with next, so it does not continue fetching the remaining items.

The takewhile approach:

sorted_states = sorted(states, key=len, reverse=True)
max_length = len(sorted_states[0])
list(takewhile(lambda x: max_length == len(x), sorted_states))

This also needs a sorted list, as well the length of the first item, but it stops the gathering of the new list as soon as the lambda expectation is not met anymore.

The reduce approach:

def _keep_longest(a, v):
  if len(a) == 0 or len(v) >= len(a[-1]):
    a.append(v)
  return a

sorted_states = sorted(states, key=len, reverse=True)
reduce(_keep_longest, sorted_states, [])

This one needs a method to handle the comparison between the previous length as well a sorted list. All of its length comparisons and moving list from lambda to lambda make this a non-efficient method.

Other answers from this question

I included the other answers (The max and len from various posters, as well @Spencer Bard's, @Wim's and the other list comprehensions doing the len on the max scan over each comparison) in the tests as well to compare their performance

Results

Of course the results vary a lot but doing them over and over again (sample size 50_000 on repl.it) i can say that they are representative (let them also run a few times on my cPython 3.5):

max and len 50_000 times: 1.3888958770003228
sort and groupby 50_000 times: 1.405984859000455
sort and takewhile 50_000 times: 1.4154430249991492
spencer 50_000 times: 1.607105290000618
wim 50_000 times: 1.9011182049998752
sort and reduce 50_000 times: 4.612968634999561
comprehension 50_000 times: 27.802522705999763

Conclusion

The max and len approach posted multiple times here takes the cake, and is probably the most pythonic way as it is self-explanatory without resorting to list sorting or the usage of the itertools, functools or collections libraries.

Online demo here

wiesion
  • 2,349
  • 12
  • 21
0
longest_state = max(states, key=len)

for i in states:
    if len(i) == len(longest_state):
        print(i)

Alternate format

longest_state = max(states, key=len)

[[print(i)] for i in states if len(i) == len(longest_state)]
vash_the_stampede
  • 4,590
  • 1
  • 8
  • 20
-1

Another potential solution. Pretty short and sweet

def longest_state(data):
    return [state for state in data if len(state) == len(max(data, key=len))]
Murman
  • 1
  • 2