1

I'm interested in finding the largest set in an unordered list of months, that can be returned as an ordered list of distinct, consecutive months.

For example:

consecutive_months(["December", "January", "February", "April"])

Would output:

"December", "January", "February"

And:

consecutive_months(["February", "December", "January"])

Would output:

"December", "January", "February"

The following works but I'm curious if anyone has ideas for a more elegant way:

MONTHS = ["January", "February", "March", 
          "April", "May", "June",
          "July", "August", "September", 
          "October", "November", "December"] 

def consecutive_months(lst_of_months):
    # create two years of months to handle going from Dec to Jan        
    results = []
    for m in set(lst_of_months):
        results.append((m,MONTHS.index(m)))
        results.append((m,MONTHS.index(m)+12))            
    results = sorted(results, key=lambda x: x[1])
    
    # find the longest series of consecutive months
    this_series = []    
    longest_series = []
    for this_month in results:
        if len(this_series) > 0:
            last_month = this_series[-1]
            if last_month[1] + 1 == this_month[1]:
                this_series.append(this_month)
            else:
                this_series = [this_month]
        else:
            this_series = [this_month]           
        if len(this_series) > len(longest_series):
            longest_series = [m for (m,i) in this_series]

    return longest_series

Here is a pastebin with sample inputs and expected outputs.

Carl Cervone
  • 1,103
  • 8
  • 10
  • One of the ways is to replace all the month-names with corresponding numbers (`January = 1`, `February = 2` and so on...)and then finding the consecutive numbers – CoolCoder May 21 '21 at 03:28
  • Yes, that's part of how I implemented my solution, but in order to capture Dec/Jan as being consecutive I had to duplicate the months so it looks like this (January = 1, December = 12, January = 13, December = 24). – Carl Cervone May 21 '21 at 03:36
  • Not in the real dataset I'm working with, but I've edited my post and applied a `set()` function on the list to ensure this case could be handled properly. – Carl Cervone May 21 '21 at 04:03
  • Is it not possible to have a result that has more than 12 entries, like ["January", "February", ....etc..., "December", "January"]? – trincot May 21 '21 at 17:53
  • @python_user good catch! My module would only return the first consecutive series -- but I like how your approach is intended to give both series. – Carl Cervone May 22 '21 at 01:37
  • @trincot the data input I'm working with is constrained so each month can only appear once. However, based on the suggestion from python_user, I edited my OP so that the list is converted to a set to prevent issues related to duplicate months. – Carl Cervone May 22 '21 at 01:39
  • I note that your code produces a list of (string, int) tuples, while your question describes a list of strings. Which of the two do you really want? – trincot May 22 '21 at 07:58
  • 1
    Either is fine. But I've edited my code so it only returns a list of month name strings. – Carl Cervone May 22 '21 at 17:49

3 Answers3

2

I noted one issue with your code: when all 12 months appear in the input, then the output lists all months twice. This is easy to fix, just do:

return longest_series[:12]

I would go for a solution where the input is translated to a kind of "bitmap" of 12 bits, where a 1 indicates the corresponding month is in the input, while a 0 indicates it is not.

If represented as a string of 12 characters, you can use a regular expression to easily identify the sequences of "1".

I would also do some preprocessing of the months list, so you both have the list and the dictionary version of it, and have the list doubled, so you can slice from it across the 12-boundary.

Here is the suggested code:

import re

months = ["January", "February", "March", 
          "April", "May", "June",
          "July", "August", "September", 
          "October", "November", "December"]

# Also create a dictionary to get a month's index
month_nums = { month: num for num, month in enumerate(months) }
# ... and double the months list, to ease slicing across the 12-boundary
months += months

def consecutive_months(given_months):
    # Deal with boundary case
    if not given_months:
        return []

    # Convert input to 12 bits in string format
    lst = ["0"] * 12
    for m in given_months:
        lst[month_nums[m]] = "1"
    bits = "".join(lst)
    
    # Identify the longest chunk of consecutive "1" in that doubled string
    _, start, end = max((j-i, i, j) 
        for i, j in (match.span(0)
            for match in re.finditer("1+", bits + bits)
        )
    )
 
    # Using the found span, extract the corresponding month names
    return months[start:end][:12]
trincot
  • 317,000
  • 35
  • 244
  • 286
  • great solution! This produces the expected output for all my test data and seems reasonably efficient too. – Carl Cervone May 22 '21 at 18:13
  • @CarlCervone Well, case No. 42 of your [test data](https://pastebin.com/K3jJYCyE) test failed, when there is multiple series of the same length, this code returns the last one, so which one return depends on your needs. – bruce May 23 '21 at 12:44
  • The OP already said above that this produces the expected output. It seems acceptable to them to get one of the possible sequences when there are multiple sequences of equal length to choose from. – trincot May 24 '21 at 07:33
1

The month string is just a symbol, its essence is still the corresponding number behind it, from 1 to 12, month after month.

Strings for two month cannot be compared directly. If you convert them to numbers, the next month's number can be calculated by adding 1 (except for January after December), and the comparison between numbers is certainly greater than string.

My optimized code is as follows:

MONTHS = ["January", "February", "March",
          "April", "May", "June",
          "July", "August", "September",
          "October", "November", "December"]

month_num_dict = {month: num for num, month in enumerate(MONTHS, start=1)}


def consecutive_months(month_list: list) -> list:
    # Deal with boundary case
    if len(month_list) == 0:
        return month_list

    # A list of twice length is required only when the first and last months end to end
    first_month_num = month_num_dict[month_list[0]]
    last_month_num = month_num_dict[month_list[-1]]
    last_month_next_num = last_month_num + 1 if last_month_num != 12 else 1

    month_list = month_list * 2 if last_month_next_num == first_month_num else month_list

    # Initialize list of candidates and longest series
    candidate = [month_list[0], ]
    longest_series = [month_list[0], ]

    for i in range(len(month_list) - 1):
        month = month_list[i]
        month_num = month_num_dict[month]

        next_month = month_list[i + 1]
        next_month_num = month_num_dict[next_month]

        expected_next_month_num = month_num + 1 if month_num != 12 else 1

        if expected_next_month_num == next_month_num:
            candidate.append(next_month)

            # At the end of the traversal, decide whether to update the longest series
            # according to the length of the candidate.
            if i == len(month_list) - 2 and len(candidate) > len(longest_series):
                longest_series = candidate
        else:
            # When the length of the new candidate is greater than the old, update the longest series
            if len(candidate) > len(longest_series):
                longest_series = candidate

            # Generate next candidate month list
            candidate = [next_month, ]
    
    # Deal with all 12 months input list
    if len(longest_series) > 12:
        return MONTHS

    return longest_series

If you are worried that the handwritten MONTHS list may be wrong, you can also get them by time.strftime:

import time
import locale

locale.setlocale(locale.LC_ALL, "en_US.UTF-8")

month_num_dict = {
    time.strftime("%B", time.strptime(str(num), "%m")): num
    for num in range(1, 13)
}

MONTHS = list(month_num_dict.keys())

Of course, in order to set back to the original locale and ensure thread safety, you can add a thread mutex, and the code can refer to this answer, my complete code contains all the test data, as you can see here.

bruce
  • 462
  • 6
  • 9
  • It is strange that you call the use of a common library as `re` "lazy". Maybe using Python is also lazy? ;-) – trincot May 23 '21 at 16:48
  • Sorry, I'm not good at English, translation software expression of words may make you misunderstood. I think all algorithms that can be output correctly are good algorithms, including yours. Considering the efficiency problem, I personally think that it is more appropriate to deal with the underlying algorithm by self for this example. – bruce May 23 '21 at 17:54
  • Now you edited it, and call my solution "overkill"? What is the "extra work" of this regular expression that you refer to? It scans each character in a string, of exactly 12 characters, exactly once. – trincot May 24 '21 at 07:37
  • There is an idiom in the Chinese called "杀鸡焉用牛刀" literally means "kill chickens with a cow knife", I really can't find the right English words. The extra work I'm talking about is calling the internal mechanism of the regular engine, which is indeed twice as slow as the actual performance comparison. – bruce May 24 '21 at 07:56
  • What do you mean with "as the actual performance comparison"? Any benchmark you can refer to? – trincot May 24 '21 at 08:03
  • I just tested all use cases multiple times using `time.perf_counter()`. – bruce May 24 '21 at 08:11
  • Indeed, for the given test cases your solution is fast, but it gives wrong results for some inputs. Like `["December", "February", "January"]` should return `["December", "January", "February"]`, but your function just returns `["December"]`. Also, it becomes slower when the input has relatively more months in the input, like `["January", "February", "March","April", "May", "June","July", "September","October", "November", "December"]` – trincot May 24 '21 at 09:30
  • My understanding of the problem may be biased, after December does not have to be January is considered continuous, unless the input in the opposite direction of the month is also continuous? Long lists are really not as fast as RE method, as the length gets longer, iter code in pure python gets slower. After all, the backend of the regular engine is the C language. This is what I didn't notice when I tested, and the description of the performance aspect has been deleted. Apologize again if there is an offensive part of the previous description. – bruce May 24 '21 at 12:49
1

Here are two working approaches from a friend who also looked at the problem. The first is performant and uses the modulo operator so that the list doesn't need to copied onto itself.

month_names = [
    'January', 'February',
    'March', 'April', 'May',
    'June', 'July', 'August',
    'September', 'October',
    'November', 'December'
]
​
# Looks like: {'January': 0, 'February': 1...}
month_name_to_index = {
  value: index
  for index, value
  in enumerate(month_names)
}
​
​
def consecutive_months(list_of_months_by_name):
  if not list_of_months_by_name:
    # If the list is empty, return None.
    return None
​
  month_was_seen = [False] * 12  # Looks like: [False, False, ...]
  
  for month_name in list_of_months_by_name: 
    month_was_seen[month_name_to_index[month_name]] = True
​
  # Seek to first missing month:
  for start_index in range(12):
    if not month_was_seen[start_index]:
      break
​
  # If there is no missing month, return the whole year.
  if month_was_seen[start_index]:
    return {"from": "January", "to": "December", "length": 12}
​
  # Starting from the first missing month, loop around the year
  # and keep track of the longest run using one boolean and four
  # integers.
  running = False
  longest_run_index = None
  longest_run_length = 0
  current_run_index = None
  current_run_length = None
  for offset in range(1, 13):
    index = (start_index + offset) % 12
    if month_was_seen[index]:
      # Continue a run or begin a run.
      if running:
        current_run_length += 1
        continue
      running = True
      current_run_index = index 
      current_run_length = 1
      continue
    if running:
      # End the run.
      running = False
      if current_run_length > longest_run_length:
        longest_run_index = current_run_index 
        longest_run_length = current_run_length
​
  return {
    "from": month_names[longest_run_index],
    "to": month_names[(longest_run_index + longest_run_length - 1) % 12],
    "length": longest_run_length
  }

The second is a clever one-liner:

MONTH_NAMES = [
    'January', 'February',
    'March', 'April', 'May',
    'June', 'July', 'August',
    'September', 'October',
    'November', 'December'
]
​
def consecutive_months(list_of_months_by_name):
  return max(
    (
      len(segment)-segment.index(":")-1,
      (MONTH_NAMES*2)[
          int(segment[:segment.index(":")])+1
          :
          int(segment[:segment.index(":")]) + len(segment) - segment.index(":")
      ]
    )
    for segment in 
    "".join([
      "x" if month_name in list_of_months_by_name else f",{index}:"
      for index, month_name in enumerate(MONTH_NAMES*2)
    ]).split(",")
    if ":" in segment
  )[1] if set(MONTH_NAMES) - set(list_of_months_by_name) else MONTH_NAMES

Both algorithms return the expected results for the test data. Thanks AV!

Carl Cervone
  • 1,103
  • 8
  • 10
  • I thought your question was about finding a more elegant way. I have trouble seeing how these are more elegant than your original code. I'm a bit disappointed I must admit. – trincot May 24 '21 at 13:05