0

Working in Python 3.5.2 I have four lists of dates, each in ascending order, where the lists are not of equal length. Each list of dates is generated by a lookup into a longer list of dates. A sample date value and data type is shown below:

In: print (date, type(date))
Out: 725722.0 <class 'numpy.float64'>

I build each list of dates using a respective loop. To see the values I convert to strings and print each list. So I could sort with data type as numpy float64 or convert to string. Relevant values of actual data in each list (based on specific filter settings) are shown below:

a = ['12-17-1987', '11-22-1989', '03-05-1990', '11-12-1990']
b = ['12-16-1987', '03-02-1990', '11-12-1990']
c = ['10-09-1986', '12-16-1987', '03-05-1990', '11-12-1990']
d = ['10-16-1985', '08-20-1986', '10-15-1986', '12-16-1987', '03-02-1990']

I need to sort dates from all four lists in ascending order by mm-dd-yyyy, print each date, and beside each date print the name of the respective list, as shown in the example below:

# Desired Printout
10-16-1985  d
08-20-1986  d
10-09-1986  c
10-15-1986  d
12-16-1987  b
12-16-1987  c
12-16-1987  d
12-17-1987  a
11-22-1989  a
03-02-1990  b
03-02-1990  d
03-05-1990  a
03-05-1990  c
11-12-1990  a
11-12-1990  b
11-12-1990  c

This will give me visual confirmation of a sequence of events in four different sets of data. I would try to create a dictionary and sort by date for print to screen or disk but I have noticed similar answers using map or lambda functions that may provide a more elegant solution. If I am storing this information on disk what is the best data structure and solution?

SystemTheory
  • I have read into your problem description and made some assumptions in my answer below. Feel free to comment if I've overlooked something or made a wrong assumption. – Taylor D. Edmiston Aug 22 '16 at 22:12
  • I accepted the solution by @tedmiston because my problem is solved by applying Approach 2 in my application. Thanks! – SystemTheory Aug 23 '16 at 19:46

4 Answers

2

I have a couple comments on this one:

  1. "Best" is ambiguous. It could mean minimized algorithmic complexity, minimized runtime, minimized memory usage, simplest to implement or read, least amount of code, etc.

  2. Unless you have thousands of entries, it might not be worth optimizing your data structure or algorithm. The community's accepted best practice is to profile first and then optimize only the parts of your program that are actually slow.

A simple implementation could be nothing more than joining the lists and sorting them with the sorted built-in. For example, here are a few options you might consider for sorting:

import datetime

a = ['7-1-1987', '1-1-1990']
b = ['7-2-1987', '1-5-1990']
c = ['7-1-1987', '1-3-1990']
d = ['1-10-1985', '7-10-1986']

# hold on to list name
a = [(i, 'a') for i in a]  # [(date, list_name), ...]
b = [(i, 'b') for i in b]
c = [(i, 'c') for i in c]
d = [(i, 'd') for i in d]

dates = a + b + c + d  # combine into one flat list
for i in dates: print(i)

Output

('7-1-1987', 'a')
('1-1-1990', 'a')
('7-2-1987', 'b')
('1-5-1990', 'b')
('7-1-1987', 'c')
('1-3-1990', 'c')
('1-10-1985', 'd')
('7-10-1986', 'd')

Approach 1 - Parse each date string to a datetime.date object, sort the list in place, and output a list of (date, list_name) tuples.

dates_1 = [(datetime.datetime.strptime(d, '%m-%d-%Y').date(), l) for d, l in dates]
dates_1.sort()
for i in dates_1: print(i)

Output

(datetime.date(1985, 1, 10), 'd')
(datetime.date(1986, 7, 10), 'd')
(datetime.date(1987, 7, 1), 'a')
(datetime.date(1987, 7, 1), 'c')
(datetime.date(1987, 7, 2), 'b')
(datetime.date(1990, 1, 1), 'a')
(datetime.date(1990, 1, 3), 'c')
(datetime.date(1990, 1, 5), 'b')

Approach 2 - Sort with a lambda key that parses each date string on the fly, and output a new list of the original (date string, list_name) tuples.

dates_2 = sorted(dates, key=lambda d: (datetime.datetime.strptime(d[0], '%m-%d-%Y').date(), d[1]))
for i in dates_2: print(i)

Output

('1-10-1985', 'd')
('7-10-1986', 'd')
('7-1-1987', 'a')
('7-1-1987', 'c')
('7-2-1987', 'b')
('1-1-1990', 'a')
('1-3-1990', 'c')
('1-5-1990', 'b')

Approach 3 - Use heapq.merge, which merges inputs that are already sorted instead of re-sorting the combined list. Credit to @friendlydog for the suggestion.

import datetime
import heapq

a = ['7-1-1987', '1-1-1990']
b = ['7-2-1987', '1-5-1990']
c = ['7-1-1987', '1-3-1990']
d = ['1-10-1985', '7-10-1986']

def strs_to_dates(date_strs, list_name):
    """
    Convert a list of date strings to a generator of (date, str) tuples.
    """
    return ((datetime.datetime.strptime(date, '%m-%d-%Y').date(), list_name) for date in date_strs)

a = strs_to_dates(a, 'a')
b = strs_to_dates(b, 'b')
c = strs_to_dates(c, 'c')
d = strs_to_dates(d, 'd')

dates_3 = heapq.merge(a, b, c, d)
for i in dates_3: print(i)

Output

(datetime.date(1985, 1, 10), 'd')
(datetime.date(1986, 7, 10), 'd')
(datetime.date(1987, 7, 1), 'a')
(datetime.date(1987, 7, 1), 'c')
(datetime.date(1987, 7, 2), 'b')
(datetime.date(1990, 1, 1), 'a')
(datetime.date(1990, 1, 3), 'c')
(datetime.date(1990, 1, 5), 'b')

Notes:

  1. I assumed the format of your input strings is 'month-day-year', matching the '%m-%d-%Y' format string used above.
  2. I assumed when the same date is in multiple lists, that you'd want to secondarily sort alphanumerically by list name.
  3. I left formatting the output list as an exercise for the reader.
  4. All three approaches work under Python 2 and 3.
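
For note 3, one possible way to produce the two-column printout from the question is sketched below. It assumes (date, list_name) tuples in the shape produced by Approach 1 or Approach 3; `format_rows` is a hypothetical helper name, not part of the answer above.

```python
import datetime

def format_rows(pairs):
    """Format (datetime.date, list_name) pairs as 'mm-dd-yyyy  name' lines."""
    return ['{}  {}'.format(d.strftime('%m-%d-%Y'), name) for d, name in pairs]

# Sample pairs in the same shape as dates_1 (already sorted).
sample = [
    (datetime.date(1985, 1, 10), 'd'),
    (datetime.date(1987, 7, 1), 'a'),
    (datetime.date(1987, 7, 1), 'c'),
]

rows = format_rows(sample)
for row in rows:
    print(row)
```

Using strftime keeps the month and day zero-padded, which matches the desired printout in the question.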

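In Approach 2, the key argument is a lambda. Without it, sorted would compare the date strings character by character, so '1-10-1985' and '1-3-1990' would both come before '7-1-1987' regardless of year. The key overrides that and sorts by year, then month, then day, with the list name as a tie-breaker.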

A more elaborate implementation could take advantage of the guarantee that the lists are pre-sorted. Wikipedia has a list of merge algorithms to consider.
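
If the raw numeric values are kept, string parsing can be skipped altogether. The sketch below assumes that a value such as 725722.0 from the question is a proleptic Gregorian ordinal (the day count used by `datetime.date.toordinal`); that is an assumption about the asker's data, not something the question states. The sample values and the names `tag` and `to_text` are hypothetical, and plain floats stand in for `numpy.float64`.

```python
import datetime
import heapq

# Hypothetical pre-sorted inputs, stored as ordinal day numbers.
a = [725722.0, 726428.0]
b = [725721.0, 726528.0]

def tag(values, name):
    """Pair each value with the name of the list it came from."""
    return [(v, name) for v in values]

def to_text(ordinal):
    """Render an ordinal day number as mm-dd-yyyy."""
    return datetime.date.fromordinal(int(ordinal)).strftime('%m-%d-%Y')

# heapq.merge relies on each input already being in ascending order.
merged = list(heapq.merge(tag(a, 'a'), tag(b, 'b')))
lines = ['{}  {}'.format(to_text(v), name) for v, name in merged]
for line in lines:
    print(line)
```

Because the numbers already compare in chronological order, the conversion to text is only needed at print time.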

Taylor D. Edmiston
  • 1
    +1 for use of `datetime` module. And Python handily provides `heapq.merge` if you want to sort by merging. –  Aug 22 '16 at 22:13
  • @friendlydog Great point. There is definitely room to make this more efficient that way, with `itertools` directly, etc. I updated the answer by adding a third example that uses `heapq.merge(...)`. – Taylor D. Edmiston Aug 22 '16 at 22:38
  • Note my desired output requires both the dates in ascending order AND must indicate to the user the name of the list beside each date. So I need to generate date-name pairs and not just sorted dates. – SystemTheory Aug 22 '16 at 23:07
  • @SystemTheory Thanks, I just updated to include that. Luckily it was a simple change: I basically just used tuples in place of datetime scalars. This makes it much easier to sort by multiple criteria. – Taylor D. Edmiston Aug 23 '16 at 00:57
  • Applying Approach 2 solved my particular problem. Thanks for describing in detail several useful options. – SystemTheory Aug 23 '16 at 20:44
0

Assuming your dates are all formatted as zero-padded yyyy-mm-dd (unlike your example), so that they sort chronologically as plain strings, this should do the trick:

import itertools

lists = dict(a=['7-1-1987', '1-1-1990'],
             b=['7-2-1987', '1-5-1990'],
             c=['7-1-1987', '1-3-1990'],
             d=['1-10-1985', '7-10-1986'])

for date, name in sorted(itertools.chain(*([(e, n) for e in v] for n, v in lists.items()))):
    print(date, name)

If the dates aren't formatted that way, then you'd have to pass a custom sorting key to the sorted function that parses each date into a properly comparable object.
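
As a sketch of such a custom key, and of building the dictionary from lists that already exist, the following assumes month-day-year strings as in the question. The name `parse` is a hypothetical helper, and only two lists are shown to keep the example short.

```python
import datetime
import itertools

a = ['7-1-1987', '1-1-1990']
b = ['7-2-1987', '1-5-1990']

# Build the name -> list mapping from lists that already exist.
lists = {'a': a, 'b': b}

def parse(pair):
    """Sort key: parse the date string, then break ties on the list name."""
    date_str, name = pair
    return (datetime.datetime.strptime(date_str, '%m-%d-%Y').date(), name)

# Flatten to (date string, list name) pairs, then sort with the key.
pairs = itertools.chain.from_iterable(
    [(date_str, name) for date_str in values] for name, values in lists.items()
)
ordered = sorted(pairs, key=parse)
for date_str, name in ordered:
    print(date_str, name)
```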

sirfz
  • The code above causes Python 3.5.2 to throw the following error: AttributeError: 'dict' object has no attribute 'iteritems'. Also if I already have lists a, b, c, and d how do I construct the dictionary from the lists? – SystemTheory Aug 23 '16 at 00:39
  • @SystemTheory It's `dict.items` in Python 3. More info - http://stackoverflow.com/questions/10458437/what-is-the-difference-between-dict-items-and-dict-iteritems. – Taylor D. Edmiston Aug 23 '16 at 01:03
  • @SystemTheory if you provide more info about how you're constructing your lists then we can provide good ideas about constructing the dicts or mapping the list names differently. – sirfz Aug 23 '16 at 08:38
-1
#  Create the list of all dates, combining the four lists you have. Keep
#  the information about which list value comes from
all_dates = [(x, 'a') for x in a] + [(x, 'b') for x in b] + [(x, 'c') for x in c] + [(x, 'd') for x in d]

#  Sort with a simple date parser as the key. The way it works is:
#     1. It takes a date such as 11-12-2012 and splits it on '-' so that we get ['11', '12', '2012'] (month, day, year)
#     2. Applies int to each part so that they are compared as numbers (11, 12, 2012)
#     3. Reorders the parts to (year, month, day) so that the year is the most significant. Python compares tuples element by element
def date_key(pair):
    month, day, year = map(int, pair[0].split('-'))
    return (year, month, day)

all_dates.sort(key=date_key)

#  Print the result
for date in all_dates:
    print(' '.join(date))
Dmitry Torba
  • This answer solved my problem after fixing the print statement for use in Python 3.5.2: print (' '.join(date)). – SystemTheory Aug 23 '16 at 00:50
  • I approved of this answer too soon. After further testing some dates (mm-dd-yyyy) do not print in strictly ascending order. Obvious errors show october and november before june, july, august, or september in a given year. – SystemTheory Aug 23 '16 at 01:52
-2

You honestly don't need anything that fancy. Just do a min on the first item in every list. Then check if the value that is the min is in any of the lists and do a list.pop() and a print then. That's a simple way to do it that is efficient and makes sense. I could provide you the code but this should be clear enough.
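
A minimal sketch of this idea follows. It is one reading of the description, not the answerer's own code: it assumes the values compare chronologically (here `datetime.date` objects rather than mm-dd-yyyy strings) and pops from the front of each list with `list.pop(0)`. The name `merge_by_min` is hypothetical.

```python
import datetime

def merge_by_min(named_lists):
    """Repeatedly take the smallest head among pre-sorted lists.

    named_lists maps a list name to a list of comparable values in
    ascending order. Returns (value, name) pairs in ascending order.
    """
    # Copy the lists so the caller's data is not modified by pop().
    remaining = {name: list(values) for name, values in named_lists.items()}
    result = []
    while any(remaining.values()):
        smallest = min(values[0] for values in remaining.values() if values)
        # Emit the value once for every list whose head matches it.
        for name in sorted(remaining):
            values = remaining[name]
            if values and values[0] == smallest:
                result.append((values.pop(0), name))
    return result

D = datetime.date
merged = merge_by_min({
    'a': [D(1987, 12, 17), D(1990, 3, 5)],
    'b': [D(1987, 12, 16), D(1990, 3, 2)],
    'c': [D(1987, 12, 16), D(1990, 3, 5)],
})
for value, name in merged:
    print(value.strftime('%m-%d-%Y'), name)
```

Note that `list.pop(0)` shifts the remaining elements each time, so for long lists an index per list or `collections.deque` may be a better fit.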

bravosierra99
  • 1
    I may try this approach using list.index() instead of list.pop() although I don't immediately grasp the necessary logic in the control structure. – SystemTheory Aug 22 '16 at 23:18
  • sure, best of luck! I would be surprised if you can't get it to work this way. And I bet you find that your code is much more readable than doing a reduce/lambda version. Sometimes that stuff is good, but a lot of the time you can write code that's just about as fast and it's much easier to read, debug, and make correct. – bravosierra99 Aug 23 '16 at 15:02