2

I have a dictionary whose keys are strings formatted as yyyy-mm-dd, and I want to sort the dictionary by key with the earliest dates first.

I am currently using sorted(datesAndText.keys()), but this doesn't work reliably because the month and day fields are not always zero-padded.

I have looked at Sort python dictionary by date keys and How do I sort this list in Python, if my date is in a String? but I can't seem to adapt them to my specific case.

zpesk

3 Answers

8

Are you sure your keys are exactly in the format yyyy-mm-dd? For example:

>>> '2010-1-15' < '2010-02-15'
False

You may be forced to sort something like this:

sorted(d,key=lambda x: [int(y) for y in x.split('-')])

Another solution (assuming your years are all 4 digits):

sorted(d,key=lambda x: [y.zfill(2) for y in x.split('-')]) 

I'm not sure which would be faster. I suppose it's a candidate for timeit.
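To tie the two key functions above back to the dictionary from the question, here is a minimal sketch. The sample data and the names `dates_and_text`, `numeric_key`, and `padded_key` are made up for illustration.

```python
# Hypothetical sample data: keys are date strings without consistent zero padding.
dates_and_text = {
    "2012-10-9": "c",
    "2012-2-12": "b",
    "1985-10-09": "a",
}

def numeric_key(date_string):
    """Turn '2012-2-12' into [2012, 2, 12] so fields compare as numbers."""
    return [int(part) for part in date_string.split("-")]

def padded_key(date_string):
    """Turn '2012-2-12' into ['2012', '02', '12']; assumes 4-digit years."""
    return [part.zfill(2) for part in date_string.split("-")]

by_number = sorted(dates_and_text, key=numeric_key)
by_padding = sorted(dates_and_text, key=padded_key)

# Pair each key with its value, earliest date first.
sorted_items = [(key, dates_and_text[key]) for key in by_number]
print(sorted_items)
```

Both key functions should give the same order here; the original keys are left untouched, only the comparison changes.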

mgilson
  • `zfill`: now there's a method we don't see enough of. – Daniel Roseman Oct 10 '12 at 14:03
  • @DanielRoseman -- Yeah. I actually was thinking it was called `zpad` until I tried it in the interpreter and was reminded that wasn't actually a method name ... It's better than `'0'*(2-len(s))+s` though. – mgilson Oct 10 '12 at 14:07
2

Dates in yyyy-mm-dd format sort the same way both alphabetically and chronologically, provided the month and day are zero-padded, so you can use standard sorted:

for k, v in sorted(datesAndText.items()):
    # do something with key and value
    print(k, v)
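If some keys are not zero-padded, as in the question, one option is to normalize them first so that a plain string sort becomes valid. This is only a sketch under stated assumptions: four-digit years, made-up sample data, and a hypothetical helper named `normalize`. Note that two differently padded spellings of the same date would collapse into a single key.

```python
def normalize(date_string):
    """Zero-pad month and day: '2012-2-9' -> '2012-02-09'. Assumes a 4-digit year."""
    year, month, day = date_string.split("-")
    return "-".join([year, month.zfill(2), day.zfill(2)])

# Hypothetical sample data with inconsistent padding.
dates_and_text = {"2012-10-9": "c", "2012-2-12": "b", "1985-10-09": "a"}

# Build a new dict whose keys are all in strict yyyy-mm-dd form.
normalized = {normalize(key): value for key, value in dates_and_text.items()}

for key, value in sorted(normalized.items()):
    print(key, value)
```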
eumiro
2

Your format, yyyy-mm-dd, allows a lexicographic sort, so your code should work fine unless the month and day values aren't zero-padded (e.g. 2012-10-9 instead of 2012-10-09).

Fix this problem by relying on a comparison of dates rather than strings:

from datetime import datetime

sorted(datesAndText, key=lambda x: datetime.strptime(x, '%Y-%m-%d'))

This uses the key parameter of sorted: a function that accepts one argument (an element of the iterable being sorted) and returns the value that sorted compares in place of the element itself.

This has the ancillary benefit of allowing you to explicitly specify the string format of the date, should your data need to change.
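As a sketch of that last point, the format string can live in one place so that a change in the data only requires changing the pattern. The sample data and the names `DATE_FORMAT` and `parse_date` are assumptions for illustration. `strptime` accepts month and day values with or without a leading zero, which is why the unpadded keys parse correctly.

```python
from datetime import datetime

DATE_FORMAT = "%Y-%m-%d"  # hypothetical constant; change it if the key format changes

def parse_date(date_string):
    """Parse a key into a datetime so the sort compares real dates."""
    return datetime.strptime(date_string, DATE_FORMAT)

# Hypothetical sample data with inconsistent padding.
dates_and_text = {"2012-10-9": "c", "2012-2-12": "b", "1985-10-09": "a"}

for key in sorted(dates_and_text, key=parse_date):
    print(key, dates_and_text[key])
```

A malformed key raises `ValueError` here rather than being sorted into the wrong place, which may or may not be what you want.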

Edit:

mgilson brought up an interesting point. str.split is probably more efficient. Let's see if he's correct:

strptime solution:

bburns@virgil:~$ python -mtimeit -s"from datetime import datetime;d={'2012-2-12':None, '2012-10-9':None, '1978-1-1':None, '1985-10-9':None}" 'sorted(d, key=lambda x: datetime.strptime(x,"%Y-%m-%d"))'
10000 loops, best of 3: 79.7 usec per loop

mgilson's original str.split solution:

bburns@virgil:~$ python -mtimeit -s"from datetime import datetime;d={'2012-2-12':None, '2012-10-9':None, '1978-1-1':None, '1985-10-9':None}" 'sorted(d,key=lambda x: [int(y) for y in x.split("-")])'
100000 loops, best of 3: 17.6 usec per loop

mgilson's zfill str.split solution:

bburns@virgil:~$ python -mtimeit -s"from datetime import datetime;d={'2012-2-12':None, '2012-10-9':None, '1978-1-1':None, '1985-10-9':None}" 'sorted(d,key=lambda x: [y.zfill(2) for y in x.split("-")])'
100000 loops, best of 3: 7.4 usec per loop

Looks like he's correct! mgilson's original answer is 4-5 times faster, and his final answer is 10-11 times faster! However, as we agreed in the comments, readability matters. Unless you're presently CPU-bound, I'd still advise going with datetime.strptime over str.split.
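The four-key dictionary above is tiny, so the numbers mostly reflect per-call overhead. A rough way to repeat the comparison on a larger input is to use the `timeit` module from within Python. This is only a sketch: the generated keys, the size of 1,000, the run count, and the helper names are assumptions, and timings will differ by machine and Python version.

```python
import random
import timeit
from datetime import datetime

random.seed(0)  # reproducible sample data

# Hypothetical data: 1,000 unpadded date strings (days capped at 28 to stay valid).
keys = [
    "%d-%d-%d" % (random.randint(1970, 2020), random.randint(1, 12), random.randint(1, 28))
    for _ in range(1000)
]

def by_strptime(k):
    return datetime.strptime(k, "%Y-%m-%d")

def by_int(k):
    return [int(part) for part in k.split("-")]

def by_zfill(k):
    return [part.zfill(2) for part in k.split("-")]

# Sanity check: all three key functions should produce the same ordering.
assert sorted(keys, key=by_int) == sorted(keys, key=by_zfill) == sorted(keys, key=by_strptime)

for key_func in (by_strptime, by_int, by_zfill):
    seconds = timeit.timeit(lambda: sorted(keys, key=key_func), number=20)
    print("%-12s %.4f s for 20 runs" % (key_func.__name__, seconds))
```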

Ben Burns
  • This would work, but I have a feeling that it would be inefficient -- though I suppose it doesn't hurt to `timeit` and see as well. – mgilson Oct 10 '12 at 14:03
  • I'd imagine you're correct if compared to your solution based on str.split, however it's a bit more readable, as it conveys the fact that we're sorting dates. – Ben Burns Oct 10 '12 at 14:05
  • I love love love performance comparison, though. http://stackoverflow.com/questions/5889611/one-liner-to-determine-if-dictionary-values-are-all-empty-lists-or-not/5889662#5889662 I suppose I'll just have to add an edit... :-) – Ben Burns Oct 10 '12 at 14:12
  • See edits. `zfill` wins the performance comparison by a full order of magnitude over my solution. I'm betting however with a larger data set that your original solution and my `strptime` solution are pretty neck-and-neck, since zfill is practically O(1) in this case. Hooray optimization! – Ben Burns Oct 10 '12 at 14:35
  • One minor comment. In the `strptime` example, you have `sorted(d.keys(),...` whereas the others just have `sorted(d,...`. It's a minor thing, but that extra function lookup could influence the timings in a small way. – mgilson Oct 10 '12 at 14:38
  • Actually, I take that back they're all going to perform just as consistently, but the overall time complexity (not just big-O) of `zfill` is much reduced over `strptime` and `int('0000')`... – Ben Burns Oct 10 '12 at 14:38
  • Thanks, but I checked the difference between passing `d` and `d.keys()` - the difference is well within the noise on this tiny dataset. It might come into play on a much larger dict, however. `d.keys()` does add to the memory complexity unnecessarily, though. I'll pull it out. – Ben Burns Oct 10 '12 at 14:41
  • Could use `d.iterkeys()` instead, but it's really unlikely to make a difference. You're not creating more *data*, only more *references* which are pretty cheap -- even for reasonably large datasets. – mgilson Oct 10 '12 at 14:44
  • let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/17821/discussion-between-ben-burns-and-mgilson) – Ben Burns Oct 10 '12 at 14:46