14

I have a task to create sets of dates based on specific condition, for example "greater than 2" will be passed and I need to create a set of all dates in this month that have a day > 2. also Ill be getting a start time and a stop time for e.g. 10am-6pm in this case I will create a set of all the dates > 2 and in every day it has a time to start at 10am and ends and 6pm, below is an example:

greater > 2 less < 9 
start time :10am
stop time :6 pm
month:july
date1: 2016-07-03 10:00, 2016-07-03 16:00
date2: 2016-07-04 10:00, 2016-07-04 16:00
date3: 2016-07-05 10:00, 2016-07-05 16:00
.
.
.
date6: 2016-07-8 10:00, 2016-07-8 16:00

I decided to store these dates into a dictionary like the following:

dictD = {'dates_between_2_9':[[2016-07-03 10:00, 2016-07-03 16:00], [2016-07-04 10:00, 2016-07-04 16:00], ....., [2016-07-08 10:00, 2016-07-08 16:00]]} 

I used the dict because I will have multiple conditions that I need to create sets of dates for them, so there will be for example another key other than dates_between_2_5.

at the other hand I get another request based on a condition too to create dates with start time only like the following:

greater > 1 less than 12
start time : 2pm
    date1: 2016-07-02 14:00
    date2: 2016-07-03 14:00
    date3: 2016-07-04 14:00
    .
    .
    .
    date10: 2016-07-11 14:00

I decided to store these dates in a list:

listL = [2016-07-02 14:00,2016-07-03 14:00,2016-07-04 14:00 ... 2016-07-11 14:00]

after that I compare each date from ListL to the list of dates for each key from DictD and if a date from ListL lies within a start,stop time then I should remove it from the list and return only the dates from ListL that don't overlap with dates from DictD, my logic is like the following:

for L from ListL:
    for every key in DictD:
        for item from DictD[key]:
            if DictD[key][0] < L < DictD[key][1] # check if item from list overlap with start,stop time from dictionary.
                ListL.remove(L) # I know I can't remove items from list while iterating so I will probably create a set and store all overlapped items and then subtract this set to set(ListL) to get the difference. 
return ListL

My question is, am I using an efficient data structures to handle my requirements? I see my logic is not that efficient so I was wondering if there is a better way for approaching this problem?

any help would be greatly appreciated. thanks in advance!

tkyass
  • 2,968
  • 8
  • 38
  • 57
  • A little off-topic advice, don't put a leading zero on integer constants. In Python 2 you might get a value you didn't intend, and in Python 3 it generates an error (except for `00`). – Mark Ransom Jul 05 '16 at 17:50

5 Answers5

5

It sounds like you're trying to optimize your algorithm. To be honest, with data this size, it's probably not necessary. However, if you are interested, the general rule of thumb is that sets are faster than lists in Python when checking for membership.

In this case, it's not clear what your sets might be. I've assumed that you have at most a minute-level of granularity, but you could go lower (for more memory) or indeed improve occupancy and performance by going for a larger granularity - e.g. hours. This code shows even relatively large sets can be at least 5x faster (and look a little simpler when comparing your data sets):

from copy import copy
from datetime import datetime, timedelta
from timeit import timeit
import time

def make_range(start, open, close, days):
    result = []
    base_start = start + open
    base_close = start + close
    while days > 0:
        result.append([base_start, base_close])
        base_start += timedelta(days=1)
        base_close += timedelta(days=1)
        days -= 1
    return result

def make_range2(start, open, close, days):
    result = set()
    base_start = start + open
    base_close = start + close
    while days > 0:
        now = base_start
        while now <= base_close:
            result.add(now)
            now += timedelta(minutes=1)
        base_start += timedelta(days=1)
        base_close += timedelta(days=1)
        days -= 1
    return result

dateRange = {
    'range1': make_range(datetime(2016, 7, 3, 0, 0),
                         timedelta(hours=10),
                         timedelta(hours=18),
                         6),
}

dateRange2 = {
    'range1': make_range2(datetime(2016, 7, 3, 0, 0),
                          timedelta(hours=10),
                          timedelta(hours=18),
                          6),
}

dateList = [
    datetime(2016, 7, 2, 14, 0),
    datetime(2016, 7, 3, 14, 0),
    datetime(2016, 7, 4, 14, 0),
    datetime(2016, 7, 5, 14, 0),
    datetime(2016, 7, 6, 14, 0),
    datetime(2016, 7, 7, 14, 0),
    datetime(2016, 7, 8, 14, 0),
    datetime(2016, 7, 9, 14, 0),
    datetime(2016, 7, 10, 14, 0),
    datetime(2016, 7, 11, 14, 0)
]

dateSet = set(dateList)

def f1():
    result = copy(dateList)
    for a in dateList:
        for b in dateRange:
            for i in dateRange[b]:
                if i[0] <= a <= i[1]:
                    result.remove(a)
    return result

def f2():
    result = copy(dateSet)
    for b in dateRange2:
        result = result.difference(dateRange2[b])
    return result

print(f1())
print(timeit("f1()", "from __main__ import f1", number=100000))

print(f2())
print(timeit("f2()", "from __main__ import f2", number=100000))

For the record, the results are as follows:

[datetime.datetime(2016, 7, 2, 14, 0), datetime.datetime(2016, 7, 9, 14, 0), datetime.datetime(2016, 7, 10, 14, 0), datetime.datetime(2016, 7, 11, 14, 0)]
1.922587754837455

{datetime.datetime(2016, 7, 2, 14, 0), datetime.datetime(2016, 7, 9, 14, 0), datetime.datetime(2016, 7, 10, 14, 0), datetime.datetime(2016, 7, 11, 14, 0)}
0.30558400587733225

You could also convert the dict dateRange to a list, but with just 1 or 2 members, this is unlikely to make any real difference in performance. However, it makes more logical sense, as you are not actually using the dict to look up any specific key values - you are just iterating through all the values.

Community
  • 1
  • 1
Peter Brittain
  • 13,489
  • 3
  • 41
  • 57
  • I like your answer but I'm wondering about the part where you create a range every single time we want to compare, does it effect performance? – tkyass Jul 12 '16 at 21:51
  • @tkyass I can have a quick play to see... To be clear - do you mean the line where I duplicate the dateSet/dateList (depending on which function you call)? – Peter Brittain Jul 12 '16 at 22:06
1

Frankly speaking I am not sure if I understand what is your problem, I tried something like this:

for date in dateList:
    for everyrange in dateRange:
        find=False
        for i in dateRange[everyrange]:
            #print('date={date} ,key={everyrange},i={i}'.format(date=date, everyrange=everyrange,i=i))
            if i[0] <= date <= i[1]:
                print(date)
                find=True
                break
            else:
                print(0)
        if find:
            break
sebnorth
  • 36
  • 4
  • thanks for your reply but your answer does not return the right answer because of the breaks – tkyass Jul 08 '16 at 16:02
1

I am not sure I fully understood your question, but I am assuming you want to find the dates from the 'dateList' list that fall between a specific range in the 'dateRange' dic.

I tried to structure my code based on your logic. This should work:

for date in dateList: 
    for key,value in dateRange.items():
        for i in range(0,len(value)):
            if date>=value[i][0] and date<=value[i][1]:
                print('The date:',date,'lies between the data points:',value[i][0],'and',value[i][1],'in',key)

In your data, the dateRange dic contains keys ('range') and values, which are lists of 2 datetime objects. With the code I provided, the dateRange dic can have as many keys as you like, and each key's value can contain as many lists of datetime object as you like.

simon.sim
  • 149
  • 4
  • thanks for your reply but your answer does exactly the same functionality as the code I provided – tkyass Jul 08 '16 at 16:02
1

I tried this example, based on your demand and worked well =). The algorithm is very similar to the one you've posted, the only diference is in the end of the algorithm. I choose to create a new list, to be returned in the function you are building.

Here's the code:

list_1 = ['a 1', 'a 2', 'a 3', 'a 4', 'a 5', 'b 1', 'b 2', 'b 3', 'b 4', 'b 5', 'c 1', 'c 2', 'c 3', 'c 4', 'c 5']
dict = {'example_between_2_5': [['a 3', 'a 4'], ['b 3', 'b 4'], ['c 3', 'c 4']]}
new_list = []


# Defining the number of repetitions based on how many 'lists' inside the dict you have.
for x in range(0, len(dict['example_between_2_5'])):
    dict_list_elements = dict['example_between_2_5'][x]
    # Defining the number of repetitions based on the elements inside the list of the dict.
    for y in range(0, len(dict_list_elements)):
        #Picking the element
        dict_list_element = dict_list_elements[y]
        for z in range(0, len(list_1)):
            #Comparing to all elements in list_1
            if dict_list_element == list_1[z]:
                #The element will be append if doesn't exist in the new list
                if list_1[z] not in new_list:
                    new_list.append(list_1[z])

#Printing the result just to check if it worked.
print("list_1: ", list_1) 
print("New_list: ", new_list)

Hope it helps =)

1

I'm still not absolutely certain what you're trying to achieve but please have a look at this code and tell me if this is what you want.

There is an option to input month as well.

The list named list1 is equivalent to your dictionary dictD.

The list named list2 is equivalent to your list listL. This has only those dates that don't overlap with those in list1(dictD).

Here's the code.

from datetime import datetime

#Converts 12-hour(am/pm) to 24-hour format
def get_time(time):
    digit = int(time[0:-2])
    if time[-2:] == 'am':
        return digit

    else:
        return digit+12


month_number = {
    'january':1, 'february':2, 'march':3, 'april':4, 'may':5, 'june':6,
    'july':7, 'august':8, 'september':9, 'october':10, 'november':11, 'december':12
}

gt1 = input('Enter first set\ngreater > ')
lt1 = input('less < ')

start1 = raw_input('start time: ')
stop1 = raw_input('stop time: ')

month1 = raw_input('month: ')


gt2 = input('\nEnter second set\ngreater > ')
lt2 = input('less < ')

start2 = raw_input('start time: ')

month2 = raw_input('month: ')

list1 = []
list2 = []

today = datetime.today()

start1 = get_time(start1)
stop1 = get_time(stop1)
start2 = get_time(start2)

key = 'dates_between_%s_%s'%(gt1, gt2)

for i in range(gt1+1, lt1):
    list1.append(
            [
            datetime(today.year, month_number[month1], i, start1, 0).strftime("%Y-%m-%d %H:%M"),
            datetime(today.year, month_number[month1], i, stop1, 0).strftime("%Y-%m-%d %H:%M")
            ]
        )

for i in range(gt2+1, lt2):
    if (month1 == month2) and (gt1 < i < lt1) and (start1 < start2 < stop1):
        pass
    else:
        list2.append(datetime(today.year, month_number[month2], i, start2, 0).strftime("%Y-%m-%d %H:%M"))

print 'List1:\n',list1
print '\nList2:\n',list2