3

I am trying to search my database to see if a date range I am about to add overlaps with a date range that already exists in the database.

Using this question: Determine Whether Two Date Ranges Overlap

I came up with firstDay <= :end and lastDay >= :start for my FilterExpression.

def create(self, start=None, days=30):
    # Create the start/end times
    if start is None:
        start = datetime.utcnow()
    elif isinstance(start, datetime) is False:
        raise ValueError('Start time must either be "None" or a "datetime"')
    end = start + timedelta(days=days)

    # Format the start and end string "YYYYMMDD"
    start = str(start.year) + str('%02d' % start.month) + str('%02d' % start.day)
    end = str(end.year) + str('%02d' % end.month) + str('%02d' % end.day)

    # Search the database for overlap
    days = self.connection.select(
        filter='firstDay <= :end and lastDay >= :start',
        attributes={
            ':start': {'N': start},
            ':end': {'N': end}
        }
    )

    # if we get one or more days then there is overlap
    if len(days) > 0:
        raise ValueError('There looks to be a time overlap')

    # Add the item to the database
    self.connection.insert({
        "firstDay": {"N": start},
        "lastDay": {"N": end}
    })

I am then calling the function like this:

seasons = dynamodb.Seasons()
seasons.create(start=datetime.utcnow() + timedelta(days=50))

As requested, the method looks like this:

def select(self, conditions='', filter='', attributes={}, names={}, limit=1, select='ALL_ATTRIBUTES'):
    """
    Select one or more items from dynamodb
    """
    # Create the condition, it should contain the datatype hash
    conditions = self.hashKey + ' = :hash and ' + conditions if len(conditions) > 0 else self.hashKey + ' = :hash'
    attributes[':hash'] = {"S": self.hashValue}
    limit = max(1, limit)

    args = {
        'TableName': self.table,
        'Select': select,
        'ScanIndexForward': True,
        'Limit': limit,
        'KeyConditionExpression': conditions,
        'ExpressionAttributeValues': attributes
    }
    if len(names) > 0:
        args['ExpressionAttributeNames'] = names
    if len(filter) > 0:
        args['FilterExpression'] = filter

    return self.connection.query(**args)['Items']

When I run the above, it keeps inserting the above start and end date into the database because it isn't finding any overlap. Why is this happening?

The table structure looks like this (JavaScript):

{
  TableName: 'test-table',
  AttributeDefinitions: [{
    AttributeName: 'dataType',
    AttributeType: 'S'
  }, {
    AttributeName: 'created',
    AttributeType: 'S'
  }],
  KeySchema: [{
    AttributeName: 'dataType',
    KeyType: 'HASH'
  }, {
    AttributeName: 'created',
    KeyType: 'RANGE'
  }],
  ProvisionedThroughput: {
    ReadCapacityUnits: 5,
    WriteCapacityUnits: 5
  },
}
Get Off My Lawn
  • 34,175
  • 38
  • 176
  • 338
  • Can you show us the actual Dynamo query? i.e. connection.select. Also have you tried printing days out in debug? What values are being produced? – F_SO_K Jul 25 '18 at 07:35
  • @Stu I have updated the question. When I do `print(days)` the console displays `[]`. A strange thing seems to be happening as well. If I clear the database and remake the table, this works, but if I remove `+ timedelta(days=50)` and run the script multiple times it doesn't catch the new `start` value until I rebuild the database table, then adding back `+ timedelta(days=50)` it stops catching that. It seems to only be catching one date range until I rebuild the database table than it catches the new date range. – Get Off My Lawn Jul 25 '18 at 13:42
  • What is your hashkey? Do you want this to run over just one partition key, or over all of the data in the table? – F_SO_K Jul 25 '18 at 13:49
  • It looks like this when I create an instance: `dynamodb.Dynamodb('table-name', 'hash-key', 'hash-value')` and the init looks like this: `def __init__(self, table, hashKey, hashValue, endpoint='http://localhost:8000')` – Get Off My Lawn Jul 25 '18 at 13:51
  • @Stu I update the question with the `createTable` structure – Get Off My Lawn Jul 25 '18 at 13:52
  • I want it to run on one `HASH`, called `seasons` – Get Off My Lawn Jul 25 '18 at 13:54
  • OK, a query is fine then. Looks like you are setting Limit=1. This does NOT limit your result set to the first result. Limit means it will only evaluate the first item in the partition set. You probably need to remove that limit. – F_SO_K Jul 25 '18 at 13:55
  • https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_Query.html#DDB-Query-request-Limit – F_SO_K Jul 25 '18 at 13:56
  • Say you had 10 items in the partition key range. Lets also say the fifth item had an overlap. Setting Limit=1 would mean you would only evaluate the first item, you would never get to the fifth item. – F_SO_K Jul 25 '18 at 13:57
  • Okay! that looks like that is it! It is now raising my error saying there is overlap! – Get Off My Lawn Jul 25 '18 at 13:58
  • Is there a way to return a maximum number of results? – Get Off My Lawn Jul 25 '18 at 13:59
  • I use the Java SDK. I can have the Java SDK return me pages of results with one result on each page (i.e. return one result). Note if you did this it wouldn't actually make your query any faster. I would just check to see if your result set is larger than zero. – F_SO_K Jul 25 '18 at 14:01
  • The doc shows this for select, `Select='ALL_ATTRIBUTES'|'ALL_PROJECTED_ATTRIBUTES'|'SPECIFIC_ATTRIBUTES'|'COUNT'` Would using `COUNT` be a good thing to use? – Get Off My Lawn Jul 25 '18 at 14:07
  • It would tell you how many results there were, yes. DynamoDB is not like a traditional database though. Your speed is limited by your throughput. Is doesn't matter if you return the whole items or the count, the throughput is the same. So just returning Count won't make your Query faster. – F_SO_K Jul 25 '18 at 14:10
  • (unless you items are larger than 400KB) – F_SO_K Jul 25 '18 at 14:10
  • Okay that is what I though, I am just trying to send less data through, so if for some reason 100's of records matched this, then I don't want to return them especially since I don't care what the data is for this case, I only care if there is at least one match. – Get Off My Lawn Jul 25 '18 at 14:14

1 Answers1

3

Looks like you are setting LIMIT=1. You are probably using this to say 'just return the first match found'. In fact setting Limit to 1 means you will only evaluate the first item found in the Query (i.e. in the partition value range). You probably need to remove the limit, so that each item in the partition range is evaluated for an overlap.

F_SO_K
  • 13,640
  • 5
  • 54
  • 83