52

Here are my use cases: I have a Dynamo table with a hash + range key. When I put new items in the table, I want to do a uniqueness check. Sometimes I want to guarantee that the hash is unique (ignoring the range). Other times I want to allow duplicate hashes, but guarantee that the hash and range combination is unique. How can I accomplish this?

I experimented with attribute_not_exists. It seems to handle the second case, where it checks the hash + key combination. Here's a PHP sample:

$client->putItem(array(
    'TableName' => 'test',
    'Item' => array(
        'hash' => array('S' => 'abcdefg'),
        'range' => array('S' => 'some other value'),
        'whatever' => array('N' => 233)
    ),
    'ConditionExpression' => 'attribute_not_exists(hash)'
));

Oddly, it doesn't seem to matter if I use attribute_not_exists(hash) or attribute_not_exists(range). They both seem to do exactly the same thing. Is this how it's supposed to work?

Any idea how to handle the case where I only want to check hash for uniqueness?

mkobit
  • 43,979
  • 12
  • 156
  • 150
mrog
  • 1,930
  • 3
  • 21
  • 28
  • 3
    Further experimentation shows that attribute_not_exists follows this bizarre logic: `attribute_not_exists(any_attribute_that_exists_in_the_table)` returns true if the hash + range combination doesn't exist in the table, otherwise it returns false. `attribute_not_exists(attribute_that_does_not_exist_in_the_table)` always returns true. Secondary indexes (global or local) don't affect the behavior. – mrog Sep 29 '15 at 16:28

6 Answers6

105

You can't. All items in DynamoDB are indexed by either their hash or hash+range (depending on your table).

A sort of summary of what is going on so far:

  • A single hash key can have multiple range keys.
  • Every item has both a hash and a range key
  • You are making a PutItem request and must provide both the hash and range
  • You are providing a ConditionExpression with attribute_not_exists on either the hash or range attribute name
  • The attribute_not_exists condition is merely checking if an attribute with that name exists, it doesn't care about the value

Let's walk through an example. Let's start with a hash+range key table with this data:

  1. hash=A,range=1
  2. hash=A,range=2

There are four possible cases:

  1. If you try to put an item with hash=A,range=3 and attribute_not_exists(hash), the PutItem will succeed because attribute_not_exists(hash) evaluates to true. No item exists with key hash=A,range=3 that satisfies the condition of attribute_not_exists(hash).

  2. If you try to put an item with hash=A,range=3 and attribute_not_exists(range), the PutItem will succeed because attribute_not_exists(range) evaluates to true. No item exists with key hash=A,range=3 that satisfies the condition of attribute_not_exists(range).

  3. If you try to put an item with hash=A,range=1 and attribute_not_exists(hash), the PutItem will fail because attribute_not_exists(hash) evaluates to false. An item exists with key hash=A,range=1 that does not satisfy the condition of attribute_not_exists(hash).

  4. If you try to put an item with hash=A,range=1 and attribute_not_exists(range), the PutItem will fail because attribute_not_exists(range) evaluates to false. An item exists with key hash=A,range=1 that does not satisfy the condition of attribute_not_exists(range).

This means that one of two things will happen:

  1. The hash+range pair exists in the database.
    • attribute_not_exists(hash) must be true
    • attribute_not_exists(range) must be true
  2. The hash+range pair does not exist in the database.
    • attribute_not_exists(hash) must be false
    • attribute_not_exists(range) must be false

In both cases, you get the same result regardless of whether you put it on the hash or the range key. The hash+range key identifies a single item in the entire table, and your condition is being evaluated on that item.

You are effectively performing a "put this item if an item with this hash+range key does not already exist".

mkobit
  • 43,979
  • 12
  • 156
  • 150
  • 3
    I'm still not clear why that's the case. If I insert an item with hash "123" and range "abc", `attribute_not_exists(hash)` will prevent me from inserting another item with the same hash and range. But, it won't stop me from inserting an item with hash "123" and range "xyz", even though it has the same hash as the first item. How should I handle the case where I want to avoid duplicate hashes, even if the range values are different? – mrog Sep 29 '15 at 15:42
  • 2
    @mrog That is because of the way `hash`+`range` tables are designed. A single hash key can have many range keys. Each `hash`+`range` **combination* must be unique. Take a look at [this answer](http://stackoverflow.com/a/27348364/627727) to see the difference between `hash` and `hash`+`range`. – mkobit Sep 30 '15 at 01:22
  • 1
    I understand the concept of a hash+range key, but that doesn't answer my question. Why does `attribute_not_exists(hash)` check both the hash and the range? Is it even possible to write a condition that checks just the hash? And why do `attribute_not_exists(hash)`, `attribute_not_exists(range)`, and `attribute_not_exists(whatever)` all do exactly the same thing? – mrog Sep 30 '15 at 15:54
  • 18
    Okay, it makes sense now. I thought that attribute_not_exists was looking for an an attribute with the same value as the one in the record being inserted. In reality, the value doesn't matter. It's really looking for an existing item with same primary key and an attribute (any value) with the provided name. Thanks! – mrog Oct 01 '15 at 16:17
  • 1
    @AnatoliiB how does specifying both change the result of the answer? Can you add an example or your own answer explaining how this is wrong? – mkobit Jul 11 '18 at 14:13
  • @harleyguru can you elaborate on how this is wrong? Even if you specify both hash and range as `attribute_not_exists()` it won't have an impact on the result. A `PutItem` effectively operates on a single item in DynamoDB. The condition expression evaluates against the item at the hash+range combo, not against other items. You may be able to accomplish the question's goal today using [DynamoDB transactions](https://aws.amazon.com/blogs/aws/new-amazon-dynamodb-transactions/), but I haven't tried it out yet. – mkobit Apr 29 '19 at 20:07
  • Sorry, @mkobit I am not clear with this issue now. so I've deleted my comment :) I've never touched DDB too long time, so almost forgot the issue I had earlier. – harley Apr 30 '19 at 00:23
  • 2
    from my experience, you should interpret your conditioned query as such: 1. Get the item with the provided hash and range in query 2. Check if the attribute hash (or any other attribute in conditions) exists in that entry. Now clearly if there is no entry returned in step one, in step two every attribute_not_exists returns true and every attribute_exists returns false – Wildhammer Jul 20 '20 at 17:48
  • +1, but with a slight correction: there can be either 0 or 1 range attributes in a key - no "multiple range keys". – Nemanja Trifunovic Aug 19 '20 at 13:25
  • The key here if people are still confused: an existing `hash (pk)` but unique `range (sk)` will result in no matching item being found, and so the condition will always evaluate to true even when the `pk` is NOT unique. There is no way to enforce a unique constraint on only the `pk` when `sk` is also being used as part of the primary key. The constraint applies to the primary key (`pk + sk`) as a whole. – thedarklord47 Sep 20 '22 at 22:58
  • @mkobit Did you accidentally flipped the logic under `This means that one of two things will happen`? So if `hash`+`range` pair exists in the database, both `attribute_not_exists` should return `false`. Vice versa. – Dreamer Nov 25 '22 at 06:52
22

For Googlers:

  • (a) attribute_not_exists checks whether an item with same primary key as the to-be-inserted item exists
  • (b) Additionally, it checks whether an attribute exists on that item, value does not matter
  • If you only want to prevent overwriting, then use attribute_not_exists with primary key (or partition key, or range key), since the key must exist, check (b) will always pass, only check (a) will be in effect

Reasoning:

  • The name attribute_not_exists suggests that it checks whether an attribute exists on an item
  • But there are multiple items in the table, which item does it check against?
  • The answer is it checks against the item with the same primary key as the one you are putting in
  • This happens for all condition expressions
  • But as always, it is not properly and fully documented
  • See below official document about this feature, and taste its ambiguity

Note: To prevent a new item from replacing an existing item, use a conditional expression that contains the attribute_not_exists function with the name of the attribute being used as the partition key for the table. Since every record must contain that attribute, the attribute_not_exists function will only succeed if no matching item exists.

Link

dz902
  • 4,782
  • 38
  • 41
  • Thanks for the clarification. The docs got me VERY confused over what this is doing. The key piece of info here for me was: "which item does it check against". Essentially for PutItem, the only use of the conditional expression is deciding whether or not to overwrite a possible existing item with the same PK. Correct? – Benjamin Tamasi Nov 22 '20 at 08:44
  • @BenjaminTamasi Yep, if evaluates true only if no such item exists. – dz902 Nov 23 '20 at 07:51
6

Careful with reserved keywords.
attribute_not_exists will not work as expected if the provided attributeName matches a word from the reserved list. hash and range are both reserved and thus require the need to work around that issue by using ExpressionAttributeNames.

The following example allows for duplicate partition keys and only fails if there's already an Item in the Table with the provided partition AND sort key.

$client->putItem(array(
    'TableName' => 'test',
    'Item' => array(
        'hash' => array('S' => 'abcdefg'),
        'range' => array('S' => 'some other value'),
        'whatever' => array('N' => 233)
    ),
    'ConditionExpression' => 'attribute_not_exists(#h) AND attribute_not_exists(#r)',
    'ExpressionAttributeNames' => array('#h' => 'hash', '#r' => 'range')
));

And this one would make sure the partition key named hash is unique.

 $client->putItem(
     'TableName' => 'test',
     'Item' => array(
        'hash' => array('S' => 'abcdefg'),
        'range' => array('S' => 'some other value'),
        'whatever' => array('N' => 233)
    ),
    'ConditionExpression' => 'attribute_not_exists(#h)',
    'ExpressionAttributeNames' => array('#h' => 'hash')
));
ioioio
  • 69
  • 1
  • 3
4

This version of explanation taken from amazon aws forum says that a search will look an item that matches a provided hash key and then only checks if the attribute exists in that record. It should works the same if you have a hash and a range keys, I suppose.

If a request try to find an existing item with hash key "b825501b-60d3-4e53-b737-02645d27c2ae". If this is first time this id is being used there will be no existing item and "attribute_not_exists(email)" will evaluate to true, Put request will go through.

If this id is already used there will be an existing item. Then condition expression will look for an existing email attribute in the existing item, if there is an email attribute the Put request will fail, if there is no email attribute the Put request will go through.

Either way it's not comparing the value of "email" attribute and it's not checking if other items in the table used the same "email" value.

If email was the hash key, then request will try to find an existing item with hash key "tielur@example.me".

If there is another item with same email value an existing item will be found. Since email is the hash key it has to be present in the existing item and "attribute_not_exists(email)" will evaluate to false and Put request will fail.

If "email" value is not used before existing item will not be found and "attribute_not_exists(email)" will evaluate to true hence Put request will go through.

Jun
  • 2,942
  • 5
  • 28
  • 50
0

Following Jun711's answer, here is what I did to implement putItem only if hash doesn't exist in Kotlin. DocId is the hash key of my DynamoDB table.

val item = Item().withPrimaryKey(...).withString(...)
val putItemSpec = PutItemSpec().withItem(item)
                    .withConditionExpression("attribute_not_exists(DocId)")
table.putItem(putItemSpec)

If using DynamoDBMapper annotations, here is an example.

@DynamoDBTable(tableName = "Docs")
class Docs {
    // Partition key
    @get:DynamoDBHashKey(attributeName = "DocId")
    var docId: String? = null
}

val doc = Docs().apply {
    docId = "myId"
}

val mapper = DynamoDBMapper(AmazonDynamoDBClientBuilder.defaultClient())

// As suggested by http://rrevo.github.io/2018/03/09/dynamo-no-update/
val ifDocIdNotExists = DynamoDBSaveExpression().apply {
    expected = mapOf("DocId" to ExpectedAttributeValue().apply {
        isExists = false
    })
}

mapper.save(doc, ifDocIdNotExists)

Catch com.amazonaws.services.dynamodbv2.model.ConditionalCheckFailedException to handle the case where the hash key already exists.

Big Pumpkin
  • 3,907
  • 1
  • 27
  • 18
-5

You can use AND operation if your table has hash and range

'ConditionExpression' => 'attribute_not_exists(hash) AND attribute_not_exists(range)'

Thaina Yu
  • 1,372
  • 2
  • 16
  • 27