36

I have written some python code, I want to query dynamoDB data by sort key. I remember I can use follow-up code successful:

 table.query(KeyConditionExpression=Key('event_status').eq(event_status))

My table structure column

primary key:event_id
sort key: event_status
Alexander Patrikalakis
  • 5,054
  • 1
  • 30
  • 48
scott
  • 441
  • 2
  • 5
  • 10
  • 1
    I believe this can be done by creating a `Global Secondary Index` in the form of an `Inverted Index`. – Jarrett Nov 05 '21 at 06:43

5 Answers5

36

You have to create a global secondary index (GSI) for the sort key in order to query on it alone.

Mathix420
  • 872
  • 11
  • 21
mrbillyocean
  • 1,341
  • 1
  • 14
  • 24
10

If you don't want to scan (and maybe you shouldn't), you will need to create a GSI (Global Secondary Index) for that, and set event_status as the GSIPK.

so your table config will be:

 table = dynamodb.create_table(
        TableName="your_table",
        KeySchema=[
            {"AttributeName": "event_id", "KeyType": "HASH"},  # Partition key
            {"AttributeName": "event_status", "KeyType": "RANGE"},  # Sort key
        ],
        AttributeDefinitions=[
            {"AttributeName": "event_id, "AttributeType": "S"},
            {"AttributeName": "event_status", "AttributeType": "S"},
            {"AttributeName": "gsi_event_status", "AttributeType": "S"},
            {"AttributeName": "gsi_event_id", "AttributeType": "S"},
        ],
        GlobalSecondaryIndexes=[
            {
                "IndexName": "gsiIndex",
                "KeySchema": [
                    {"AttributeName": "gsi_event_status", "KeyType": "HASH"},
                    {"AttributeName": "gsi_event_id", "KeyType": "RANGE"},
                ],
                "Projection": {"ProjectionType": "ALL"},
            },
        ],
        BillingMode="PAY_PER_REQUEST",
    )

Be mindful that GSIs can be expensive and you might wanna change the ProjectionType if you don't need all attributes.

Now you can query by pk:

table.query(KeyConditionExpression=Key('event_id').eq(event_id))

or by the GSI PK which is set to your sk:

lookup.query(
        IndexName="gsiIndex",
        KeyConditionExpression=Key("gsi_event_status").eq(event_status),
    )
Ben Dubuisson
  • 727
  • 13
  • 38
  • The event_id and event_status are listed twice in the AttributeDefinitions. Is there a reason for that or doesn't it work the same with the duplicate entries removed? – Mat-KH Dec 14 '22 at 16:10
  • I think it was meant to define attributes for main table and GSI, will update with different names so it's more obvious – Ben Dubuisson Dec 16 '22 at 21:15
9

The scan API should be used if you would like to get data from DynamoDB without using Hash Key attribute value.

Example:-

fe = Attr('event_status').eq("new");

response = table.scan(
        FilterExpression=fe        
    )

for i in response['Items']:

print(json.dumps(i, cls=DecimalEncoder))

while 'LastEvaluatedKey' in response:
    response = table.scan(        
        FilterExpression=fe,        
        ExclusiveStartKey=response['LastEvaluatedKey']
        )

    for i in response['Items']:
        print(json.dumps(i, cls=DecimalEncoder))
notionquest
  • 37,595
  • 6
  • 111
  • 105
  • 18
    Note that scanning is not the same as querying. You will be iterating over all the values so it is much less efficient than a query. If you really need to efficiently query on ``event_status`` you should consider creating a GSI for that field. – garnaat Feb 21 '17 at 16:35
1

According to the main concept of the sort key, it is part of main cluster in the partition key to define some filter expression with partition key in query. so there is no ability to search on the sort key alone and without partition key. unless to define a global secondary index on the sort key.

Hamid Jolany
  • 800
  • 7
  • 11
0

By using FilterExpression we can scan the table using Sort key

NOTE: here LastUpdated is sortkey

Example:

from_date = "fromdate"
to_date = "todate"

dynamodb = boto3.resource('dynamodb', region_name='ap-south-1')
table = dynamodb.Table("your-tablename")
response =table.scan(
    FilterExpression=Attr('LastUpdated').between(from_date,to_date))
    )
result = response['Items']
  • 2
    Be careful, this only filters up to 1MB of data that is returned from the original query. From the docs: "A filter expression is applied after a Query finishes, but before the results are returned" – Ben Stickley Apr 09 '21 at 15:20