25

Is there a way to query multiple hash keys using a single query in Amazon's AWS SDK for Java?

Here's my issue; I have a DB table for project statuses. The Hash Key is the status of a project (ie: new, assigned, processing, or complete). The range key is a set of project IDs. Currently, I've got a query setup to simply find all the projects listed as a status(hash) of "assigned" and another query set to look for a status of "processing". Is there a way to do this using a single query rather than sending multiple queries for each status I need to find? Code below:

    DynamoDBMapper mapper = new DynamoDBMapper(new AmazonDynamoDBClient(credentials));
    PStatus assignedStatus = new PStatus();
    assignedStatus.setStatus("assigned");
    PStatus processStatus = new PStatus();
    processStatus.setStatus("processing");

    DynamoDBQueryExpression<PStatus> queryAssigned = new DynamoDBQueryExpression<PStatus>().withHashKeyValues(assignedStatus);
    DynamoDBQueryExpression<PStatus> queryProcessing = new DynamoDBQueryExpression<PStatus>().withHashKeyValues(processStatus);

    List<PStatus> assigned = mapper.query(PStatus.class, queryAssigned);
    List<PStatus> process = mapper.query(PStatus.class, queryProcessing);

So basically, I'd like to know if it's possible to eliminate the queryAssigned and assigned variables and handle both assignedStatus and processStatus via the same query, process, to find projects that are not new or complete.

DGolberg
  • 2,109
  • 4
  • 23
  • 31
  • http://aws.typepad.com/aws/2013/04/local-secondary-indexes-for-amazon-dynamodb.html – Guy Jul 17 '13 at 06:48
  • Sorry, that's not even close to what I was asking for and I already know about secondary indexes. – DGolberg Jul 17 '13 at 07:07
  • I think your problem suggests that your schema needs to be different. If you repeatedly need to query 2 hash keys then perhaps that should be a special hash key on its own (duplicating the data from both states). – alexandroid Jan 06 '17 at 18:38
  • You could construct a derived attribute which would label items with assigned and processing states as true and then create a sparse index on that attribute. This will allow you to retrieve the results you want efficiently and in one query. Your problem sounds very similar to the example of sparse index usage in DynamoDB best practices documentation. Check the example use-case in https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-indexes-general-sparse-indexes.html – JanM Aug 16 '21 at 10:14

7 Answers7

17

No, as of today there is no way to send multiple queries in the same request. If you're concerned about latency, you could make multiple requests simultaneously in different threads. This would require the same amount of network bandwidth as a "dual query" would if Dynamo offered it (assuming you're making 2, not hundreds).

Cory Kendall
  • 7,195
  • 8
  • 37
  • 64
  • It was more a thought of sending the strings once and letting the server compare the current item it's checking to both before moving to the next item to check. It would be more efficient to query if it could do this I would imagine, but I guess I'll have to wait and see if they do decide to put this in some day. *hopeful, but not expecting much* Thanks for the response at least... I'll have to do the double/triple queries for now. – DGolberg Jul 25 '13 at 06:20
  • 1
    @DGolberg Actually, to my knowledge dynamodb stores the range-key indexes for each hash-key totally separate from each other, possibly even on different partitions/hosts. Thus by implementing a multi-query, dynamo would have to add overhead like "what happens when 1 fails and 1 doesn't?" "What if one gets throttled?", "What if one is faster? (which host waits and does the join of the data)?", etc. This is my guess as to why they didn't implement it. That said, the "simplicity" wins are pretty big for the customer, so it wouldn't surprise me if they added it some day. – Cory Kendall Jul 25 '13 at 16:15
  • Interesting. I'm not overly familiar with the inner workings of DynamoDB yet, so thanks for the info on that. I'll have to be sure to update this if/when they implement something like this as it would certainly help speed things up when multiple queries are necessary. – DGolberg Jul 25 '13 at 19:48
  • @CoryKendall And what about this answer? http://stackoverflow.com/questions/32100038/dynamodb-query-or-condition-in-keyconditionexpression/32107889#32107889 What does UNION ? – TheTiger May 12 '16 at 09:50
8

Sharing my openings as of today.

GetItem, Query, Scan

Using normal DynamoDB operations you're allowed to query either only one hash key per request (using GetItem or Query operations) or all hash keys at once (using the Scan operation).

BatchGetItem

You can utilize the BatchGetItem operation, but it requires you to specify a full primary key (including the range key if you have one).

PartiQL

Since recently you can also use PartiQL - a query language supported by AWS to query DynamoDB tables. Using it, you can query several hash keys, using, for example, the IN operator:

SELECT * FROM  "table_name" WHERE "status" IN ['assigned', 'processing'];

I used PartiQL in my Python code, not Java, so that I cannot provide implementation details. But it should be quite easy to find since you know that you need yo use PartiQL. I'll leave here a reference for Python, just in case.

7

There is no way to query by multiple hash keys, but, as of April 2014, you can use QueryFilter so you can filter by non key fields in addition to hash key fields.

In a blog post on 24 April 2014, AWS announced the release of the "QueryFilter" option:

With today's release, we are extending this model with support for query filtering on non-key attributes. You can now include a QueryFilter as part of a call to the Query function. The filter is applied after the key-based retrieval and before the results are returned to you. Filtering in this manner can reduce the amount of data returned to your application while also simplifying and streamlining your code

Check this out there http://aws.amazon.com/blogs/aws/improved-queries-and-updates-for-dynamodb/?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed:+AmazonWebServicesBlog+%28Amazon+Web+Services+Blog%29

Matt
  • 74,352
  • 26
  • 153
  • 180
Aybat
  • 81
  • 1
  • 4
  • @AybatDuyshokov Is the performance of `QueryFilter` filtering part the same as with Scan? – Tuukka Mustonen Aug 08 '14 at 12:54
  • 1
    Tuukka Mustonen 1. select by hash keys 2. filter resultset with query filter (yep, performance of this step is the same as scan) 3. return final resultset. – Aybat Aug 11 '14 at 08:57
4

Sharing my working answer for posterity. As of Oct 2020, there is a way to query multiple hash keys using a single query using aws-java-sdk-dynamodb-1.11.813.jar. I had the same requirement where I had to select items based on multiple hash keys(partition keys), and you can relate the requirement with the RDMS scenario, similar to the query select * from photo where id in ('id1','id2','id3'), here id is the primary key of the photo table.

Code Snippet

  • DynamoDB entity
package com.test.demo.dynamodb.entity;

import com.amazonaws.services.dynamodbv2.datamodeling.DynamoDBHashKey;
import com.amazonaws.services.dynamodbv2.datamodeling.DynamoDBTable;
import lombok.AllArgsConstructor;
import lombok.Builder;
import lombok.NoArgsConstructor;

import java.io.Serializable;

@NoArgsConstructor
@AllArgsConstructor
@lombok.Data
@DynamoDBTable(tableName = "test_photos")
@Builder
public class Photo implements Serializable {
    @DynamoDBHashKey
    private String id;
    private String title;
    private String url;
    private String thumbnailUrl;
}

  • DynamoDB Repository Class
import com.amazonaws.services.dynamodbv2.datamodeling.DynamoDBMapper;
import com.amazonaws.services.dynamodbv2.datamodeling.KeyPair;
import com.test.demo.dynamodb.entity.Photo;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Repository;

import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

@Repository
public class PhotoRepository {

    @Autowired
    private DynamoDBMapper dynamoDBMapper = null;

    public List<Photo> findByIds(Collection<String> photoIds) {
        //Constructing `KeyPair` instance and setting the HashKey,
        // in this example I have only hash key,
        // if you have RangeKey(Sort) you can set that also here using KeyPair#withRangeKey

        List<KeyPair> keyPairs = photoIds.stream()
                                         .map(id -> new KeyPair().withHashKey(id))
                                         .collect(Collectors.toList());

        //Creating Map where Key as Class<?> and value as a list of created keyPairs 
        //you can also directly use batchLoad(List<Photo> itemsToGet), the only constraint 
        //is if you didn't specify the Type as key and simply using the 
        //DynamoDBMapper#batchLoad(Iterable<? extends Object> itemsToGet)
        //then the Type of Iterable should have annotated with @DynamoDBTable


        Map<Class<?>, List<KeyPair>> keyPairForTable = new HashMap<>();
        keyPairForTable.put(Photo.class, keyPairs);
        Map<String, List<Object>> listMap = dynamoDBMapper.batchLoad(keyPairForTable);

        //result map contains key as dynamoDBtable name of Photo.class
        //entity(test_photo) and values as matching results of given ids

        String tableName = dynamoDBMapper.generateCreateTableRequest(Photo.class)
                                         .getTableName();
        return listMap.get(tableName).stream()
                                     .map(e -> (Photo) e)
                                     .collect(Collectors.toList());
    }
}

  • Test Class

import com.amazonaws.services.dynamodbv2.datamodeling.DynamoDBMapper;
import com.amazonaws.services.dynamodbv2.document.DynamoDB;
import com.amazonaws.services.dynamodbv2.document.Table;
import com.amazonaws.services.dynamodbv2.document.TableCollection;
import com.amazonaws.services.dynamodbv2.model.CreateTableRequest;
import com.amazonaws.services.dynamodbv2.model.ListTablesRequest;
import com.amazonaws.services.dynamodbv2.model.ListTablesResult;
import com.amazonaws.services.dynamodbv2.model.ProvisionedThroughput;
import com.test.demo.dynamodb.Application;
import com.test.demo.dynamodb.entity.Photo;
import org.junit.jupiter.api.Assertions;
import org.junit.jupiter.api.Test;
import org.junit.jupiter.api.extension.ExtendWith;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.test.context.ActiveProfiles;
import org.springframework.test.context.junit.jupiter.SpringExtension;

import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.UUID;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

@ActiveProfiles("test")
@ExtendWith(SpringExtension.class)
@SpringBootTest(classes = Application.class,
        webEnvironment = SpringBootTest.WebEnvironment.RANDOM_PORT)
public class DynamoDBFindByIdsITest {

    @Autowired
    private DynamoDBMapper dynamoDBMapper = null;

    @Autowired
    private DynamoDB dynamoDB = null;

    @Autowired
    private PhotoRepository photoRepository = null;


    @Test
    void findByIdsTest() throws InterruptedException {
        //Creating dynamodb table if not already exists
        createDataTableIfNotExists("test", Photo.class);
        int size = 5;
        //creating dummy entries for test and persisting and collecting it to
        //validate with results
        List<Photo> photos = IntStream.range(0, size)
                .mapToObj(e -> UUID.randomUUID().toString())
                .map(id ->
                        Photo.builder()
                                .id(id)
                                .title("Dummy title")
                                .url("http://photos.info/" + id)
                                .thumbnailUrl("http://photos.info/thumbnails/" + id)
                                .build()
                ).peek(dynamoDBMapper::save)
                .collect(Collectors.toList());

        //calling findByIds with the Collection of HashKey ids (Partition Key Ids)
        Set<String> photoIds = photos.stream()
                .map(Photo::getId)
                .collect(Collectors.toSet());
        List<Photo> photosResultSet = photoRepository.findByIds(photoIds);

        Assertions.assertEquals(size, photosResultSet.size());

        //validating returned photoIds with the created Ids
        Set<String> resultedPhotoIds = photosResultSet.stream()
                .map(Photo::getId)
                .collect(Collectors.toSet());
        Assertions.assertTrue(photoIds.containsAll(resultedPhotoIds));
    }

    public <T> void createDataTableIfNotExists(String tablePrefix, Class<T> clazz)
            throws InterruptedException {
        ListTablesRequest listTablesRequest = new ListTablesRequest();
        listTablesRequest.setExclusiveStartTableName(tablePrefix);
        TableCollection<ListTablesResult> tables = dynamoDB.listTables();
        List<String> tablesList = new ArrayList<>();
        tables.forEach((tableResult) -> {
            tablesList.add(tableResult.getTableName());
        });
        String tableName = dynamoDBMapper.generateCreateTableRequest(clazz).getTableName();
        if (!tablesList.contains(tableName)) {
            CreateTableRequest tableRequest = dynamoDBMapper.generateCreateTableRequest(clazz);
            tableRequest.withProvisionedThroughput(new ProvisionedThroughput(5L, 5L));
            Table table = dynamoDB.createTable(tableRequest);
            table.waitForActive();
        }
    }
}

Prasanth Rajendran
  • 4,570
  • 2
  • 42
  • 59
  • 1
    This doesn't answer the OP question. It is easy to query for multiple ids when the table has no sort key. The question is how to do that when the table has a sort key without specifying the sort key in the query. – Ahmad Abdelghany Jun 25 '21 at 08:47
  • @AhmadAbdelghany , Actually if you try my answer you can find it out. Irrelevant of table having sort key(Range key) or not you can fetch multiple ids using the above answer, and I have implemented it multiple use cases many times – Prasanth Rajendran Jun 26 '21 at 03:30
  • well, I tried and it didn't work. It gives an error message saying "no RANGE key value present". – Ahmad Abdelghany Jun 28 '21 at 11:04
2

Try this in C#. I think it's similar in Java. UserId it`s the hask key.

        var table = Table.LoadTable(DynamoClient, "YourTableName");
        var batchGet = table.CreateBatchGet();
        batchGet.AddKey(new Dictionary<string, DynamoDBEntry>() { { "UserId", 123 } });
        batchGet.AddKey(new Dictionary<string, DynamoDBEntry>() { { "UserId", 456 } });
        batchGet.Execute();
        var results = batchGet.Results;
deverton
  • 31
  • 4
-2

You might have a look at BatchGetItem operation or the batchLoad() method of the DynamoDBMapper. Although a little different than a query in that it's not a query with an OR condition on the hash key, it will allow you to accomplish (generally) the same thing. Here is the language agnostic documentation and here is the Javadoc.

rpmartz
  • 3,759
  • 2
  • 26
  • 36
  • Does not work. You MUST supply a range key for the `batchLoad()` operation when a table contains one or you get a `DynamoDBMappingException`. This prevents you from simply grabbing all of the items whose hash keys are the same. For example; say you have a list of servers. This list tells you what servers are online/offline/suspended/etc. via the hash key and what the server name is via the range key. The ONLY way I know of to get all of the servers that are listed as online as well as those listed as offline (without knowing their individual names) is to do a scan or 2 queries. – DGolberg Feb 17 '14 at 21:46
-2

Amazon API doesn't support multiple hashkey filter but you can use HASH KEY + RANGE KEY filter to get the results using batchGetItem method ..

http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/batch-operation-lowlevel-java.html#LowLevelJavaBatchGet

Vasanth Umapathy
  • 1,571
  • 14
  • 7
  • This is what the last poster suggested and it does not work for this situation. The whole purpose of the query is to find the IDs that match a certain status (which, incidentally, are also the range key). The `batchGetItem` request requires that you know both the hash and range key. The only reason I even have a range key is so that I can have multiple hash keys with the same value which can be queried. – DGolberg Mar 17 '14 at 05:30