2

At the moment I have a function to get all items from a DynamoDB table using the SCAN option. This is an expensive way to do it and I would prefer using the QUERY option. But looking at the docs there does not seem to be a simple way to retrieve all items using the QUERY option - it expects some sort of condition.

Example

var params = {
    TableName : "Movies",
    KeyConditionExpression: "#yr = :yyyy",
    ExpressionAttributeNames:{
        "#yr": "year"
    },
    ExpressionAttributeValues: {
        ":yyyy": 1985
    }
};

docClient.query(params, function(err, data) {
    if (err) {
        console.error("Unable to query. Error:", JSON.stringify(err, null, 2));
    } else {
        console.log("Query succeeded.");
        data.Items.forEach(function(item) {
            console.log(" -", item.year + ": " + item.title);
        });
    }
});

Expected

var params = {
    TableName : "Movies"  
};

docClient.query(params, function(err, data) {
    if (err) {
        console.error("Unable to query. Error:", JSON.stringify(err, null, 2));
    } else {
        console.log("Query succeeded.");
        data.Items.forEach(function(item) {
            console.log(" -", item.year + ": " + item.title);
        });
    }
});

Is it possible to retrieve all data from a table using QUERY? I thought of using BEGINS_WITH or such but all the primary keys are different/random and do not start with a specific character or phrase.

Aleksandar Zoric
  • 1,343
  • 3
  • 18
  • 45
  • 1
    Let's say you *could* get all items using query, how would this be materially better than using scan? – jarmod May 14 '20 at 15:20
  • @jarmod scan operations are generally slower and more expensive as the operation has to iterate through each item in the table to get the items I am requesting. This is the main reason as to why I do not want to use scan.. – Aleksandar Zoric May 14 '20 at 15:25
  • But you're getting all items regardless, unless you did not actually mean "all items". In fact, the query route would likely be slower than scan. – jarmod May 14 '20 at 15:35
  • All items is correct. I am going by some research points like in the link below which states that a query operation is expected to be very fast and only marginally slower than a get operation. The scan operation on the other hand can take anywhere from 50-100ms to a few hours to complete and depends on the size of the table. I could be wrong by how I am understanding this.. http://techtraits.com/cloud/nosql/2012/06/28/Amazon-DynamoDB-Understanding-Query-and-Scan-operations.html – Aleksandar Zoric May 14 '20 at 15:38
  • It just seems the way the SCAN and QUERY work to retrieve the data is different where the SCAN takes longer and the cost is higher. – Aleksandar Zoric May 14 '20 at 15:40
  • 1
    They are different. Scan will take longer to find specific items, compared to a query that knows the partition key and sort key (or range of sort keys) in advance. But you're talking about "all items" so there is no known key in advance and hence, in a world where query could theoretically give you all items, it would not be any better than scan. That said, query does *not* provide a way to get all items, so it's a moot point. – jarmod May 14 '20 at 17:29
  • Yeah - makes sense. Think I spent too long thinking about it. Thanks for that. – Aleksandar Zoric May 15 '20 at 10:23

3 Answers3

4

Technically, a query of all items in an Amazon DynamoDB table would return the same amount of data that a scan returns, so there should be no difference in cost.

The usual reduced efficiency of a scan operation is due to the fact that it has to read the whole table and then filters out values to provide the result you want, essentially adding the extra step of removing data from the result set. If you want to read the whole table without filtering, both scan and query have to retrieve all values and there is no additional filtering step.

Dennis Traub
  • 50,557
  • 7
  • 93
  • 108
  • 1
    Makes sense. Yeah, I have no filter to be applied but just to return all data from the table. It's just that after some research it seemed SCAN was to be avoided due to higher cost - even though the SCAN and QUERY can both return all data, the way they two operations return data is quite different. But this may not apply if I am returning all data anyhow.. – Aleksandar Zoric May 14 '20 at 15:53
1

The only way to do via query would be to loop over every partition key individually.

I'd suggest you look at a secondary index built around your query which will be more efficient: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/SecondaryIndexes.html

Chris Williams
  • 32,215
  • 4
  • 30
  • 68
  • 1
    Yes, I used a GSI for a different table for a different purpose. But I do see how I could use it for this - correct me if I'm wrong but if I create a GSI e.g. user-index which could contain a value of 'user' for all record. I could then just query the table where the index matches 'user'? – Aleksandar Zoric May 14 '20 at 15:30
  • @AleksandarZoric, yes you could do that, but if the amount of data that has the 'user' partition key grows over 10GB (size limit of DDB partition) you may run into trouble. See here for more info: https://stackoverflow.com/questions/40272600/is-there-a-dynamodb-max-partition-size-of-10gb-for-a-single-partition-key-value – Ben Stickley Jun 03 '22 at 02:04
0

If you want to get all data you can use scan all data, but I recommend you to get data by limit and pagination because it can kill a lot of memory resources if you have millions of data at dynamodb. this approach for getting all your data

const AWS = require('aws-sdk');
const docClient = new AWS.DynamoDB.DocumentClient({
    apiVersion: '2012-08-10',
    region: 'ap-southeast-1' // put your region
});

exports.handler = async (event, context, callback) => {
    const tableName = event.params.querystring.tablename; 
     let params = { 
         TableName: tableName
     };

     let scanResults = [];
     let items;

     do {
         items = await docClient.scan(params).promise();
         items.Items.forEach((item) => scanResults.push(item));
         params.ExclusiveStartKey = items.LastEvaluatedKey;
     } while (typeof items.LastEvaluatedKey != "undefined");

     callback(null, scanResults);
   
            
};

But with the approach below, after you get data, you need to post the LastEvaluatedKey from the frontend to params and you can use it as ExclusiveStartKey.

const AWS = require('aws-sdk');
const docClient = new AWS.DynamoDB.DocumentClient({
    apiVersion: '2012-08-10',
    region: 'ap-southeast-1' // put your region
});

exports.handler = async (event, context, callback) => {
    const tableName = event.params.querystring.tablename; 
    
    let pageSize = event.params.querystring.pagesize;
    let lastItem = event.params.querystring.lastItem;
        try {
              const params = {
                TableName: tableName,
                Limit: pageSize,
              };
              if (lastItem) {
                params.ExclusiveStartKey = { id: lastItem};
              }
              const response = await docClient.scan(params).promise();
              return {
                 items: response.Items,
                 lastItem: response.LastEvaluatedKey
              };
        
            } catch (error) {
              throw error;
            }
};