
I would appreciate help from anyone familiar with how DynamoDB works. I need to scan a large DynamoDB table. I know that the DynamoDBClient scan operation returns at most 1 MB of data per call. Does the same restriction apply to the Table.scan operation? Table.scan returns an "ItemCollection<ScanOutcome>", while DynamoDBClient scan returns a ScanResult, and it is not clear to me whether the two operations work the same way.

I have checked this example: http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/ScanJavaDocumentAPI.html, but it doesn't give any hints about using the last returned key.

My questions are: If I use Table.scan, do I still need to call scan in a loop until the last returned key (LastEvaluatedKey) is null? If yes, how do I get the last key? If not, how can I enforce pagination? Any links to code examples would be appreciated. I have spent some time googling for examples, but most of them use either DynamoDBClient or DynamoDBMapper, while I need to use Table and Index objects instead.

Thanks!

Tofig Hasanov
  • You said you have a very large table, but you are looking for something in particular (or a set), so you can start by filtering your result (which is obvious, I guess). If that is not enough: yes, you have to keep searching in the next batch(es). – x80486 Sep 06 '16 at 10:55
  • I am not sure I understood your comment. I do have a FilterExpression that filters my scan results, but that doesn't guarantee that my results will never exceed 1 MB – Tofig Hasanov Sep 06 '16 at 10:59
  • So, you need to scan the next batch; you can do it in parallel by "playing" with `Segment` and/or `TotalSegments`; in that case the value of `LastEvaluatedKey` returned from the request must be used as the `ExclusiveStartKey` with the same segment ID in a subsequent scan operation. It's pretty much like SQL, but faster! – x80486 Sep 06 '16 at 11:06
  • There is no "LastEvaluatedKey" parameter in Table.scan output type – Tofig Hasanov Sep 06 '16 at 11:08
  • Indeed, there is; the reason you don't see it is that you are not using "segments". Refer [here](http://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_Scan.html) to their documentation. You might want to pay attention to the second paragraph: _If the total number of scanned items exceeds the maximum data set size limit of 1 MB, the scan stops and results are returned to the user as a `LastEvaluatedKey` value to continue the scan in a subsequent operation. The results also include the number of items exceeding the limit_. – x80486 Sep 06 '16 at 11:10
  • I think there is some confusion here. I am already using segments. Please check the link in the question. I am following that approach now (using segments). But the results of Table.scan and DynamoDBClient.scan are different, and the first one has no LastEvaluatedKey. That is the reason I asked this question in the first place – Tofig Hasanov Sep 06 '16 at 11:13
  • Why wouldn't pages() work for you? http://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/dynamodbv2/document/ItemCollection.html#pages-- – kuhajeyan Sep 06 '16 at 12:52
  • Thanks for pointing this out. I haven't noticed it before. I will try using it. – Tofig Hasanov Sep 06 '16 at 13:15

1 Answer

If you iterate over the output of Table.scan(), the SDK will do pagination for you.
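
To make "the SDK will do pagination for you" more concrete, here is a minimal, standard-library-only sketch of the lazy-paging pattern. It is a model, not the SDK's actual code: `LazyScanSketch`, `Page`, `PagedItems`, and `fakeScan` are hypothetical names, keys are plain integers, and a page size of 3 items stands in for the 1 MB limit. With the real Document API the caller side would simply be a for-each over the `ItemCollection<ScanOutcome>` returned by `Table.scan`, or over `pages()` (mentioned in the comments) if you need page boundaries.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Function;

final class LazyScanSketch {

    /** One response page: its items plus the key to resume from (null on the last page). */
    static final class Page {
        final List<String> items;
        final Integer lastEvaluatedKey;

        Page(List<String> items, Integer lastEvaluatedKey) {
            this.items = items;
            this.lastEvaluatedKey = lastEvaluatedKey;
        }
    }

    /** Iterable that requests the next page only when the current one is used up. */
    static final class PagedItems implements Iterable<String> {
        private final Function<Integer, Page> fetchPage; // argument: exclusive start key, or null
        int requestCount = 0; // exposed only so the demo can report how many calls were made

        PagedItems(Function<Integer, Page> fetchPage) {
            this.fetchPage = fetchPage;
        }

        @Override
        public Iterator<String> iterator() {
            return new Iterator<String>() {
                private Iterator<String> current = null;
                private Integer nextStartKey = null;
                private boolean exhausted = false;

                @Override
                public boolean hasNext() {
                    // Loop rather than "if": a page may be empty yet still carry a resume key
                    // (for example when a filter rejected every item in that page).
                    while ((current == null || !current.hasNext()) && !exhausted) {
                        Page page = fetchPage.apply(nextStartKey);
                        requestCount++;
                        current = page.items.iterator();
                        nextStartKey = page.lastEvaluatedKey;
                        exhausted = (nextStartKey == null);
                    }
                    return current != null && current.hasNext();
                }

                @Override
                public String next() {
                    if (!hasNext()) {
                        throw new NoSuchElementException();
                    }
                    return current.next();
                }
            };
        }
    }

    /** Fake "table" of 7 items served 3 per request (hypothetical stand-in for a scan call). */
    static Page fakeScan(Integer exclusiveStartKey) {
        int total = 7;
        int pageSize = 3;
        int start = (exclusiveStartKey == null) ? 0 : exclusiveStartKey + 1;
        int end = Math.min(start + pageSize, total);
        List<String> items = new ArrayList<>();
        for (int i = start; i < end; i++) {
            items.add("item-" + i);
        }
        // Return a resume key only while items remain beyond this page.
        return new Page(items, end < total ? Integer.valueOf(end - 1) : null);
    }

    public static void main(String[] args) {
        PagedItems scan = new PagedItems(LazyScanSketch::fakeScan);
        for (String item : scan) { // the caller never touches a key
            System.out.println(item);
        }
        System.out.println("requests: " + scan.requestCount);
    }
}
```

Running `main` prints `item-0` through `item-6` followed by `requests: 3`: the for-each loop triggered three underlying requests without the caller ever handling a key. If you do need the key yourself, iterating `pages()` and reading the low-level `ScanResult` of each page should expose `LastEvaluatedKey`, but check that against the javadoc for your SDK version.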

Alexander Patrikalakis