10

I have a CSV file with 1M rows. If I select 10 rows, will I be billed for only 10 rows? What do "data returned" and "data scanned" mean in S3 Select?

There is little documentation on these S3 Select terms.

Piotr Findeisen
bharath reddy

1 Answer

5

To keep things simple, let's forget for a moment that S3 can read data in a columnar way (for example, with Parquet input). Suppose you have the following data:

| City       | Last Updated Date   |
|------------|---------------------|
| London     | 1st Jan             |
| London     | 2nd Jan             |
| New Delhi  | 2nd Jan             |

A query that fetches the rows with the latest update date

  • forces S3 to scan all 3 records
  • but returns only 2 records (those whose last updated date is 2nd Jan)

A query that selects the city where the last updated date is 1st Jan

  • will scan all 3 rows
  • but return only 1 string: "London".

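Hence, depending on your query, S3 Select might scan more data (3 rows) but return less data (2 rows). Both quantities are metered and billed separately, so selecting 10 rows out of 1M is not billed as just 10 rows if the query has to scan more than that to find them.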
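As a rough local illustration of the second query, here is a plain-Python simulation. It does not call S3 at all; `CSV_DATA` and `select_city` are names made up for this sketch, and it only mimics the idea that every row is read while only the matching values come back. The byte counts here are just the lengths of the input and output text.

```python
import csv
import io

# The three-row table from above, as CSV text (illustrative data only).
CSV_DATA = (
    "City,Last Updated Date\n"
    "London,1st Jan\n"
    "London,2nd Jan\n"
    "New Delhi,2nd Jan\n"
)


def select_city(csv_text, wanted_date):
    """Simulate: select City where "Last Updated Date" equals wanted_date."""
    rows_scanned = 0
    returned = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        rows_scanned += 1  # every row has to be read to evaluate the filter
        if row["Last Updated Date"] == wanted_date:
            returned.append(row["City"])  # only the selected column is kept
    bytes_scanned = len(csv_text.encode("utf-8"))
    bytes_returned = len("".join(city + "\n" for city in returned).encode("utf-8"))
    return rows_scanned, returned, bytes_scanned, bytes_returned


rows_scanned, cities, bytes_scanned, bytes_returned = select_city(CSV_DATA, "1st Jan")
print(f"scanned {rows_scanned} rows ({bytes_scanned} bytes)")
print(f"returned {cities} ({bytes_returned} bytes)")
```

The filter for "1st Jan" still walks all 3 rows, but only "London" is returned, so the returned size is much smaller than the scanned size.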

I hope you understand the difference between Data Scanned and Data Returned now.
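If you want to see the actual numbers for one of your own queries, the boto3 `select_object_content` response stream includes a `Stats` event that reports `BytesScanned`, `BytesProcessed` and `BytesReturned`. Below is a minimal sketch, assuming a CSV object with a header row; the helper name `run_select_with_stats` and the bucket and key in the usage comment are hypothetical.

```python
def run_select_with_stats(s3_client, bucket, key, expression):
    """Run an S3 Select query; return (result_text, stats_details_dict)."""
    response = s3_client.select_object_content(
        Bucket=bucket,
        Key=key,
        ExpressionType="SQL",
        Expression=expression,
        # Assumption: the object is CSV and its first line holds column names.
        InputSerialization={"CSV": {"FileHeaderInfo": "USE"}},
        OutputSerialization={"CSV": {}},
    )
    chunks = []
    stats = {}
    for event in response["Payload"]:
        if "Records" in event:
            # Result data arrives as one or more byte chunks.
            chunks.append(event["Records"]["Payload"])
        elif "Stats" in event:
            # Contains BytesScanned, BytesProcessed and BytesReturned.
            stats = event["Stats"]["Details"]
    return b"".join(chunks).decode("utf-8"), stats


# Usage (needs boto3 and AWS credentials; bucket and key are placeholders):
# import boto3
# text, stats = run_select_with_stats(
#     boto3.client("s3"),
#     "my-example-bucket",
#     "cities.csv",
#     "SELECT s.City FROM S3Object s WHERE s.\"Last Updated Date\" = '1st Jan'",
# )
# print(stats["BytesScanned"], stats["BytesReturned"])
```

Comparing `BytesScanned` with `BytesReturned` for the same query shows the difference described above on real data.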

surfmuggle
Pulkit Agarwal
  • For what kind of situations would you query your S3 data? Only with Athena? Or are there other situations? – galeop Jun 21 '22 at 18:13
    @galeop this is for the AWS service called **S3 Select** https://docs.aws.amazon.com/AmazonS3/latest/userguide/selecting-content-from-objects.html This allows you to filter data in buckets using SQL – Ari Sep 29 '22 at 04:55