I am trying out S3 Select from Presto using hive connector and Minio Object store. I am able to create an external table and run all the SQL queries. But, S3 Select does not seem to be working, even with the hive.s3select-pushdown.enabled=true
set in the properties file in the catalog folder. I ran a packet trace on the Minio server, I only see GET/LIST calls being made, do not see any POST /{Key+}?select&select-type=2 HTTP/1.1
being made.
Below is the hive properties file.
hive.metastore.uri=thrift://hadoop-master:9083
hive.s3.path-style-access=true
hive.s3.endpoint=http://X.X.X.X:9000
hive.s3.aws-access-key=minioadmin
hive.s3.aws-secret-key=minioadmin
hive.non-managed-table-writes-enabled=true
hive.storage-format=ORC
hive.s3select-pushdown.enabled=true
I see that the same is set from the SESSION parameters in presto.
minio.s3_select_pushdown_enabled | true | true
minio.projection_pushdown_enabled | true | true
This is how I am creating the external table from presto cli.
presto:default> CREATE TABLE nyc_9 ( vendorid VARCHAR, tpep_pickup_datetime VARCHAR, tpep_dropoff_datetime VARCHAR, passenger_count VARCHAR, trip_distance VARCHAR, ratecodeid VARCHAR, store_and_fwd_flag VARCHAR, pulocationid VARCHAR, dolocationid VARCHAR, payment_type VARCHAR, fare_amount VARCHAR, extra VARCHAR, mta_tax VARCHAR, tip_amount VARCHAR, tolls_amount VARCHAR, improvement_surcharge VARCHAR, total_amount VARCHAR) WITH (FORMAT = 'CSV', skip_header_line_count = 1, EXTERNAL_LOCATION = 's3a://test10gb5/');
Query being run
presto:default> SELECT * FROM nyc_9 WHERE trip_distance > '20' AND fare_amount > '10' AND tip_amount > '2' AND passenger_count = '2' LIMIT 10;
vendorid | tpep_pickup_datetime | tpep_dropoff_datetime | passenger_count | trip_distance | ratecodeid | store_and_fwd_flag | pulocationid | dolocationid | payment_type | fare_amount | extra | mta_tax | tip_amount | tolls_amount | improvement_sur
----------+------------------------+------------------------+-----------------+---------------+------------+--------------------+--------------+--------------+--------------+-------------+-------+---------+------------+--------------+----------------
2 | 04/26/2018 08:51:16 AM | 04/26/2018 09:42:03 AM | 2 | 5.06 | 1 | N | 236 | 170 | 1 | 31 | 0 | 0.5 | 6.36 | 0 | 0.3
2 | 04/26/2018 08:14:17 AM | 04/26/2018 08:35:08 AM | 2 | 6.88 | 1 | N | 263 | 45 | 1 | 22 | 0 | 0.5 | 6.84 | 0 | 0.3
1 | 04/26/2018 08:19:47 AM | 04/26/2018 09:17:45 AM | 2 | 9.7 | 1 | N | 138 | 144 | 1 | 39 | 0 | 0.5 | 8 | 0 | 0.3
2 | 04/26/2018 08:38:15 AM | 04/26/2018 09:09:58 AM | 2 | 4.73 | 1 | N | 142 | 144 | 1 | 22 | 0 | 0.5 | 4.56 | 0 | 0.3
2 | 04/26/2018 08:38:26 AM | 04/26/2018 09:22:12 AM | 2 | 5.95 | 1 | N | 239 | 13 | 1 | 29 | 0 | 0.5 | 2.98 | 0 | 0.3
2 | 04/26/2018 08:47:03 AM | 04/26/2018 09:17:02 AM | 2 | 3.27 | 1 | N | 158 | 162 | 1 | 19 | 0 | 0.5 | 3.96 | 0 | 0.3
2 | 04/26/2018 08:21:19 AM | 04/26/2018 08:46:55 AM | 2 | 3.89 | 1 | N | 262 | 107 | 1 | 18.5 | 0 | 0.5 | 3.86 | 0 | 0.3
2 | 04/26/2018 08:35:32 AM | 04/26/2018 09:01:54 AM | 2 | 4.09 | 1 | N | 236 | 137 | 1 | 17.5 | 0 | 0.5 | 3.66 | 0 | 0.3
1 | 04/26/2018 08:43:45 AM | 04/26/2018 09:03:41 AM | 2 | 3 | 1 | N | 163 | 145 | 1 | 15 | 0 | 0.5 | 6 | 0 | 0.3
1 | 04/26/2018 08:01:47 AM | 04/26/2018 08:13:08 AM | 2 | 3.1 | 1 | N | 264 | 137 | 1 | 12 | 0 | 0.5 | 2.55 | 0 | 0.3
(10 rows)
Is there anything else that needs to be done for S3 Select to work?