hostA has MySQL (port 3306), Hive (port 10000), and the Hive metastore (port 9083) installed and running. hostB has Presto installed and running.
The goal is for Presto on hostB to run queries against the Hive metastore on hostA.
I am getting the error below. /home/ec2-user/warehouse/contact does exist on the local filesystem (not HDFS/S3) of hostA, and the table is partitioned, but the path does not exist on hostB. Why is Presto looking for the Hive partitions on the host where Presto runs (hostB) instead of on hostA, where the Hive metastore is? The metastore connection itself works, since Presto is able to list the tables in the metastore.
presto-cli --debug --catalog hive --schema default
presto:default> show tables;
Table
----------------------------
account
contact
(2 rows)
Query 20171102_122934_00012_x6ppj, FINISHED, 2 nodes
http://localhost:8080/query.html?20171102_122934_00012_x6ppj
Splits: 18 total, 18 done (100.00%)
CPU Time: 0.0s total, 615 rows/s, 18.8KB/s, 5% active
Per Node: 0.0 parallelism, 8 rows/s, 280B/s
Parallelism: 0.0
0:00 [8 rows, 250B] [17 rows/s, 560B/s]
presto:default> select * from contact;
Query 20171102_122943_00013_x6ppj failed: Partition location does not exist: file:/home/ec2-user/warehouse/contact
com.facebook.presto.spi.PrestoException: Partition location does not exist: file:/home/ec2-user/warehouse/contact
at com.facebook.presto.hive.util.HiveFileIterator.computeNext(HiveFileIterator.java:102)
at com.facebook.presto.hive.util.HiveFileIterator.computeNext(HiveFileIterator.java:41)
at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:145)
at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:140)
at com.facebook.presto.hive.BackgroundHiveSplitLoader.loadSplits(BackgroundHiveSplitLoader.java:243)
at com.facebook.presto.hive.BackgroundHiveSplitLoader.access$300(BackgroundHiveSplitLoader.java:92)
at com.facebook.presto.hive.BackgroundHiveSplitLoader$HiveSplitLoaderTask.process(BackgroundHiveSplitLoader.java:195)
at com.facebook.presto.hive.util.ResumableTasks.safeProcessTask(ResumableTasks.java:45)
at com.facebook.presto.hive.util.ResumableTasks.lambda$submit$1(ResumableTasks.java:33)
at io.airlift.concurrent.BoundedExecutor.drainQueue(BoundedExecutor.java:78)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
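One thing worth noting about the location in the error: file:/home/ec2-user/warehouse/contact is a URI with no host component, so nothing in the string itself points back to hostA. The small Python sketch below only illustrates that point by parsing the URI from the error next to a hypothetical HDFS-style URI (the hdfs://hostA:8020 address is an assumption for comparison, not something from this setup). It says nothing about how Presto resolves locations internally.

```python
from urllib.parse import urlparse


def describe_location(uri: str) -> dict:
    """Split a storage location URI into scheme, network location, and path."""
    parts = urlparse(uri)
    return {"scheme": parts.scheme, "netloc": parts.netloc, "path": parts.path}


# Location exactly as it appears in the Presto error message.
local = describe_location("file:/home/ec2-user/warehouse/contact")

# Hypothetical HDFS-style location, shown only for comparison (assumed host and port).
remote = describe_location("hdfs://hostA:8020/home/ec2-user/warehouse/contact")

print(local)   # netloc is empty: the URI names no host
print(remote)  # netloc carries the host and port
```

Because the file: location carries only a path, any process that opens it can only look on its own filesystem.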
cat config.properties
coordinator=true
node-scheduler.include-coordinator=false
http-server.http.port=8080
query.max-memory=50GB
query.max-memory-per-node=1GB
discovery-server.enabled=true
# discovery.uri=http://example.net:8080
discovery.uri=http://hostB:8080
cat hive.properties
connector.name=hive-hadoop2
hive.metastore.uri=thrift://hostA:9083
2017-11-02T06:52:30.585Z INFO main com.facebook.presto.metadata.StaticCatalogStore -- Loading catalog etc/catalog/hive.properties --
2017-11-02T06:52:31.307Z INFO main Bootstrap PROPERTY DEFAULT RUNTIME DESCRIPTION
2017-11-02T06:52:31.307Z INFO main Bootstrap hive.allow-corrupt-writes-for-testing false false Allow Hive connector to write data even when data will likely be corrupt
2017-11-02T06:52:31.307Z INFO main Bootstrap hive.assume-canonical-partition-keys false false
2017-11-02T06:52:31.307Z INFO main Bootstrap hive.bucket-execution true true Enable bucket-aware execution: only use a single worker per bucket
2017-11-02T06:52:31.307Z INFO main Bootstrap hive.bucket-writing true true Enable writing to bucketed tables
2017-11-02T06:52:31.309Z INFO main Bootstrap hive.dfs.connect.max-retries 5 5
2017-11-02T06:52:31.309Z INFO main Bootstrap hive.dfs.connect.timeout 500.00ms 500.00ms
2017-11-02T06:52:31.309Z INFO main Bootstrap hive.dfs-timeout 60.00s 60.00s
2017-11-02T06:52:31.309Z INFO main Bootstrap hive.domain-compaction-threshold 100 100 Maximum ranges to allow in a tuple domain without compacting it
2017-11-02T06:52:31.309Z INFO main Bootstrap hive.dfs.domain-socket-path null null
2017-11-02T06:52:31.309Z INFO main Bootstrap hive.fs.cache.max-size 1000 1000 Hadoop FileSystem cache size
2017-11-02T06:52:31.309Z INFO main Bootstrap hive.force-local-scheduling false false
2017-11-02T06:52:31.309Z INFO main Bootstrap hive.hdfs.authentication.type NONE NONE HDFS authentication type
2017-11-02T06:52:31.309Z INFO main Bootstrap hive.hdfs.impersonation.enabled false false Should Presto user be impersonated when communicating with HDFS
2017-11-02T06:52:31.309Z INFO main Bootstrap hive.compression-codec GZIP GZIP
2017-11-02T06:52:31.309Z INFO main Bootstrap hive.metastore.authentication.type NONE NONE Hive Metastore authentication type
2017-11-02T06:52:31.309Z INFO main Bootstrap hive.storage-format RCBINARY RCBINARY
2017-11-02T06:52:31.309Z INFO main Bootstrap hive.immutable-partitions false false Can new data be inserted into existing partitions or existing unpartitioned tables
2017-11-02T06:52:31.309Z INFO main Bootstrap hive.dfs.ipc-ping-interval 10.00s 10.00s
2017-11-02T06:52:31.309Z INFO main Bootstrap hive.max-concurrent-file-renames 20 20
2017-11-02T06:52:31.309Z INFO main Bootstrap hive.max-initial-split-size 32MB 32MB
2017-11-02T06:52:31.309Z INFO main Bootstrap hive.max-initial-splits 200 200
2017-11-02T06:52:31.309Z INFO main Bootstrap hive.metastore-refresh-max-threads 100 100
2017-11-02T06:52:31.309Z INFO main Bootstrap hive.max-outstanding-splits 1000 1000
2017-11-02T06:52:31.309Z INFO main Bootstrap hive.metastore.partition-batch-size.max 100 100
2017-11-02T06:52:31.309Z INFO main Bootstrap hive.max-partitions-per-scan 100000 100000 Maximum allowed partitions for a single table scan
2017-11-02T06:52:31.309Z INFO main Bootstrap hive.max-partitions-per-writers 100 100 Maximum number of partitions per writer
2017-11-02T06:52:31.309Z INFO main Bootstrap hive.max-split-iterator-threads 1000 1000
2017-11-02T06:52:31.309Z INFO main Bootstrap hive.max-split-size 64MB 64MB
2017-11-02T06:52:31.309Z INFO main Bootstrap hive.metastore-cache-maximum-size 10000 10000
2017-11-02T06:52:31.309Z INFO main Bootstrap hive.metastore-cache-ttl 0.00s 0.00s
2017-11-02T06:52:31.309Z INFO main Bootstrap hive.metastore-refresh-interval 0.00s 0.00s
2017-11-02T06:52:31.309Z INFO main Bootstrap hive.metastore.thrift.client.socks-proxy null null
2017-11-02T06:52:31.309Z INFO main Bootstrap hive.metastore-timeout 10.00s 10.00s
2017-11-02T06:52:31.309Z INFO main Bootstrap hive.metastore.partition-batch-size.min 10 10
2017-11-02T06:52:31.309Z INFO main Bootstrap hive.orc.bloom-filters.enabled false false
2017-11-02T06:52:31.309Z INFO main Bootstrap hive.orc.default-bloom-filter-fpp 0.05 0.05 ORC Bloom filter false positive probability
2017-11-02T06:52:31.309Z INFO main Bootstrap hive.orc.max-buffer-size 8MB 8MB
2017-11-02T06:52:31.309Z INFO main Bootstrap hive.orc.max-merge-distance 1MB 1MB
2017-11-02T06:52:31.309Z INFO main Bootstrap hive.orc.max-read-block-size 16MB 16MB
2017-11-02T06:52:31.309Z INFO main Bootstrap hive.orc.optimized-writer.enabled false false
2017-11-02T06:52:31.309Z INFO main Bootstrap hive.orc.stream-buffer-size 8MB 8MB
2017-11-02T06:52:31.309Z INFO main Bootstrap hive.parquet-optimized-reader.enabled false false
2017-11-02T06:52:31.309Z INFO main Bootstrap hive.parquet-predicate-pushdown.enabled false false
2017-11-02T06:52:31.309Z INFO main Bootstrap hive.per-transaction-metastore-cache-maximum-size 1000 1000
2017-11-02T06:52:31.309Z INFO main Bootstrap hive.rcfile-optimized-writer.enabled true true
2017-11-02T06:52:31.309Z INFO main Bootstrap hive.rcfile.writer.validate false false Validate RCFile after write by re-reading the whole file
2017-11-02T06:52:31.309Z INFO main Bootstrap hive.recursive-directories false false
2017-11-02T06:52:31.309Z INFO main Bootstrap hive.config.resources null null
2017-11-02T06:52:31.309Z INFO main Bootstrap hive.respect-table-format true true Should new partitions be written using the existing table format or the default Presto format
2017-11-02T06:52:31.310Z INFO main Bootstrap hive.skip-deletion-for-alter false false Skip deletion of old partition data when a partition is deleted and then inserted in the same transaction
2017-11-02T06:52:31.310Z INFO main Bootstrap hive.table-statistics-enabled true true Enable use of table statistics
2017-11-02T06:52:31.310Z INFO main Bootstrap hive.time-zone Zulu Zulu
2017-11-02T06:52:31.311Z INFO main Bootstrap hive.orc.use-column-names false false Access ORC columns using names from the file
2017-11-02T06:52:31.311Z INFO main Bootstrap hive.parquet.use-column-names false false Access Parquet columns using names from the file
2017-11-02T06:52:31.311Z INFO main Bootstrap hive.dfs.verify-checksum true true
2017-11-02T06:52:31.311Z INFO main Bootstrap hive.write-validation-threads 16 16 Number of threads used for verifying data after a write
2017-11-02T06:52:31.311Z INFO main Bootstrap hive.non-managed-table-writes-enabled false false Enable writes to non-managed (external) tables
2017-11-02T06:52:31.311Z INFO main Bootstrap hive.s3.pin-client-to-current-region false false Should the S3 client be pinned to the current EC2 region
2017-11-02T06:52:31.311Z INFO main Bootstrap hive.s3.aws-access-key null null
2017-11-02T06:52:31.311Z INFO main Bootstrap hive.s3.aws-secret-key [REDACTED] [REDACTED]
2017-11-02T06:52:31.311Z INFO main Bootstrap hive.s3.connect-timeout 5.00s 5.00s
2017-11-02T06:52:31.311Z INFO main Bootstrap hive.s3.encryption-materials-provider null null Use a custom encryption materials provider for S3 data encryption
2017-11-02T06:52:31.311Z INFO main Bootstrap hive.s3.endpoint null null
2017-11-02T06:52:31.311Z INFO main Bootstrap hive.s3.kms-key-id null null Use an AWS KMS key for S3 data encryption
2017-11-02T06:52:31.311Z INFO main Bootstrap hive.s3.max-backoff-time 10.00m 10.00m
2017-11-02T06:52:31.311Z INFO main Bootstrap hive.s3.max-client-retries 5 5
2017-11-02T06:52:31.311Z INFO main Bootstrap hive.s3.max-connections 500 500
2017-11-02T06:52:31.311Z INFO main Bootstrap hive.s3.max-error-retries 10 10
2017-11-02T06:52:31.311Z INFO main Bootstrap hive.s3.max-retry-time 10.00m 10.00m
2017-11-02T06:52:31.311Z INFO main Bootstrap hive.s3.multipart.min-file-size 16MB 16MB Minimum file size for an S3 multipart upload
2017-11-02T06:52:31.311Z INFO main Bootstrap hive.s3.multipart.min-part-size 5MB 5MB Minimum part size for an S3 multipart upload
2017-11-02T06:52:31.311Z INFO main Bootstrap hive.s3.signer-type null null
2017-11-02T06:52:31.311Z INFO main Bootstrap hive.s3.socket-timeout 5.00s 5.00s
2017-11-02T06:52:31.311Z INFO main Bootstrap hive.s3.sse.enabled false false Enable S3 server side encryption
2017-11-02T06:52:31.311Z INFO main Bootstrap hive.s3.sse.kms-key-id null null KMS Key ID to use for S3 server-side encryption with KMS-managed key
2017-11-02T06:52:31.311Z INFO main Bootstrap hive.s3.sse.type S3 S3 Key management type for S3 server-side encryption (S3 or KMS)
2017-11-02T06:52:31.311Z INFO main Bootstrap hive.s3.ssl.enabled true true
2017-11-02T06:52:31.311Z INFO main Bootstrap hive.s3.staging-directory /tmp /tmp Temporary directory for staging files before uploading to S3
2017-11-02T06:52:31.311Z INFO main Bootstrap hive.s3.use-instance-credentials true true
2017-11-02T06:52:31.311Z INFO main Bootstrap hive.s3.user-agent-prefix The user agent prefix to use for S3 calls
2017-11-02T06:52:31.311Z INFO main Bootstrap hive.metastore.uri null [thrift://hostA:9083] Hive metastore URIs (comma separated)
2017-11-02T06:52:31.311Z INFO main Bootstrap hive.metastore thrift thrift
2017-11-02T06:52:31.312Z INFO main Bootstrap hive.allow-add-column false false Allow Hive connector to add column
2017-11-02T06:52:31.312Z INFO main Bootstrap hive.allow-drop-column false false Allow Hive connector to drop column
2017-11-02T06:52:31.312Z INFO main Bootstrap hive.allow-drop-table false false Allow Hive connector to drop table
2017-11-02T06:52:31.312Z INFO main Bootstrap hive.allow-rename-column false false Allow Hive connector to rename column
2017-11-02T06:52:31.312Z INFO main Bootstrap hive.allow-rename-table false false Allow Hive connector to rename table
2017-11-02T06:52:31.312Z INFO main Bootstrap hive.security legacy legacy
2017-11-02T06:52:31.312Z INFO main Bootstrap
2017-11-02T06:52:32.663Z INFO main com.facebook.presto.metadata.StaticCatalogStore -- Added catalog hive using connector hive-hadoop2 --