I am trying to run queries against a Presto cluster running on Dataproc from Python (using presto from pyhive) on my local machine, but I can't figure out what to use as the host URL. Does GCP Dataproc even allow remote access to Presto clusters?
I tried using the URL of Presto's web UI, but that didn't work. I also checked the docs on using the Cloud Client Libraries for Python, but they weren't helpful either: https://cloud.google.com/dataproc/docs/tutorials/python-library-example
from pyhive import presto
query = '''select * FROM system.runtime.nodes'''
presto_conn = presto.Connection(host={host}, port=8060, username={user})
presto_cursor = presto_conn.cursor()
presto_cursor.execute(query)
Error:
ConnectionError: HTTPConnectionPool(host='https', port=80): Max retries exceeded with url: {url}
(Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fb41c0c25d0>: Failed to establish a new connection: [Errno 8] nodename nor servname provided, or not known'))
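Looking at the error again, host='https' and port=80 suggest that the value I passed as host started with "https://" (the web UI URL), so the scheme got parsed as the hostname and the port was dropped. A minimal sketch of separating the bare hostname and port from such a URL, using only the standard library. The helper name split_host_port and the address example-cluster-m are made up for illustration, and this only fixes the parsing; it does not by itself make the cluster reachable from outside GCP.

```python
from urllib.parse import urlsplit


def split_host_port(address, default_port=8060):
    """Return (hostname, port) from either a bare host or a full URL.

    split_host_port is a hypothetical helper, not part of pyhive.
    """
    # urlsplit only fills in .hostname when the string has a "//" authority
    # part, so add one when a bare "host" or "host:port" is given.
    parts = urlsplit(address if "//" in address else "//" + address)
    return parts.hostname, parts.port or default_port


# "example-cluster-m" is a placeholder, not a real Dataproc address.
host, port = split_host_port("https://example-cluster-m:8060/ui/")
print(host, port)
```

The resulting host and port could then be passed to presto.Connection(host=host, port=port, username=...) in place of the full URL.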
Update: I was able to manually create a VM on GCP Compute Engine, configure Trino, and set up firewall rules and a load balancer so that I can access the cluster.
I still need to check whether Dataproc allows a similar configuration.