
I am trying to connect to a remote Hive cluster from Python. I tried pyhive and pyhs2 but had no success. With the code below I am able to connect to Hive, but how can I print the result and save it in a pandas DataFrame?

I have tried the following lines without any luck:

out = stdout.read()
print stdout.read()

import os
import paramiko


ssh = paramiko.SSHClient()
ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
ssh.load_host_keys(os.path.expanduser(os.path.join("~", ".ssh", "known_hosts")))
ssh.connect('00.00.00.00.', username='******', password='*******')
sshin, sshout, ssherr = ssh.exec_command('hive -e "select * from t1"')

I want to print and save the result in a pandas data frame.

Mona
  • What problem are you facing with PyHive? You can easily connect to Hive using PyHive as well. See this answer: https://stackoverflow.com/questions/57157942/access-tables-from-impala-through-python/57168447#57168447 – vikrant rana Aug 05 '19 at 08:59

1 Answer


If you mean loading the results into a pandas DataFrame, first save them to a local file:

hive -e "select * from t1" > /home/yourfile.tsv

Check this answer.

Then load it into a DataFrame like so:

import pandas as pd
df = pd.read_csv("/home/yourfile.tsv", delimiter="\t")
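
If you would rather skip the intermediate file, the command output can probably be parsed directly. Below is a minimal sketch, assuming the `hive -e` output is tab-separated UTF-8 text and that a header row is switched on with `set hive.cli.print.header=true`. The helper name `hive_stdout_to_dataframe` and the sample bytes are hypothetical and only stand in for what `sshout.read()` would return. Note that the question's code names the output stream `sshout`, not `stdout`, and that the stream can be read only once, so a second `read()` returns nothing.

```python
import io

import pandas as pd


def hive_stdout_to_dataframe(raw, has_header=True):
    """Parse tab-separated `hive -e` output (bytes or str) into a DataFrame.

    Hypothetical helper: assumes UTF-8 text with one row per line.
    """
    text = raw.decode("utf-8") if isinstance(raw, bytes) else raw
    # header=0 uses the first row as column names; header=None treats it as data.
    return pd.read_csv(
        io.StringIO(text),
        sep="\t",
        header=0 if has_header else None,
    )


# Made-up bytes standing in for the result of sshout.read().
sample = b"id\tname\n1\talice\n2\tbob\n"
df = hive_stdout_to_dataframe(sample)
print(df)
```

With the connection from the question, the call would look roughly like `df = hive_stdout_to_dataframe(sshout.read())`, where `sshout` comes from `ssh.exec_command('hive -e "set hive.cli.print.header=true; select * from t1"')`. Read the stream once and keep the result in a variable. If the header setting is left off, pass `has_header=False` so the first data row is not consumed as column names.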
Omkar Sabade