
I'm using subprocess to run Hive commands in Python, but I am getting empty results. If I run the same commands from the Hive CLI, I get results.

 query = "set hive.cli.print.header=true;use mydb;describe table1;"  
 process = subprocess.Popen( ["ssh", "hadoop" , "hive", "-e", "%r" % query], stdout = subprocess.PIPE, stderr = subprocess.PIPE )  
 data = [line.split('\t') for line in process.stdout]  
 cols = list(itertools.chain.from_iterable(data[:1]))  
 df = pd.DataFrame(data[1:], columns = cols)  
 print "==>"+df+"<----"  

It returns an empty DataFrame.

Please help me with this.

Denver
  • Did you try to use the `wait` function, in order to wait for the completion of the subprocess? Remember that subprocesses are run in parallel. – albertoql Mar 23 '16 at 06:43
  • Can you please elaborate? Thank you. – Denver Mar 23 '16 at 07:05
  • After the creation of the process, call `process.wait()` in order to wait for the completion of the process and then read the data from the standard output. It is possible otherwise that while reading the standard output the process has not finished yet. – albertoql Mar 23 '16 at 14:33
  • Unrelated: 1- you should use `query` instead of `"%r" % query`. The latter (among other things) adds quotes (there is no shell to remove them -- the command line is already split into separate items). 2- it seems you could use `cols = data and data[0]` instead of `cols = list(itertools.chain.from_iterable(data[:1]))`. 3- don't use `stderr=PIPE` unless you read from the pipe (concurrently with reading `stdout`) -- otherwise a deadlock may happen. 4- call `rc = process.wait()` somewhere at the end, to avoid zombies and to check the exit status. – jfs Mar 24 '16 at 01:02
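Putting those comment suggestions together, here is a minimal, untested sketch of the corrected snippet (the ssh/hive command line is the one from the question):

    import subprocess
    import pandas as pd

    query = "set hive.cli.print.header=true;use mydb;describe table1;"
    # pass the query string itself: "%r" adds literal quotes, and there is no shell here to strip them
    process = subprocess.Popen(["ssh", "hadoop", "hive", "-e", query],
                               stdout=subprocess.PIPE)
    # strip the trailing newline so the last column is clean
    data = [line.rstrip('\n').split('\t') for line in process.stdout]
    rc = process.wait()    # reap the child and check its exit status
    if rc != 0:
        raise RuntimeError("hive exited with status %d" % rc)
    cols = data[0] if data else []
    df = pd.DataFrame(data[1:], columns=cols)
    print df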

1 Answer

    import subprocess
    import sys

    # write hive's stdout to a TSV file instead of a pipe
    myfile = open("query_result.tsv", 'w')
    p = subprocess.Popen("your query",   # the full ssh/hive command line, run through the shell
                         shell=True,
                         stdout=myfile, stderr=subprocess.PIPE)
    stdout, stderr = p.communicate()     # stdout is None here; the output went to myfile
    myfile.close()
    if p.returncode != 0:
        print stderr
        sys.exit(1)

myfile is a TSV file; you can read it with pandas.read_csv() and pass sep='\t'. You may need to look up the pandas API to find more about read_csv().
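For example, a minimal sketch of loading that file into a DataFrame (assuming the query was run with hive.cli.print.header=true as in the question, so the first row holds the column names):

    import pandas as pd

    # the first row is the header because hive.cli.print.header=true was set in the query
    df = pd.read_csv("query_result.tsv", sep='\t')
    print df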

You should also look up the subprocess API, section 17.1.2 about Popen objects; it gives you a warning about stdout=PIPE: https://docs.python.org/2/library/subprocess.html#frequently-used-arguments

wanyang.02
  • the warning doesn't apply if you use `.communicate()` (it can handle both `stdout` and `stderr` properly). – jfs May 20 '16 at 09:47
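Building on that comment, here is a minimal, untested sketch that keeps both pipes and uses `.communicate()` to read them safely, then parses the captured output with pandas (the ssh/hive command line is taken from the question):

    import subprocess
    import pandas as pd
    from StringIO import StringIO

    query = "set hive.cli.print.header=true;use mydb;describe table1;"
    p = subprocess.Popen(["ssh", "hadoop", "hive", "-e", query],
                         stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, err = p.communicate()    # reads both pipes without deadlocking, then waits
    if p.returncode != 0:
        raise RuntimeError(err)
    df = pd.read_csv(StringIO(out), sep='\t')
    print df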