
I'm using subprocess to run Hive commands in Python, but I am getting empty results. If I run the same commands from the Hive CLI, I get results.

 query = "set hive.cli.print.header=true;use mydb;describe table1;"  
 process = subprocess.Popen( ["ssh", "hadoop" , "hive", "-e", "%r" % query], stdout = subprocess.PIPE, stderr = subprocess.PIPE )  
 data = [line.split('\t') for line in process.stdout]  
 cols = list(itertools.chain.from_iterable(data[:1]))  
 df = pd.DataFrame(data[1:], columns = cols)  
 print "==>"+df+"<----"  

It returns an empty DataFrame.

Please help me with this.

Denver
  • Did you try to use the `wait` function, in order to wait for the completion of the subprocess? Remember that subprocesses are run in parallel. – albertoql Mar 23 '16 at 06:43
  • Can you please elaborate? Thank you. – Denver Mar 23 '16 at 07:05
  • After the creation of the process, call `process.wait()` in order to wait for the completion of the process and then read the data from the standard output. It is possible otherwise that while reading the standard output the process has not finished yet. – albertoql Mar 23 '16 at 14:33
  • Unrelated: 1- you should use `query` instead of `"%r" % query`. The latter (among other things) adds quotes (there is no shell to remove them -- the command line is already split into separate items). 2- it seems you could use `cols = data and data[0]` instead of `cols = list(itertools.chain.from_iterable(data[:1]))`. 3- don't use `stderr=PIPE` unless you read from the pipe (concurrently with reading `stdout`) -- otherwise a deadlock may happen. 4- call `rc = process.wait()` somewhere at the end, to avoid zombies and to check the exit status. – jfs Mar 24 '16 at 01:02
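Putting those comment suggestions together, here is a minimal, untested sketch of the corrected snippet (the ssh/hive command line is the one from the question):

    import subprocess
    import pandas as pd

    query = "set hive.cli.print.header=true;use mydb;describe table1;"
    # pass the query string itself: "%r" adds literal quotes, and there is no shell here to strip them
    process = subprocess.Popen(["ssh", "hadoop", "hive", "-e", query],
                               stdout=subprocess.PIPE)
    # strip the trailing newline so the last column is clean
    data = [line.rstrip('\n').split('\t') for line in process.stdout]
    rc = process.wait()    # reap the child and check its exit status
    if rc != 0:
        raise RuntimeError("hive exited with status %d" % rc)
    cols = data[0] if data else []
    df = pd.DataFrame(data[1:], columns=cols)
    print df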

1 Answer

    import subprocess
    import sys

    # write hive's stdout to a TSV file instead of a pipe
    myfile = open("query_result.tsv", 'w')
    p = subprocess.Popen("your query",   # the full ssh/hive command line, run through the shell
                         shell=True,
                         stdout=myfile, stderr=subprocess.PIPE)
    stdout, stderr = p.communicate()     # stdout is None here; the output went to myfile
    myfile.close()
    if p.returncode != 0:
        print stderr
        sys.exit(1)

myfile is a TSV file; you can read it with pandas.read_csv() and pass sep='\t'. You may need to look up the pandas API to find more about read_csv().
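For example, a minimal sketch of loading that file into a DataFrame (assuming the query was run with hive.cli.print.header=true as in the question, so the first row holds the column names):

    import pandas as pd

    # the first row is the header because hive.cli.print.header=true was set in the query
    df = pd.read_csv("query_result.tsv", sep='\t')
    print df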

You should also look up the subprocess API, section 17.1.2 about Popen objects; it gives you a warning about stdout=PIPE: https://docs.python.org/2/library/subprocess.html#frequently-used-arguments

wanyang.02
  • the warning doesn't apply if you use `.communicate()` (it can handle both `stdout` and `stderr` properly). – jfs May 20 '16 at 09:47
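Building on that comment, here is a minimal, untested sketch that keeps both pipes and uses `.communicate()` to read them safely, then parses the captured output with pandas (the ssh/hive command line is taken from the question):

    import subprocess
    import pandas as pd
    from StringIO import StringIO

    query = "set hive.cli.print.header=true;use mydb;describe table1;"
    p = subprocess.Popen(["ssh", "hadoop", "hive", "-e", query],
                         stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, err = p.communicate()    # reads both pipes without deadlocking, then waits
    if p.returncode != 0:
        raise RuntimeError(err)
    df = pd.read_csv(StringIO(out), sep='\t')
    print df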