So I'm trying to execute a hive query using the subprocess module, and save the output into a file data.txt as well as the logs (into log.txt), but I seem to be having a bit of trouble. I've looked at this gist as well as this SO question, but neither seems to give me what I need.
Here's what I'm running:
import subprocess
query = "select user, sum(revenue) as revenue from my_table where user = 'dave' group by user;"
outfile = "data.txt"
logfile = "log.txt"
log_buff = open("log.txt", "a")
data_buff = open("data.txt", "w")
# note - "hive -e [query]" would normally just print all the results
# to the console after finishing
proc = subprocess.run(["hive" , "-e" '"{}"'.format(query)],
                      stdin=subprocess.PIPE,
                      stdout=data_buff,
                      stderr=log_buff,
                      shell=True)
log_buff.close()
data_buff.close()
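For reference, here is a small sketch of what the argument list in that call evaluates to, and of how shell=True treats a list on POSIX systems. The echo command is only a stand-in for hive (an assumption for illustration, not part of my actual setup), and the shell behaviour shown is the POSIX one; Windows handles this differently.

```python
import subprocess

query = "select user, sum(revenue) as revenue from my_table where user = 'dave' group by user;"

# Adjacent string literals are joined before .format() is applied,
# so "-e" and the quoted query end up as ONE list element.
args = ["hive", "-e" '"{}"'.format(query)]
print(len(args))       # number of elements in the list
print(args[1][:10])    # start of the merged second element

# On POSIX, shell=True with a list runs only the first item as the command;
# the remaining items become arguments to the shell itself, not to the command.
# echo stands in for hive here (illustration only).
demo = subprocess.run(["echo", "hello"], shell=True,
                      stdout=subprocess.PIPE, text=True)
print(repr(demo.stdout))  # on POSIX, echo never receives "hello"
```

If the same thing happens with hive, it would be started with no arguments at all, which would match an interactive prompt waiting on stdin.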
I've also looked into this SO question regarding subprocess.run() vs subprocess.Popen, and I believe I want .run() because I'd like the process to block until finished.
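A quick sketch of that difference, using a short-lived Python child process as a hypothetical stand-in for the hive job: subprocess.run() returns only after the child exits, while subprocess.Popen() returns right away and leaves the waiting to communicate() or wait().

```python
import subprocess
import sys
import time

# Stand-in child (hypothetical; replaces the real hive call): sleeps, then prints.
child = [sys.executable, "-c", "import time; time.sleep(0.5); print('done')"]

# subprocess.run() waits for the child to exit before returning.
start = time.monotonic()
completed = subprocess.run(child, stdout=subprocess.PIPE, text=True)
run_elapsed = time.monotonic() - start

# subprocess.Popen() returns immediately; communicate() is what blocks.
start = time.monotonic()
proc = subprocess.Popen(child, stdout=subprocess.PIPE, text=True)
popen_elapsed = time.monotonic() - start
popen_out, _ = proc.communicate()

print("run() returned after {:.2f}s".format(run_elapsed))
print("Popen() returned after {:.2f}s".format(popen_elapsed))
```

So for a "run it and wait" job, .run() does seem like the simpler fit, since the return code and output are available as soon as the call returns.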
The final output should be a file data.txt with the tab-delimited results of the query, and log.txt with all of the logging produced by the hive job. Any help would be wonderful.
Update:
With the code above, I'm currently getting the following output:
log.txt
[ralston@tpsci-gw01-vm tmp]$ cat log.txt
Java HotSpot(TM) 64-Bit Server VM warning: Using the ParNew young collector with the Serial old collector is deprecated and will likely be removed in a future release
Java HotSpot(TM) 64-Bit Server VM warning: Using the ParNew young collector with the Serial old collector is deprecated and will likely be removed in a future release
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/y/share/hadoop-2.8.3.0.1802131730/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/y/libexec/tez/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Logging initialized using configuration in file:/home/y/libexec/hive/conf/hive-log4j.properties
data.txt
[ralston@tpsci-gw01-vm tmp]$ cat data.txt
hive> [ralston@tpsci-gw01-vm tmp]$
And I can verify the java/hive process did run:
[ralston@tpsci-gw01-vm tmp]$ ps -u ralston
  PID TTY          TIME CMD
14096 pts/0    00:00:00 hive
14141 pts/0    00:00:07 java
14259 pts/0    00:00:00 ps
16275 ?        00:00:00 sshd
16276 pts/0    00:00:00 bash
But it looks like it's not finishing and not logging everything that I'd like.
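The hive> prompt in data.txt suggests hive started in interactive mode rather than running the query, so it may just be sitting there waiting on the stdin pipe. A minimal sketch of one way around that, assuming the cause is the combination of shell=True with a list and the extra quotes around the query: pass each argument as its own list element, drop shell=True, and give the child no usable stdin. The helper name run_to_files is hypothetical, and this has not been verified against a real hive installation.

```python
import subprocess


def run_to_files(cmd, outfile, logfile):
    """Run cmd, writing stdout to outfile and appending stderr to logfile.

    Hypothetical helper for illustration; blocks until the child exits
    and returns its exit code.
    """
    with open(outfile, "w") as data_buff, open(logfile, "a") as log_buff:
        completed = subprocess.run(
            cmd,                       # list of separate arguments, no shell=True
            stdin=subprocess.DEVNULL,  # child cannot block waiting for input
            stdout=data_buff,
            stderr=log_buff,
        )
    return completed.returncode


# Intended use with the query from above (requires hive on the PATH):
#   query = "select user, sum(revenue) as revenue from my_table where user = 'dave' group by user;"
#   run_to_files(["hive", "-e", query], "data.txt", "log.txt")
```

Without a shell in between, the query needs no extra quoting: it reaches hive as a single argument exactly as written. The with block also closes both files even if the call raises.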