
I have multiple Hadoop commands to run, and they are going to be invoked from a Python script. Currently, I tried the following approach.

import os
import xml.etree.ElementTree as etree
import subprocess

filename = "sample.xml"
__currentlocation__ = os.getcwd()
__fullpath__ = os.path.join(__currentlocation__,filename)
tree = etree.parse(__fullpath__)
root = tree.getroot()
hivetable = root.find("hivetable").text
dburl = root.find("dburl").text
username = root.find("username").text
password = root.find("password").text
tablename = root.find("tablename").text
mappers = root.find("mappers").text
targetdir = root.find("targetdir").text
print hivetable
print dburl
print username
print password
print tablename
print mappers
print targetdir

p = subprocess.call(['hadoop', 'fs', '-rmr', targetdir], stdout=subprocess.PIPE, stderr=subprocess.PIPE)

But the code is not working. It is neither throwing an error nor deleting the directory.
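(A likely reason nothing shows up: subprocess.call returns the command's exit status instead of raising on failure, and with stderr piped the error text is captured and never printed. A minimal diagnostic sketch, reusing the targetdir parsed above, to surface both:)

import subprocess

# subprocess.call returns only the exit status; with stdout/stderr piped,
# any error message from hadoop is captured and silently discarded.
# Popen + communicate() makes both visible. targetdir is assumed to be
# the value parsed from sample.xml above.
proc = subprocess.Popen(['hadoop', 'fs', '-rmr', targetdir],
                        stdout=subprocess.PIPE, stderr=subprocess.PIPE)
out, err = proc.communicate()
print 'exit status:', proc.returncode
print 'stdout:', out
print 'stderr:', err   # hadoop's error text, if any, shows up here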


1 Answer


I suggest you slightly change your approach; this is how I'm doing it. I make use of the Python library commands (https://docs.python.org/2/library/commands.html). Here is a little demo:

import commands as com
print com.getoutput('hadoop fs -ls /')

This gives you output like the following (depending on what you have in the HDFS directory):

/usr/local/Cellar/hadoop/2.7.3/libexec/etc/hadoop/hadoop-env.sh: line 25: /Library/Java/JavaVirtualMachines/jdk1.8.0_112.jdk/Contents/Home: Is a directory
Found 2 items
drwxr-xr-x   - someone supergroup          0 2017-03-29 13:48 /hdfs_dir_1
drwxr-xr-x   - someone supergroup          0 2017-03-24 13:42 /hdfs_dir_2
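Applied to the original deletion task, a minimal sketch along the same lines (assuming targetdir holds the HDFS path parsed from the XML; getstatusoutput also returns the exit status, so failures aren't silent):

import commands as com

# getstatusoutput returns (exit_status, combined_output), so failures
# are visible rather than silently swallowed. targetdir is assumed to
# come from the XML parsing in the question; -rm -r is the current
# spelling of the deprecated -rmr.
status, output = com.getstatusoutput('hadoop fs -rm -r ' + targetdir)
if status != 0:
    print 'delete failed:', output
else:
    print 'deleted', targetdir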

Note: the commands library doesn't work with Python 3 (it was removed there); I'm using Python 2.7. Also be aware of the limitations of commands noted in its documentation: it is deprecated in favor of subprocess and works only on Unix.

If you use subprocess, which is the Python 3 equivalent of commands, then you might want to find a proper way to deal with your 'pipelines'. I find this discussion useful in that sense: (subprocess popen to run commands (HDFS/hadoop))
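For instance, a rough Python 3 counterpart of the demo above (a sketch; subprocess.getoutput is the direct replacement for commands.getoutput, and subprocess.run needs Python 3.5+):

import subprocess

# subprocess.getoutput is the Python 3 counterpart of commands.getoutput:
# shell command string in, combined output out.
print(subprocess.getoutput('hadoop fs -ls /'))

# For more control, subprocess.run (Python 3.5+) exposes the exit
# status and keeps stdout and stderr separate.
result = subprocess.run(['hadoop', 'fs', '-ls', '/'],
                        stdout=subprocess.PIPE, stderr=subprocess.PIPE)
print('exit status:', result.returncode)
print(result.stdout.decode())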

I hope this suggestion helps you!

Best
