
I am trying to test whether a path exists in Hadoop (HDFS) using a Python script.

hdfs dfs -test -e /path/to/file

The above returns 0 if the path exists and 1 if it doesn't. Below is my Python script:

import subprocess

if subprocess.call("hdfs dfs -test -e /path/to/file", shell = True) == 0:
    pass  # do something

The above code isn't working: the result always seems to be 0, which I assume is because subprocess checks the command's status rather than its returned value. I found this post, but it didn't seem to work. I also tried storing the return value of echo with result = subprocess.call("echo $?", shell = True), which also didn't work.
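A minimal sketch of why the echo $? attempt cannot work, assuming a POSIX /bin/sh: with shell=True every subprocess.call starts a brand-new shell, so $? does not carry over from a previous call. subprocess.call already returns the command's exit status directly, and subprocess.check_output is the call that captures what a command prints.

```python
import subprocess

# subprocess.call returns the command's exit status directly
status = subprocess.call("exit 1", shell=True)
print(status)  # 1

# A new shell starts here, so $? is 0 and call() returns echo's own status
echo_status = subprocess.call("echo $?", shell=True)  # the shell prints 0
print(echo_status)  # 0

# check_output captures what the command prints, not its exit status
printed = subprocess.check_output("echo $?", shell=True).decode().strip()
print(printed)  # 0
```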

Below is my full code:

#!/usr/bin/env python
import subprocess

HDFS = "hdfs dfs "

def run_commands(func, path):
    subprocess.call(HDFS + func + path, shell = True)

def path_exist(path):
    return True if run_commands("-test -e ", path) == 0 else False

path_exist("/path/to/file")
moon
  • I tried your code, and `subprocess.call` returned `1` when I put in a path that did not exist in HDFS. What isn't working for you? – Matt D Aug 13 '15 at 17:28
  • @MattD I'm not sure why it's not working. The `if` statement above always equates to 0, even when the path doesn't exist. – moon Aug 13 '15 at 17:40

1 Answer


Change run_commands to

def run_commands(func, path):
    return subprocess.call(HDFS + func + path, shell = True)
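Applied to the full script from the question, the change might look like the sketch below. Two details are my own assumptions rather than part of this answer: the command is passed as an argument list instead of a string with shell = True (so a path containing spaces or shell characters is not reinterpreted by a shell), and func is therefore a list such as ["-test", "-e"].

```python
#!/usr/bin/env python
import subprocess

# Base command as an argument list, so no shell is involved
HDFS = ["hdfs", "dfs"]

def run_commands(func, path):
    # Return the exit status of `hdfs dfs <func...> <path>` to the caller
    return subprocess.call(HDFS + func + [path])

def path_exist(path):
    # `hdfs dfs -test -e` exits with 0 when the path exists
    return run_commands(["-test", "-e"], path) == 0
```

Usage would then be if path_exist("/path/to/file"): followed by the work to do. On Python 3.5 and later, subprocess.run(...).returncode gives the same exit status.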

run_commands does not return the exit code from subprocess.call: a Python function that reaches its end without a return statement returns None. (Tested in Python 2.6.6.)

Because None == 0 is False, the condition if run_commands("-test -e ", path) == 0 is never true, so path_exist always returns False.
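A minimal sketch of the difference, using the Python interpreter itself as a stand-in for hdfs (an assumption, so it runs without Hadoop installed); the two function names are hypothetical.

```python
import subprocess
import sys

def status_discarded(args):
    # Like the question's run_commands: the exit status is computed, then dropped
    subprocess.call(args)

def status_returned(args):
    # Like the fixed run_commands: the exit status is passed back to the caller
    return subprocess.call(args)

# Stand-in command that exits with status 0
ok = [sys.executable, "-c", "pass"]

print(status_discarded(ok))       # None
print(status_discarded(ok) == 0)  # False
print(status_returned(ok) == 0)   # True
```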

Matt D