
I am trying to run `hdfs dfs -ls` on a folder from PySpark, but I can't:

import subprocess

def run_cmd(args_list):
    """Run a system command and return (returncode, stdout, stderr)."""
    print('Running system command: {0}'.format(' '.join(args_list)))
    proc = subprocess.Popen(args_list, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    s_output, s_err = proc.communicate()
    s_return = proc.returncode
    return s_return, s_output, s_err
 
hdfs_file_path = '/user/th/folder_4/files_4/'
cmd = ['hdfs', 'dfs', '-ls', hdfs_file_path]
ret, out, err = run_cmd(cmd)
print(ret, out, err)

It returns the output below. As you can see, stdout is `''` even though the folder has files:

(1, '', 'WARNING: log4j.properties is not found.') 
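A return code of 1 with empty stdout means the `hdfs` command itself failed, so the next step is to surface stderr and confirm the binary is actually on the PATH of the Python process. A minimal diagnostic sketch, using only the standard library; the `echo` call at the bottom is a stand-in for the real `hdfs dfs -ls` invocation:

```python
# Diagnostic sketch: fail loudly if the binary is missing, and return
# stdout/stderr as text so a non-zero exit can be inspected directly.
import shutil
import subprocess

def run_cmd(args_list):
    """Run a system command and return (returncode, stdout, stderr) as text."""
    # shutil.which returns None when the executable is not on PATH --
    # a common reason the hdfs call fails inside a Python process.
    if shutil.which(args_list[0]) is None:
        raise FileNotFoundError('{0} not found on PATH'.format(args_list[0]))
    proc = subprocess.run(args_list, capture_output=True, text=True)
    return proc.returncode, proc.stdout, proc.stderr

# Stand-in for: run_cmd(['hdfs', 'dfs', '-ls', hdfs_file_path])
ret, out, err = run_cmd(['echo', 'hello'])
if ret != 0:
    print('command failed, stderr was:', err)
```

If `FileNotFoundError` is raised, the environment running Python simply has no Hadoop client configured; otherwise the returned `err` string should explain why the listing failed.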
    What is the relation with pyspark? This is just Python code. Executing `pyspark` on a shell is just a shortcut for `python` + creation of a Spark context. In the end, you'll end up with a Python shell. – Steven Jul 19 '21 at 11:41
  • 1
    what is `cmd` ? What is the output of `hdfs dfs -ls` in shell ? – Steven Jul 19 '21 at 11:43
  • @Steven I have edited. So I have to import spark and create a SparkSession? – DrGenius Jul 19 '21 at 11:46
  • 1
    No, no need of spark. your current issue is not spark related. – Steven Jul 19 '21 at 11:50
  • If you have already a Spark session that is configured to interact with HDFS, maybe [this answer](https://stackoverflow.com/a/40258750/2129801) can help – werner Jul 19 '21 at 16:22
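Following werner's suggestion, when a Spark session is already running and configured against the cluster, HDFS can be listed through Spark's JVM gateway instead of shelling out to the `hdfs` binary at all. A sketch under that assumption, where `spark` is an existing `SparkSession` (this is the approach from the linked answer, not something the question's environment is confirmed to support):

```python
# Sketch: list an HDFS folder via Spark's py4j gateway to the Hadoop
# FileSystem API. Assumes a live SparkSession named `spark` whose
# Hadoop configuration already points at the cluster.
hadoop = spark.sparkContext._jvm.org.apache.hadoop.fs
conf = spark.sparkContext._jsc.hadoopConfiguration()
fs = hadoop.FileSystem.get(conf)

for status in fs.listStatus(hadoop.Path('/user/th/folder_4/files_4/')):
    print(status.getPath())
```

This avoids the PATH and client-configuration issues of spawning a subprocess, since it reuses whatever HDFS configuration the Spark session was started with.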

0 Answers