I am using Python 3.4 on my server, and I frequently need to copy/move multiple files from a local directory to an HDFS directory. All my files are in sub-directories, which in turn are in MyDir. Here is the command I use:

$ hdfs dfs -copyFromLocal MyDir/* /path/to/hdfs/

This command runs fine in a shell on the server, but when I run the same command from Python using subprocess:

>>> subprocess.call(['hdfs', 'dfs', '-copyFromLocal', 'MyDir/*', '/path/to/hdfs/'])

it gives the following error:

copyFromLocal: `MyDir/*': No such file or directory
1

P.S.: I also tried ['hadoop', 'fs', '-put', ...] instead of ['hdfs', 'dfs', '-copyFromLocal', ...]; it does not work either.

Can anyone help me with this? Any help would be appreciated.

EDIT: I need to move the files along with their sub-directories.

Ankit Seth

3 Answers


Add shell=True:

>>> subprocess.call(['hdfs', 'dfs', '-copyFromLocal', 'MyDir/*', '/path/to/hdfs/'], shell=True)

Read this post: Actual meaning of 'shell=True' in subprocess
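
A note on why this exact call misbehaves: on POSIX systems, when `args` is a list and `shell=True`, Python runs `/bin/sh -c` with only the first list item as the command string. The remaining items become positional parameters of the shell itself (`$0`, `$1`, ...) and never reach the command, so the call above effectively runs a bare `hdfs`, which just prints its usage. The sketch below demonstrates this with `echo` standing in for `hdfs` (it assumes a POSIX `/bin/sh`; the variable names are illustrative only):

```python
import subprocess

# List + shell=True: only 'echo' is the command string; the other items are
# handed to /bin/sh as $0, $1, ... so echo runs with no arguments at all.
dropped = subprocess.check_output(
    ['echo', 'MyDir/*', '/path/to/hdfs/'], shell=True, universal_newlines=True)

# String + shell=True: the shell parses the whole line itself, so every
# argument (and any wildcard) is handled as it would be at a prompt.
kept = subprocess.check_output(
    'echo hello world', shell=True, universal_newlines=True)

print(repr(dropped))  # '\n'
print(repr(kept))     # 'hello world\n'
```

So with `shell=True` the command should be passed as one string; with a list, `shell=True` should be left out.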

RaminNietzsche
I tried it. It printed the `hadoop command usage` along with a `zero exit status`, but when I checked the HDFS path, the files were not there. – Ankit Seth Aug 16 '17 at 08:18

I would write a helper function around subprocess that returns the exit code, the output and the error:

import subprocess

def run_cmd(args_list):
    """
    Run a command given as a list of arguments and
    return (exit code, stdout bytes, stderr bytes).
    """
    print('Running system command: {0}'.format(' '.join(args_list)))
    proc = subprocess.Popen(args_list, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    s_output, s_err = proc.communicate()
    s_return = proc.returncode
    return s_return, s_output, s_err

Then:

import os
for name in os.listdir('your-directory'):
    run_cmd(['hadoop', 'fs', '-put', os.path.join('your-directory', name), 'target-directory'])

That should loop through every entry in your directory (files and sub-directories alike) and put each one in your desired HDFS directory.
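
A slightly more defensive variant of the loop above, offered as a sketch: it builds paths with `os.path.join`, keeps command construction separate so it can be inspected without touching HDFS, and stops on the first non-zero exit status instead of ignoring the value `run_cmd` returns. Since `hadoop fs -put` copies a directory argument recursively, passing each top-level entry should also carry the sub-directories across, as the question's EDIT requires. The helper names `build_put_commands` and `put_all` are hypothetical, and `/path/to/hdfs/` is a placeholder.

```python
import os
import shutil
import subprocess
import tempfile


def build_put_commands(local_dir, hdfs_target):
    """Return one 'hadoop fs -put' argument list per entry in local_dir.

    Sub-directories are included as-is; -put copies a directory recursively.
    """
    commands = []
    for name in sorted(os.listdir(local_dir)):
        commands.append(['hadoop', 'fs', '-put',
                         os.path.join(local_dir, name), hdfs_target])
    return commands


def put_all(local_dir, hdfs_target):
    """Run every command, raising on the first non-zero exit status."""
    for cmd in build_put_commands(local_dir, hdfs_target):
        proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                                stderr=subprocess.PIPE)
        _, err = proc.communicate()
        if proc.returncode != 0:
            raise RuntimeError('{0} failed: {1}'.format(
                ' '.join(cmd), err.decode('utf-8', 'replace')))


# Example: only builds the commands for a throwaway local tree (no HDFS call).
demo = tempfile.mkdtemp()
open(os.path.join(demo, 'a.txt'), 'w').close()
os.mkdir(os.path.join(demo, 'sub'))
demo_cmds = build_put_commands(demo, '/path/to/hdfs/')
shutil.rmtree(demo)
```

On the server you would then call something like `put_all('MyDir', '/path/to/hdfs/')`.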

Trevor McCormick

Join the whole command into a single string and pass shell=True:

subprocess.call('hdfs dfs -copyFromLocal MyDir/* /path/to/hdfs/', shell=True)
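
If you would rather avoid `shell=True`, the wildcard can be expanded in Python with the standard `glob` module and the matches passed as separate list items. This is the expansion the shell performs for `MyDir/*`, and its absence is why the original list-based call failed: without a shell, `'MyDir/*'` reaches `hdfs` as a literal file name. A sketch, where `copy_from_local_cmd` is a hypothetical helper and the paths are placeholders:

```python
import glob
import os
import shutil
import subprocess
import tempfile

# Without a shell nothing expands '*': echo receives the literal pattern,
# just as hdfs received the literal 'MyDir/*' in the question.
literal = subprocess.check_output(['echo', 'MyDir/*'], universal_newlines=True)


def copy_from_local_cmd(pattern, hdfs_target):
    """Build an 'hdfs dfs -copyFromLocal' argument list with the wildcard
    expanded in Python, so no shell (and no shell=True) is needed."""
    # Like the shell's *, glob skips dotfiles; sorted() gives a stable order.
    sources = sorted(glob.glob(pattern))
    if not sources:
        # The shell would pass the unmatched pattern through literally;
        # failing early here gives a clearer error instead.
        raise ValueError('no local files match {0!r}'.format(pattern))
    return ['hdfs', 'dfs', '-copyFromLocal'] + sources + [hdfs_target]


# Example against a throwaway local directory (no HDFS involved).
demo = tempfile.mkdtemp()
for name in ('b.txt', 'a.txt'):
    open(os.path.join(demo, name), 'w').close()
os.mkdir(os.path.join(demo, 'sub'))
demo_cmd = copy_from_local_cmd(os.path.join(demo, '*'), '/path/to/hdfs/')
shutil.rmtree(demo)

# On the real server you would then run, for example:
# subprocess.call(copy_from_local_cmd('MyDir/*', '/path/to/hdfs/'))
```

Sub-directories matched by the pattern are passed to `-copyFromLocal` as directory arguments, which it copies recursively.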

Harsha Reddy