
I want to pass a JSON string as a command-line argument to my reducer.py file, but I'm unable to do so.

The command I execute is:

hadoop jar contrib/streaming/hadoop-streaming.jar -file /home/hadoop/mapper.py -mapper 'mapper.py' -file /home/hadoop/reducer.py -reducer 'reducer.py {"abc":"123"}' -input /user/abc.txt -output /user/output/

When I print argv array in reducer.py, it shows output as:

['/mnt/var/lib/hadoop/tmp/nm-local-dir/usercache/hadoop/appcache/application_1423459215008_0057/container_1423459215008_0057_01_000004/./reducer.py', '{', 'abc', ':', '123', '}']

The first argument is the path of reducer.py, but my second argument gets split at the double quotes.

I want the second argument to arrive as one complete JSON string, for example:

['/mnt/var/lib/hadoop/tmp/nm-local-dir/usercache/hadoop/appcache/application_1423459215008_0057/container_1423459215008_0057_01_000004/./reducer.py', '{"abc":"123"}']

so that I can load that argument as JSON in reducer.py.

Any help is appreciated. Thanks!

EDIT: I tried escaping the JSON with this command:

hadoop jar contrib/streaming/hadoop-streaming.jar -file /home/hadoop/mapper.py -mapper 'mapper.py' -file /home/hadoop/reducer.py -reducer 'reducer.py "{\"abc\":\"123\"}"' -input /user/abc.txt -output /user/output/

This gives the output:

['/mnt/var/lib/hadoop/tmp/nm-local-dir/usercache/hadoop/appcache/application_1423459215008_0058/container_1423459215008_0058_01_000004/./redu.py', '{\\', 'abc\\', ':\\', '123\\', '}']

shahsank3t

1 Answer


You need to put your JSON inside double quotes with proper escaping: "{\"abc\":\"123\"}". But chances are that your input will be processed by Hadoop before being passed to your script.
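For comparison, a POSIX shell (simulated here with Python's shlex, which follows the same quoting rules) would split your escaped command correctly, delivering the JSON as a single token. That your script still sees it in pieces suggests Hadoop Streaming is re-tokenizing the -reducer string itself:

```python
import shlex

# shlex.split in POSIX mode applies bash-like quoting: \" inside
# double quotes becomes a literal quote, and the token stays whole
cmd = r'reducer.py "{\"abc\":\"123\"}"'
print(shlex.split(cmd))  # → ['reducer.py', '{"abc":"123"}']
```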

If this doesn't work, you can try passing your arguments via the environment with -cmdenv name=value. See "How do I pass a parameter to a python Hadoop streaming job?" for more details.

Saulius Žemaitaitis