
I'm trying, without success, to pass a JSON string to a Python script from a PowerShell script (.ps1) in order to automate this task.

spark-submit `
--driver-memory 8g `
--master local[*] `
--conf spark.driver.bindAddress=127.0.0.1 `
--packages mysql:mysql-connector-java:6.0.6,org.elasticsearch:elasticsearch-spark-20_2.11:7.0.0 `
--py-files build/dependencies.zip build/main.py `
$param

With $param='{ \"\"job_start\"\": \"\"jdbc:mysql://127.0.0.1:3307/test\"\"}' it works fine: the Python script receives a valid JSON string and parses it correctly.

When the string contains the & character, as in $param='{ \"\"job_start\"\": \"\"jdbc:mysql://127.0.0.1:3307/test&serverTimezone=UTC&autoReconnect=true&useSSL=false\"\"}', the string is printed as { "job_start": \jdbc:mysql://127.0.0.1:3307/test? and the rest of the string is interpreted as separate commands:

'serverTimezone' is not recognized as an internal or external command
'autoReconnect' is not recognized as an internal or external command
'useSSL' is not recognized as an internal or external command

The \"\" is needed to preserve the double quotes in the Python script; I'm not sure why two escaped double quotes are required.

UPDATE:

Now I'm having problems with the ! character: I can't escape it, even with ^ or \.

# Only "" doesn't work
$param='{\"\"job_start\"\": \"\"jdbc:mysql://127.0.0.1:3307/test^&serverTimezone=UTC\"\", \"\"password\"\": \"\"testpassword^!123\"\"}'

spark-submit.cmd `
--driver-memory 8g `
--master local[*] `
--conf spark.driver.bindAddress=127.0.0.1 `
--packages mysql:mysql-connector-java:6.0.6,org.elasticsearch:elasticsearch-spark-20_2.11:7.0.0 `
--py-files build/dependencies.zip build/main.py `
$param

# OUTPUT: the ! character is missing
{"job_start": "jdbc:mysql://127.0.0.1:3307/test&serverTimezone=UTC", "password": "testpassword123"}

Thank you all.

  • I wonder if spark-submit runs in a different context than a simple Python script? – Bruno Bernardes May 06 '20 at 12:34
  • It would be good to understand whether there truly is a problem with how `spark-submit` relays arguments to Python, or whether the problem is unique to your scenario / environment. Also, in your update you refer to _output_: who produces that output? – mklement0 May 09 '20 at 14:03
  • @mklement0 Yes, I'm trying to figure out what happens with `spark-submit`, because passing the string directly to Python works with your explanations. As for the output, it comes from a print call at the beginning of my Spark script. – Bruno Bernardes May 10 '20 at 20:09

2 Answers


tl;dr

Note: The following does not solve the OP's specific problem (the cause of which is still unknown), but hopefully contains information of general interest.

# Use "" to escape " and - in case of delayed expansion - ^! to escape !
$param = '{ ""job_start"": ""jdbc:mysql://127.0.0.1:3307/test&serverTimezone=UTC&more^!"" }'
  • There are high-profile utilities (CLIs) such as az (Azure) that are Python-based, but on Windows use an auxiliary batch file as the executable that simply relays arguments to a Python script.
    • Use Get-Command az, for instance, to discover an executable's full file name; batch files, which are processed by cmd.exe, the legacy command processor, have a filename extension of either .cmd or .bat
  • To prevent calls to such a batch file from breaking, double quotes embedded in arguments passed from PowerShell must be escaped as ""
  • Additionally, but only if setlocal enabledelayedexpansion is in effect in a given target batch file or if your computer is configured to use delayed expansion by default, for all batch files:
    • ! characters must be escaped as ^!, which, however, is only effective if cmd.exe considers the ! part of a double-quoted string.

It looks like we have a confluence of two problems:

  • A PowerShell problem with " chars. embedded in arguments passed to external programs:

    • In an ideal world, passing JSON text such as '{ "foo": "bar" }' to an external program would work as-is, but due to PowerShell's broken handling of embedded double quotes, that is not enough, and the " chars. must additionally be escaped, for the target program, either as \" (which most programs support), or, in the case of cmd.exe (see below), as "", which Python fortunately recognizes too: '{ ""foo"": ""bar"" }'
  • Limitations of argument-passing and escaping in cmd.exe batch files:

    • It sounds like spark-submit is an auxiliary batch file (.cmd or .bat) that passes the arguments through to a Python script.

    • The problem is that if you use \" for escaping embedded ", cmd.exe doesn't recognize them as escaped, which causes it to consider the & characters unquoted, and they are therefore interpreted as shell metacharacters, i.e. as characters with special syntactic function (command sequencing, in this case).

    • Additionally, and only if setlocal enabledelayedexpansion is in effect in a given batch file, any literal ! characters in arguments require additional handling:

      • If cmd.exe thinks the ! is part of an unquoted argument, you cannot escape ! at all.

      • Inside a quoted argument (which invariably means "..." in cmd.exe), you must escape a literal ! as ^!.

        • Note that this requirement is the inverse of how all other metacharacters must be escaped (which require ^ when unquoted, but not inside "...").

        • The unfortunate consequence is that you need to know the implementation details of the target batch file - whether it uses setlocal enabledelayedexpansion or not - in order to formulate your arguments properly.

        • The same applies if your computer is configured to use delayed expansion by default, for all batch files (and interactively), which is neither common nor advisable. To test whether a given computer is configured that way, run the following command and look for DelayedExpansion : 1 in its output: no output at all means delayed expansion is OFF; with one or two outputs, delayed expansion is ON by default only if the first (or only) output reports DelayedExpansion : 1.

 Get-ItemProperty -EA Ignore 'registry::HKEY_CURRENT_USER\Software\Microsoft\Command Processor', 'registry::HKEY_LOCAL_MACHINE\Software\Microsoft\Command Processor' DelayedExpansion
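The quoting problem with \" described above can be illustrated with a deliberately simplified model of how cmd.exe tracks quoted regions. This is only a sketch under stated assumptions, not cmd.exe's actual parser: every " toggles the quoted state, a backslash has no special meaning, and ^ outside quotes escapes the next character. The function name unquoted_metachars is hypothetical, and the enclosing double quotes around each sample argument are an assumption about what PowerShell adds when an argument contains spaces.

```python
def unquoted_metachars(command_line, metachars="&|<>"):
    """Return (index, char) pairs for metacharacters outside "..." regions.

    Simplified model (an assumption): each double quote toggles the quoted
    state, a backslash is an ordinary character, and a caret outside
    quotes escapes the character that follows it.
    """
    found = []
    in_quotes = False
    i = 0
    while i < len(command_line):
        ch = command_line[i]
        if ch == '"':
            in_quotes = not in_quotes
        elif ch == "^" and not in_quotes:
            i += 1  # skip the character escaped by the caret
        elif ch in metachars and not in_quotes:
            found.append((i, ch))
        i += 1
    return found


# Embedded quotes escaped as \" : the model sees the & as unquoted.
backslash_form = r'"{ \"job_start\": \"jdbc:mysql://127.0.0.1:3307/test&serverTimezone=UTC\" }"'

# Embedded quotes escaped as "" : the & stays inside a quoted region.
doubled_form = '"{ ""job_start"": ""jdbc:mysql://127.0.0.1:3307/test&serverTimezone=UTC"" }"'

print(unquoted_metachars(backslash_form))
print(unquoted_metachars(doubled_form))
```

In this model, each "" pair toggles the quoted state twice and so leaves it unchanged, whereas each \" toggles it once, which is why the & ends up exposed in the first form.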

Workaround:

  • Since you're technically calling a batch file, use "" to escape literal " chars. inside your single-quoted ('...') PowerShell string.

  • If you know that the target batch file uses setlocal enabledelayedexpansion or if your computer is configured to use delayed expansion by default, escape ! characters as ^!

    • Note that this is only effective if cmd.exe considers the ! part of a double-quoted string.

Therefore (note that I've extended the URL to include a token with !, meant to be passed through literally as suffix more!):

$param = '{ ""job_start"": ""jdbc:mysql://127.0.0.1:3307/test&serverTimezone=UTC&more^!"" }'
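If the escaping works, the Python script should receive the JSON with plain double quotes and a literal !. The following is a minimal sketch of how the target script might parse the argument and, on failure, show exactly what arrived. It assumes the JSON is the first command-line argument; the function name parse_job_argument and the simulated argument list are hypothetical and not taken from the OP's script.

```python
import json


def parse_job_argument(argv):
    """Parse the JSON text expected as the first command-line argument."""
    if len(argv) < 2:
        raise ValueError("expected one JSON argument")
    raw = argv[1]
    try:
        return json.loads(raw)
    except json.JSONDecodeError as err:
        # repr() shows the argument exactly as received, which helps reveal
        # lost quotes or a string that was cut off at an & character.
        raise ValueError(f"invalid JSON argument {raw!r}: {err}") from err


# Simulated sys.argv, i.e. what the script would see if the escaping worked.
received = [
    "main.py",
    '{ "job_start": "jdbc:mysql://127.0.0.1:3307/test&serverTimezone=UTC&more!" }',
]
config = parse_job_argument(received)
print(config["job_start"])
```

In a real script, sys.argv would be passed instead of the simulated list.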

If you need to escape an existing JSON string programmatically:

# Unescaped JSON string, which in an ideal world you'd be able
# to pass as-is.
$param = '{ "job_start": "jdbc:mysql://127.0.0.1:3307/test&serverTimezone=UTC&more!" }'

# Escape the " chars.
$param = $param -replace '"', '""'

# If needed, also escape the ! chars.
$param = $param -replace '!', '^!'

Ultimately, both problems should be fixed at the source - but this is highly unlikely, because it would break backward compatibility.

With respect to PowerShell, this GitHub issue contains the backstory, technical details, a robust wrapper function to hide the problems, and discussions about how to fix the problem at least on an opt-in basis.

mklement0
  • @BrunoBernardes. (As it turns out, the twice-escaping would not have worked anyway with respect to `!`) There must be something else going on, and you need to determine the chain of calls, and where, specifically, the error occurs. The only thing that is certain is that it is `cmd.exe` that is complaining, i.e. that the error occurs in a batch file. – mklement0 May 07 '20 at 13:16

The question "Which characters need to be escaped when using Bash?" lists all the characters that you must escape in order to pass them as literal characters in the shell; note that & is one of them.

Now, I understand that if you escape it, the JSON parser you are using will probably fail to parse the string. So one quick workaround would be to replace the & with another symbol that doesn't need escaping, like @ or %, and add a step in your app that replaces it with & again before parsing. Just make sure that the symbol you choose isn't used in your strings, and never will be.
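A minimal sketch of the restore step on the Python side, assuming @ is the chosen placeholder and never occurs in the real data; the names PLACEHOLDER and restore_and_parse are hypothetical:

```python
import json

# Hypothetical stand-in for &; it must never appear in the actual values.
PLACEHOLDER = "@"


def restore_and_parse(raw_argument, placeholder=PLACEHOLDER):
    """Replace the placeholder with & and then parse the JSON text."""
    return json.loads(raw_argument.replace(placeholder, "&"))


# The argument as it would be passed on the command line, with @ instead of &.
raw = '{"job_start": "jdbc:mysql://127.0.0.1:3307/test@serverTimezone=UTC@useSSL=false"}'
config = restore_and_parse(raw)
print(config["job_start"])
```

Because the replacement runs on the whole string before parsing, a placeholder that also occurs legitimately in any value would be corrupted, which is why the symbol must be unused.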

Ahmed Hammad
  • Thanks for the reply, but the strings I send in this JSON contain other special characters too, and I would need to figure out a replacement for every one of them. – Bruno Bernardes May 06 '20 at 12:23
  • I managed to escape the `&` character using `^`, so the final test string was `$param='{\"\"job_start\"\": \"\"jdbc:mysql://127.0.0.1:3307/test^&serverTimezone=UTC\"\"}'`, which I don't think is great, but oh well. – Bruno Bernardes May 06 '20 at 12:58
  • Unfortunately, the question was originally mistagged as `bash`, which isn't involved, judging by the error messages. Instead, it is `cmd.exe` via an _auxiliary batch file_ that relays arguments to a Python script. So, unfortunately, there are _two_ shells involved: PowerShell, which has no problem with the `&`, given that it's inside a quoted string, and `cmd.exe`, which is where the problem manifests. Two general asides: If something is correctly escaped, the escape character is _removed_ during parsing; also, `spark-submit` is a third-party tool, so modifying it isn't really an option. – mklement0 May 06 '20 at 15:08