I am trying to deploy a Spark job (using the PySpark ML library) on AWS EMR. I want to create a simple single-instance cluster to understand how EMR works.
I create the cluster from the console with the following configuration:
spark-submit --deploy-mode cluster s3://bucket/key/file.py
My step fails with a bunch of error logs that I struggle to understand, apart from this one:
File "PowerProdPredictionEmr.py", line 261
df = df.select("Perimetre", *target_exprs, *window_exprs, "rn")
SyntaxError: invalid syntax
I don't understand this error, since the same code works locally on my machine.
Here is the code:
...
window_exprs = [df.power_prod[i] for i in range(w*sample_week)]
df = df.select("Perimetre", *target_exprs, *window_exprs, "rn")
...
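For reference, here is a minimal sketch of an equivalent call that uses only one * unpacking. It rests on an unconfirmed assumption: that the cluster runs the script with an older Python interpreter than the local machine. Several * unpackings in a single call are only accepted from Python 3.5 onwards (PEP 448), so an older interpreter would reject line 261 with exactly this SyntaxError. The helper build_select_args and the string placeholders are hypothetical stand-ins for the real column expressions, and the interpreter actually used by the EMR step has not been checked.

```python
import sys


def build_select_args(first, target_exprs, window_exprs, last):
    """Concatenate everything into one list so a single * unpacking is enough."""
    return [first] + list(target_exprs) + list(window_exprs) + [last]


# Hypothetical placeholders; in the real job these are Column expressions.
target_exprs = ["t0", "t1"]
window_exprs = ["w0", "w1", "w2"]

args = build_select_args("Perimetre", target_exprs, window_exprs, "rn")

# Shows which interpreter version actually runs the script (useful in the step logs).
print(sys.version_info[:2])
print(args)

# In the job itself, the select would then become (not executed here):
# df = df.select(*args)
```

Printing sys.version_info at the top of the script and looking for it in the step's stdout log would confirm or rule out the version assumption.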
Any ideas? I can add other log files if necessary.