
How do I use a variable defined in the EMR cluster's Python instance when I run code on the managed Jupyter notebook instance using %%local?

Specifically, I want to use matplotlib as shown in this question and display a plot from a dataframe generated using spark.sql(). Using %%sql lets me easily use the query results in %%local, but I would still need to pass parameters to %%sql from the EMR Python instance.

Example:

ln[1]: parameter = 'Hello parameter'

ln[2]: %%local
       print(parameter)

I keep getting an error saying that my variable is not defined.

1 Answer


I found two workarounds:

  • Use %%spark -o df to return the SQL query results as a dataframe named df that can be used in %%local cells, as in this answer.
  • Do all the query building, execution, and data processing as normal, without any %% magic commands, then register the final data as a temporary view with df.createOrReplaceTempView("temp_table_name"). Then retrieve the final data with a simple query: %%sql -q -o df followed by SELECT * FROM temp_table_name.
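
Here is a minimal sketch of the second workaround, for illustration only. The helper name publish_view, the events table, and its label column are hypothetical and not part of my actual setup; the function only assumes it is given the cluster's spark session. The value is interpolated directly into the query, so this is only suitable for trusted input.

```python
# Runs in a normal cluster-side cell (no %% magic), where both the
# `spark` session and `parameter` are defined.
# `publish_view`, `events` and `label` are hypothetical names.

def publish_view(spark, parameter, view_name="temp_table_name"):
    """Run a parameterised query and register the result as a temp view."""
    # Plain string interpolation: only use this with trusted values.
    query = f"SELECT * FROM events WHERE label = '{parameter}'"
    df = spark.sql(query)
    # Make the result reachable from a later %%sql cell.
    df.createOrReplaceTempView(view_name)
    return query

# Later cells in the notebook (shown as comments, since magics are not
# plain Python):
#
#   publish_view(spark, parameter)
#
#   %%sql -q -o df
#   SELECT * FROM temp_table_name
#
#   %%local
#   df.plot()   # df is now a local pandas dataframe
```

The parameter never has to cross into %%local: it is consumed on the cluster when the query is built, and only the resulting data is brought back with -o df.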