
I am attempting to plot data using Matplotlib within a Jupyter notebook on an AWS EMR instance. Matplotlib must be installed via a bootstrap action at instance start-up, which I have done successfully. I have also installed Pandas this way and used it for various things in my notebook. However, the usual %matplotlib inline does not work. (In fact, it appears that NO magic commands work in AWS EMR notebooks. I suspect AWS has disabled them, or it has something to do with the notebooks being "serverless.")

I have tried:

  • installing matplotlib==2.0.2 and using the magic command %matplotlib inline (as mentioned above, magic commands seem not to work at all)

  • installing matplotlib==1.5, installing ipympl, and using import ipympl. I don't receive errors with this method when calling .show() on a plot, but no plot shows up either: the cell runs with no plot appearing. As far as I can tell, .show() does nothing.

bdfoz

2 Answers


EDIT:

Check Parag Chaudhari's answer for plotting in Spark mode.


Tested on release label emr-5.21.0.

The sparkmagic kernel used by EMR notebooks has no %matplotlib magic. Type %%help to list all supported magics.

The PySpark kernel sends your code as REST requests to the remote EMR cluster via Livy, so the code runs on the cluster rather than in the notebook's own Python process, and plotting there doesn't make much sense. Do the data processing in a distributed way with the PySpark kernel, then do the plotting in %%local mode.

Start the cell with %%local and then run your code:

%%local
%matplotlib inline
import matplotlib.pyplot as plt
plt.plot([1, 2, 3], [4, 1, 9])  # replace with your own plotting code
plt.show()
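A slightly fuller sketch of the local half of this workflow, assuming the distributed step has already reduced the data to a few summary values and those values have been brought into the local kernel (for example with the -o option of the %%sql magic, which exposes a query result as a local pandas DataFrame). The function name plot_summary and the sample counts are hypothetical; inside a %%local cell, %matplotlib inline takes care of displaying the figure.

```python
import matplotlib.pyplot as plt


def plot_summary(summary):
    """Draw a bar chart from a small, already-aggregated {label: value} dict.

    `summary` is a hypothetical stand-in for the few rows returned from
    the cluster after the heavy processing was done in Spark.
    """
    labels = list(summary)
    values = [summary[label] for label in labels]
    fig, ax = plt.subplots()
    ax.bar(labels, values)  # one bar per category
    ax.set_xlabel("category")
    ax.set_ylabel("count")
    return fig, ax


# Hypothetical aggregated result, e.g. counts per category computed in Spark.
fig, ax = plot_summary({"a": 120, "b": 45, "c": 80})
```

Keeping the plotting input small is the point of the split: only the aggregated values need to leave the cluster, not the raw data.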
Dev

Starting with EMR 5.26, you can:

  1. Install additional Python libraries (e.g. Pandas, matplotlib, SciPy) on the EMR cluster from within the notebook, with no need for bootstrap actions or a custom AMI. Use the newly added "list_packages", "install_pypi_package" and "uninstall_package" APIs in the PySpark version of EMR notebooks. More information here. You can also refer to this blog.

  2. Render and plot the graphs on the EMR cluster itself, using the "%matplot" magic.
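Because %matplot renders the figure on the cluster, where there is no display, the image has to travel back to the notebook as data. As a simplified, hypothetical illustration of that idea (the helper encode_current_figure is made up for this sketch and is not the actual %matplot implementation), rendering the current pyplot figure to a base64-encoded PNG looks like this:

```python
import base64
import io

import matplotlib

matplotlib.use("Agg")  # assumption: headless renderer, since cluster nodes have no display
import matplotlib.pyplot as plt


def encode_current_figure(pyplot_module):
    """Render the current pyplot figure to PNG and return it base64-encoded.

    Hypothetical helper for illustration only; it is not the actual
    %matplot implementation.
    """
    fig = pyplot_module.gcf()
    buf = io.BytesIO()
    fig.savefig(buf, format="png")
    pyplot_module.close(fig)  # release the figure once it has been rendered
    return base64.b64encode(buf.getvalue()).decode("ascii")


plt.clf()  # start from an empty figure
plt.plot([1, 2, 3], [4, 1, 9])
png_b64 = encode_current_figure(plt)
```

In an actual EMR notebook cell you would not write this yourself: draw the plot with plt and end the cell with %matplot plt.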

Parag Chaudhari