I need to install some packages and binaries in an Amazon EMR bootstrap action, but I can't find any example that does this.
Basically, I want to install a Python package and have each Hadoop node use it when processing the items in an S3 bucket. Here's a sample from boto:
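    from boto.emr.step import StreamingStep

    # Streaming step whose mapper script depends on the SimpleCV package
    step = StreamingStep(
        name='Image to grayscale using SimpleCV python package',
        mapper='s3n://elasticmapreduce/samples/imageGrayScale.py',
        reducer='aggregate',
        input='s3n://elasticmapreduce/samples/input',
        output='s3n://<my output bucket>/output'
    )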
I need the mapper to use the SimpleCV Python package, but I'm not sure where to specify this. If the package is not already installed on the nodes, how do I get it installed? And is there a way to avoid waiting for the installation to complete on every run, for example by installing it somewhere once and just referencing the package from there?
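
To make the question more concrete, here is a rough sketch of what I imagine the bootstrap approach might look like. It only generates the text of a shell script that each node would run at startup. The helper name `build_bootstrap_script`, the use of `sudo pip install`, and the assumption that `pip` is available on the nodes are all my own guesses, not something I have confirmed. The S3 path in the trailing comments is a placeholder.

```python
import shlex


def build_bootstrap_script(packages):
    """Return the text of a shell script that pip-installs each package.

    Assumption: pip exists on the EMR nodes and sudo is allowed there.
    """
    lines = ["#!/bin/bash", "set -e"]  # set -e: stop on the first failed install
    for pkg in packages:
        # shlex.quote guards against shell metacharacters in package names
        lines.append("sudo pip install %s" % shlex.quote(pkg))
    return "\n".join(lines) + "\n"


script = build_bootstrap_script(["SimpleCV"])
print(script)

# Presumably the script would then be uploaded to S3 and referenced
# through boto's BootstrapAction (placeholder path, not a real location):
#
#   from boto.emr.bootstrap_action import BootstrapAction
#   action = BootstrapAction('Install SimpleCV',
#                            's3n://<my bucket>/install_simplecv.sh', [])
#   conn.run_jobflow(..., bootstrap_actions=[action], steps=[step])
```

If this is roughly the right direction, the installation would still run on every node each time a job flow starts, which is the waiting I would like to avoid.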