Use PredictionIO in production

Question

I have installed PredictionIO locally and trained the engine using the Universal Recommender template, which I modified for my needs. Everything looks fine.

Now that I know it could fit my needs, I want to deploy it to production. Unfortunately, there is not much documentation about that.

Ideally, I would like to deploy everything on AWS. Part of the documentation describes this, but it is useless because the CloudFormation template is disabled.

I was thinking about using Docker to achieve this, but I lack knowledge of the whole stack and would like to understand the following:

Where should the data be stored? HBase seems to be the "database"; isn't it dangerous to have it on the same server as the rest (event server, prediction server)?
How does it scale? Do I need multiple instances of PredictionIO running behind a load balancer, or is one enough? If I need several, how do I set that up?
What is a good distributed architecture? In order to scale, I'm pretty sure we will need to separate the EventServer from the PredictionServer; what is a good way to do this?

Hope someone can help. Thanks. Cyril

Kobynet · Answer 1 · 2016-11-20T13:04:48.987

Where should the data be stored?

According to the PredictionIO website:

If you decide to install HBase to another location, you must edit PredictionIO-0.10.0-incubating/conf/pio-env.sh and change the PIO_STORAGE_SOURCES_HBASE_HOME variable to point to your own HBase installation.
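As a minimal illustration of that edit, the Python sketch below rewrites the PIO_STORAGE_SOURCES_HBASE_HOME assignment in the text of a pio-env.sh file. The sample file contents and the /opt/hbase path are hypothetical placeholders; making the same one-line change by hand in an editor works just as well.

```python
import re


def set_env_var(script: str, name: str, value: str) -> str:
    """Return `script` with the `name=...` assignment replaced by `name=value`.

    Only lines that start with `name=` (after optional spaces/tabs) are changed,
    so commented-out lines such as `# NAME=...` are left untouched. If no
    assignment exists yet, one is appended at the end of the script.
    """
    pattern = re.compile(rf"^([ \t]*){re.escape(name)}=.*$", re.MULTILINE)
    if pattern.search(script):
        # Use a function replacement so backslashes in `value` are not interpreted.
        return pattern.sub(lambda m: f"{m.group(1)}{name}={value}", script)
    suffix = "" if script.endswith("\n") or not script else "\n"
    return f"{script}{suffix}{name}={value}\n"


# Illustrative excerpt only; a real pio-env.sh contains many more settings.
sample = """#!/usr/bin/env bash
PIO_STORAGE_SOURCES_HBASE_TYPE=hbase
PIO_STORAGE_SOURCES_HBASE_HOME=$PIO_HOME/vendors/hbase
"""

# Hypothetical location of an HBase installation managed outside PredictionIO.
updated = set_env_var(sample, "PIO_STORAGE_SOURCES_HBASE_HOME", "/opt/hbase")
print(updated)
```

The other storage variables in the file are left as they are; only the HBase home line is pointed at the separate installation.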

The same page also notes:

For production deployment, run a fully distributed HBase configuration.
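For reference, a fully distributed HBase setup is configured in HBase's own hbase-site.xml, notably through the hbase.cluster.distributed, hbase.rootdir and hbase.zookeeper.quorum properties. The sketch below only renders such a file from Python. The HDFS namenode address and ZooKeeper host names are hypothetical placeholders, and a real cluster will need more settings than these three.

```python
import xml.etree.ElementTree as ET


def render_hbase_site(properties: dict[str, str]) -> str:
    """Render a minimal hbase-site.xml <configuration> document from name/value pairs."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical cluster: one HDFS namenode and a three-node ZooKeeper ensemble.
site_xml = render_hbase_site({
    "hbase.cluster.distributed": "true",
    "hbase.rootdir": "hdfs://namenode.example.internal:8020/hbase",
    "hbase.zookeeper.quorum": "zk1.example.internal,zk2.example.internal,zk3.example.internal",
})
print(site_xml)
```

Running HBase this way, on its own machines, also addresses the concern in the question about keeping the data store on the same server as the event and prediction servers.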

How does it scale?

There is a great answer in the PredictionIO Google group that breaks scaling down into separate parts.

What is a good distributed architecture?

In order to scale, I'm pretty sure we will need to separate the EventServer from the PredictionServer; what is a good way to do this?

Separating the ingestion layer, the processing layer and the serving layer is generally considered good practice, but be careful not to over-engineer. It very much depends on your specific use cases. Don't forget that each separation you make adds complexity to the system (deployment, monitoring, etc.).
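To make the ingestion/serving split concrete: the EventServer and the deployed engine (PredictionServer) each expose their own REST endpoint, so once they run on different hosts a client simply targets two base URLs. The Python sketch below builds (but does not send) one request for each, using only the standard library. The host names, access key and query fields are hypothetical. 7070 and 8000 are the default ports of `pio eventserver` and `pio deploy`, and the query shape `{"user": ..., "num": ...}` is assumed to match a Universal Recommender style engine, so check it against your modified template.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical hosts: ingestion and serving run on separate machines.
EVENT_SERVER = "http://events.example.internal:7070"
PREDICTION_SERVER = "http://predict.example.internal:8000"
ACCESS_KEY = "YOUR_ACCESS_KEY"  # placeholder for the app's access key


def event_request(event: str, user: str, item: str) -> Request:
    """Build a POST to the EventServer's /events.json endpoint (ingestion layer)."""
    body = {
        "event": event,
        "entityType": "user",
        "entityId": user,
        "targetEntityType": "item",
        "targetEntityId": item,
    }
    url = f"{EVENT_SERVER}/events.json?{urlencode({'accessKey': ACCESS_KEY})}"
    return Request(url, data=json.dumps(body).encode("utf-8"),
                   headers={"Content-Type": "application/json"}, method="POST")


def query_request(user: str, num: int) -> Request:
    """Build a POST to the deployed engine's /queries.json endpoint (serving layer)."""
    body = {"user": user, "num": num}
    return Request(f"{PREDICTION_SERVER}/queries.json",
                   data=json.dumps(body).encode("utf-8"),
                   headers={"Content-Type": "application/json"}, method="POST")


ingest = event_request("purchase", "u1", "i42")
serve = query_request("u1", 4)
print(ingest.full_url)
print(serve.full_url)
```

Sending is a separate step (for example `urllib.request.urlopen(ingest)` once the hosts exist). Because the two layers only share the storage backend, each URL could later point at a load balancer in front of several instances without changing the client.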
