
Based on what I've seen, creating a stream in Spring Cloud Dataflow (SCDF) will deploy the underlying applications, bind the communication service (like RabbitMQ), set the Spring Cloud Stream environment variables, and start the applications. This could all easily be done manually using a cf push command.

Meanwhile, I've been running into some drawbacks with Spring Cloud Dataflow:

  • SCDF Server is a memory hog on PCF (I have a stream with only 6 applications, and yet I'm needing about 10GB for the server)
  • No flexibility on application naming, memory, instances, etc. (All the things that you would typically set in the manifest.yml)
  • Integration with build tools (like Bamboo) is going to require extra work because we have to use the SCDF CLI rather than just the PCF CLI
  • Existing streams cannot be modified. To do a blue-green deployment, you have to deploy the application manually (binding the services and setting the environment variables manually). And then once a blue-green deployment is done, SCDF shows the stream as Failed, because it doesn't know that one of the underlying applications has changed.
  • Various errors I've run into, like MySQL Primary Key Constraint errors when trying to redeploy a failed stream

So what am I missing? Why would using Spring Cloud Dataflow be beneficial compared to just manually deploying the applications?

1 Answer


Based on what I've seen, creating a stream in Spring Cloud Dataflow (SCDF) will deploy the underlying applications, bind the communication service (like RabbitMQ), set the Spring Cloud Stream environment variables, and start the applications. This could all easily be done manually using a cf push command.

Yes - you can individually orchestrate the stream applications, and there are benefits to that. However, when you try to hand-wire each of the stream applications with the channelName, destination, and the binding-specific properties, you'd have to deal with a lot more bookkeeping. This all becomes a behind-the-scenes chore in Spring Cloud Data Flow's (SCDF) orchestration layer.
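To make the bookkeeping concrete, here is a minimal sketch of what the hand-wiring can look like with the cf CLI versus the equivalent SCDF DSL. The app names, JAR paths, service name, and destination below are hypothetical, and the exact binding properties depend on your binder and applications:

```
# Manual wiring (hypothetical names): push each app, bind the broker,
# and set the Spring Cloud Stream binding properties by hand.
cf push ticker-source -p ticker-source.jar --no-start
cf bind-service ticker-source my-rabbit
cf set-env ticker-source SPRING_CLOUD_STREAM_BINDINGS_OUTPUT_DESTINATION ticker-topic
cf start ticker-source

cf push ticker-log -p ticker-log.jar --no-start
cf bind-service ticker-log my-rabbit
cf set-env ticker-log SPRING_CLOUD_STREAM_BINDINGS_INPUT_DESTINATION ticker-topic
cf start ticker-log

# SCDF: one DSL line; the server derives the destinations and wires the apps.
dataflow:> stream create ticker --definition "time | log" --deploy
```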

Especially when you have "scaling" or "partitions" involved in your streaming pipeline, you'd have to pay attention to instanceCount, instanceIndex, and the related properties. These are automated in SCDF through the DSL semantics, too.
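For illustration, these are the kinds of partition-related properties you would otherwise track by hand per application and per instance. The app names and values are hypothetical, and the exact SCDF deployment-property prefixes vary by release, so treat this purely as a sketch:

```
# Manual partitioning (hypothetical names): producer side.
cf set-env ticker-source SPRING_CLOUD_STREAM_BINDINGS_OUTPUT_PRODUCER_PARTITIONKEYEXPRESSION "payload.id"
cf set-env ticker-source SPRING_CLOUD_STREAM_BINDINGS_OUTPUT_PRODUCER_PARTITIONCOUNT 3

# Consumer side: every instance needs the count plus its own index.
cf set-env ticker-log SPRING_CLOUD_STREAM_BINDINGS_INPUT_CONSUMER_PARTITIONED true
cf set-env ticker-log SPRING_CLOUD_STREAM_INSTANCECOUNT 3
cf set-env ticker-log SPRING_CLOUD_STREAM_INSTANCEINDEX 0   # ...and 1, 2 for the other instances

# SCDF: roughly the same outcome from deployment properties on a single command.
dataflow:> stream deploy ticker --properties "app.time.producer.partitionKeyExpression=payload.id,app.log.count=3"
```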

SCDF Server is a memory hog on PCF (I have a stream with only 6 applications, and yet I'm needing about 10GB for the server)

Based on our experiments, this is typically observed when you're in "development" and repeatedly creating > deploying > destroying streams several times in a day. Generally speaking, the server should only require 1G.

There's a general consensus that the JVMs on PCF report memory that they aren't really using; this has something to do with Java's rt.jar. There are some new kernel changes around the 'memory usage reporting' functionality in PCF, so that after the JVM boots up (which uses a good deal of resources) it doesn't continue to report bad data. We are closely tracking this.

That said, we are also profiling the server to make sure there aren't any memory leaks. As-is, the server doesn't have any in-memory state - the minimal metadata (e.g., stream definitions) the server requires is persisted in an RDBMS. Please keep an eye on #107 for developments.

No flexibility on application naming, memory, instances, etc. (All the things that you would typically set in the manifest.yml)

It is not clear what you mean by "application naming". If it has to do with the server name, you can change it easily through your manifest.yml or by other means. If it has to do with the stream-app names, they are automatically deployed with the "stream name" as the prefix, so it is easy to identify them when you review the apps from the CF CLI or Apps-Mgr.

As for the memory and disk usage, you can control them at the individual application level through the SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_STREAM_MEMORY and SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_STREAM_DISK tokens. More details here.
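As a sketch, assuming the server app is named dataflow-server and that the numeric values (interpreted as MB in releases of this era) are only illustrative, the defaults can be set as environment variables on the server and picked up after a restage:

```
# Global defaults for the stream apps that the server deploys
# (server app name and values are illustrative).
cf set-env dataflow-server SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_STREAM_MEMORY 512
cf set-env dataflow-server SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_STREAM_DISK 1024
cf restage dataflow-server
```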

Integration with build tools (like Bamboo) is going to require extra work because we have to use the SCDF CLI rather than just the PCF CLI

You'd be running the CI builds on the stream/task applications themselves, as they are part of your development workflow. SCDF simply provides the orchestration mechanics to manage these applications. We are also working on native integration with Netflix's Spinnaker tooling to provide an out-of-the-box experience in the near future.

Existing streams cannot be modified. To do a blue-green deployment, you have to deploy the application manually (binding the services and setting the environment variables manually). And then once a blue-green deployment is done, SCDF shows the stream as Failed, because it doesn't know that one of the underlying applications has changed.

You can perform blue-green style rolling upgrades on the apps individually. There's active work-in-progress to adapt to changing stream/task application state in SCDF, too. As an aside, Spinnaker integration would further simplify rolling upgrades of custom application bits, and SCDF would adapt to the dynamic changes - that is the end goal as far as this requirement goes.
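For reference, a hedged sketch of the manual blue-green flow being described, for a single stream app; the app names, JAR, service, destination, and consumer group are hypothetical:

```
# Push the new ("green") version next to the running ("blue") one, wired to the
# same destination and consumer group so it competes for the same messages.
cf push ticker-log-green -p ticker-log-v2.jar --no-start
cf bind-service ticker-log-green my-rabbit
cf set-env ticker-log-green SPRING_CLOUD_STREAM_BINDINGS_INPUT_DESTINATION ticker-topic
cf set-env ticker-log-green SPRING_CLOUD_STREAM_BINDINGS_INPUT_GROUP ticker
cf start ticker-log-green

# Once the green app is healthy and consuming, retire the blue one.
# SCDF isn't aware of this swap, which is why the stream shows as Failed afterwards.
cf stop ticker-log
cf delete ticker-log -f
```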

Various errors I've run into, like MySQL Primary Key Constraint errors when trying to redeploy a failed stream

We would love to hear your feedback; specifically, please consider reporting these problems in the backlog. Any help in this regard is highly appreciated.

So what am I missing? Why would using Spring Cloud Dataflow be beneficial compared to just manually deploying the applications?

The architecture section covers the general capabilities. If you have numerous stream or task applications (like any other microservice setup), you'd need central orchestration tooling to manage them in a cloud setting. SCDF provides the DSL, REST API, Dashboard, Flo, and of course the security layer that comes out of the box. Interoperability between streams and tasks is another important requirement for use-cases involving closed-loop analytics - there's DSL tooling around this, too. When Spinnaker integration becomes a first-class citizen, we foresee having end-to-end continuous delivery over data pipelines. Lastly, the SCDF tile for Cloud Foundry would interoperate with Spring Cloud Services to further automate the provisioning aspect along with comprehensive security coverage.
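As a small taste of that tooling surface, the same operations are exposed over the REST API and the interactive shell/DSL; the server URL and stream name below are hypothetical:

```
# REST API: list the stream definitions the server knows about.
curl http://dataflow-server.example.com/streams/definitions

# Interactive shell / DSL: the same operations, scriptable for automation.
dataflow:> stream list
dataflow:> stream create ticker --definition "time | log" --deploy
```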

Hope this helps.

Sabby Anandan
  • Sabby - thanks much for the response. So a few follow-up questions. 1 - I agree that it is beneficial to have a central orchestration tool, especially when you have a number of streams. But as soon as a blue-green deployment is done, the stream definitions in SCDF become outdated and show as failed. Other than getting into MySQL and manually changing the stream definitions, am I correct in saying that there is currently no way around this and that it is a WIP right now? –  Sep 30 '16 at 14:43
  • 2 - When I'm referring to application name, memory, etc., I'm referring to the underlying applications that are being created. SCDF uses a random name (that we have no control over) and I don't see any way of customizing the instances/memory for an application without changing the setting for every application deployed. Is there any way around this? –  Sep 30 '16 at 14:43
  • 3 - You mention the security layer that comes out-of-the-box. Does SCDF provide any additional or different security versus manually deploying the underlying applications? –  Sep 30 '16 at 14:44
  • #1: Yes - we are working on it. Given the distributed nature of individual microservice applications, the centralized orchestration requires the right design to keep track of dynamic updates. The stream/task definition state should reflect the right status (_always_) and it is an important feature - we understand that. – Sabby Anandan Sep 30 '16 at 17:14
  • #2: By default, we generate random routes to avoid the route conflicts in CF. However, you have the [option to turn it off](http://docs.spring.io/spring-cloud-dataflow-server-cloudfoundry/docs/1.1.0.BUILD-SNAPSHOT/reference/htmlsingle/#getting-started-app-names-cloud-foundry) entirely, so you'll actually see the `stream_name + app_name` as the final application name. For memory/disk, you can control at each app level, please see [here](http://docs.spring.io/spring-cloud-dataflow-server-cloudfoundry/docs/1.1.0.BUILD-SNAPSHOT/reference/htmlsingle/#configuring-defaults). – Sabby Anandan Sep 30 '16 at 17:19
  • #3: SCDF includes support for basic, file-based, oauth and ldap - more details [here](http://docs.spring.io/spring-cloud-dataflow/docs/1.1.0.M1/reference/html/getting-started-security.html). RBAC is coming up next, so you can define ACLs (at stream/task level) to declare who has access to do what. – Sabby Anandan Sep 30 '16 at 17:25