3

I am assessing a big-data project. We would need to pull many large data sets from various internet sources (FTP, APIs, etc.), do light transformations and light data quality / sanity checking (e.g. row and columnar inspections), and push the results downstream. The immediate focus is batch, but we anticipate supporting streaming down the line. Ease of support at scale is an important requirement.
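To make "row and columnar inspections" concrete, here is a minimal, tool-agnostic sketch of the kind of check I mean. It is plain Python and does not use NiFi or Gobblin; the column names (`id`, `amount`), the sample data, and the function names are made up for illustration.

```python
import csv
import io

REQUIRED_COLUMNS = ["id", "amount"]  # hypothetical schema, for illustration only


def check_rows(rows, required=REQUIRED_COLUMNS):
    """Row inspection: split rows into (good, bad) using simple per-row rules."""
    good, bad = [], []
    for row in rows:
        # A required field counts as missing if it is absent or blank.
        missing = [c for c in required if not (row.get(c) or "").strip()]
        if missing:
            bad.append((row, f"missing: {missing}"))
            continue
        try:
            row["amount"] = float(row["amount"])  # light transformation
        except ValueError:
            bad.append((row, "amount is not numeric"))
            continue
        good.append(row)
    return good, bad


def check_columns(rows, column="amount", min_rows=1):
    """Columnar inspection: summary statistics over rows that passed row checks."""
    values = [r[column] for r in rows]
    return {
        "row_count": len(values),
        "enough_rows": len(values) >= min_rows,
        "min": min(values) if values else None,
        "max": max(values) if values else None,
    }


# Made-up sample batch: one blank amount and one non-numeric amount.
sample = "id,amount\n1,10.5\n2,\n3,abc\n4,7\n"
rows = list(csv.DictReader(io.StringIO(sample)))
good, bad = check_rows(rows)
summary = check_columns(good)
```

In a real pipeline the `bad` rows would presumably be routed to a quarantine location and `summary` compared against expected ranges before anything is pushed downstream.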

We are looking at Apache Nifi and Gobblin, which seem to overlap in intention. What sort of use cases fit which platform best? How would they conform to the use case above?

Thanks!

alex
marc-dworkin
  • This question doesn't seem to be related to programming or programming tools directly. It might be better asked on stack exchange. https://stackoverflow.com/help/on-topic – adprocas Feb 27 '18 at 14:17
  • I suppose it's a borderline question, but there are plenty of similar questions (e.g. https://stackoverflow.com/questions/43231305/difference-between-apache-beam-and-apache-nifi). Note: I am not asking which platform is "best" (i.e. seeking opinions), just how each one conforms to a specific problem. I've edited the question to clarify that. – marc-dworkin Feb 27 '18 at 14:26
  • From what I can tell these are not programming tools. They obviously aid in some of the software development side of things, but they aren't tools for programming. That other question would also be off topic for stackoverflow. I'm sure you can find many off topic questions. I don't think this is borderline based on what I have read about these products. – adprocas Feb 27 '18 at 14:30
  • 1
    Hi Adpro -- apache-nifi is a tag on stackoverflow (https://stackoverflow.com/tags/apache-nifi/info), which indicates the community views it as on-topic for stackoverflow. – marc-dworkin Feb 27 '18 at 15:15
  • I'll remove my downvote because it does seem more grey than I was originally thinking. I still feel this is less a question about programming and more of a question about what the tools do. Obviously getting input from people using it is beneficial. Also, a lot of the questions for apache-nifi are specifically programming related, which is why that tag would be in there, and why I think this is more grey now. But, a question can be off topic while still relating to a tag. – adprocas Feb 27 '18 at 16:38
  • Thanks @adpro! I guess I don't see it as substantively different than asking whether reactjs or angular better fits a particular problem. – marc-dworkin Feb 27 '18 at 18:56
  • That's because angular and react are used for programming almost exclusively, so any questions comparing the two are going to be programming questions. It would be like asking a question about how to export data in Magento vs. Opencart, and what options exist for doing so. There are a lot of programming related questions for Magento, and exporting data is a grey area (programming might be involved). I just don't see a definite link to programming in your question is all. Does that make more sense? – adprocas Feb 27 '18 at 19:05
  • 3
    Most likely people who have NiFi experience do not have Gobblin experience, and vice versa, so it is unlikely anyone can offer a comparison. It would be better to describe specific use cases and ask each community separately how they handle what you want to do; then you can compare the responses yourself. – Bryan Bende Feb 28 '18 at 14:29

1 Answer

5

My experience is with NiFi, and I've only had a quick look at Gobblin, but the main difference is that NiFi is an application in itself, whereas Gobblin is a framework.

In NiFi, you have a GUI with very granular authorizations that allow several users to work on different parts of the flow, monitor it, and so on. Another point is that NiFi is 'always on' and 'always in production': you can potentially make your modifications directly on the target system, and so there are a few safeguards to avoid losing data by mistake.

So, while I think both solutions can do more or less the same thing, Gobblin might be a better fit if you have a workflow that you want to deploy once and only change from time to time. If you want to give some users permission to work on parts of the flow directly in production, NiFi will be the better fit.

In the end, to keep the question oriented on programming:

  • NiFi allows you to program graphically, to give very granular permissions to your 'developers', as well as to update the 'program' (the NiFi flow) while it is running.
  • Gobblin seems (from what little I've looked up) to work by defining jobs in text files, which is more of a 'classical' development workflow, and that may fit your usage better.
Romain Prévost