Apache Gobblin is a distributed data integration framework. It simplifies common aspects of big data integration such as data ingestion, replication, organization and lifecycle management for both streaming and batch data ecosystems.
Questions tagged [gobblin]
44 questions
3
votes
1 answer
Apache Nifi vs Gobblin
I am assessing a big-data project. We would need to pull lots of big data sets from various internet sources (FTP, API, etc.), do light transformations and light data quality / sanity checking (e.g., row and columnar inspections), and push it…

marc-dworkin
3
votes
0 answers
What's the difference between Apache Gobblin and spring-cloud-dataflow, and how do I choose?
According to the official documentation:
Apache Gobblin is a universal data ingestion framework for extracting, transforming, and loading large volume of data from a variety of data sources, e.g., databases, rest APIs, FTP/SFTP servers, filers, etc., onto…

user3172755
2
votes
0 answers
Gobblin: Error: java.io.IOException: Failed to commit dataset state for some dataset(s) of job job_GobblinKafkaQuickStart
I am trying to ingest data from a Kafka topic to HDFS following https://gobblin.readthedocs.io/en/latest/case-studies/Kafka-HDFS-Ingestion/
Steps I'm following:
Start ZooKeeper
$ zookeeper-server-start.bat…

Chhaya Vankhede
2
votes
1 answer
I'm trying to install Apache Gobblin. How can I install it using Gradle?
I want to install Apache Gobblin on my Mac (OS X). For this, I downloaded version 0.14.0 and followed the steps here.
Install Gobblin
The first thing I did was this:
tar -xvf incubator-gobblin-release-0.14.0.tar.gz
and then:
cd…

fatih
2
votes
1 answer
Error while importing Gobblin Gradle project into IDE
I am getting this error when I try to import the Gobblin distribution into my IDE. I have tried both IntelliJ and Eclipse, with no luck.
Below are the errors I get when I try to import.
In Eclipse the error…

Sayyad Ghazi
2
votes
1 answer
Gobblin Kafka to HDFS: append to the same file
Is there any way to append new messages from Kafka to the same file in HDFS using Gobblin? Currently it creates a new file every time it reads from Kafka. If I run the Gobblin job every minute, for example, there will be plenty of files.
Please help!

Kateryna Khotkevych
2
votes
1 answer
Spark - Avro Reads Schema but DataFrame Empty
I am using Gobblin to periodically extract relational data from Oracle, convert it to Avro, and publish it to HDFS.
My DFS directory structure looks like this:
-tables
|
-t1
|
-2016080712345
|
-f1.avro
|
-2016070714345
|
…

Brian
2
votes
0 answers
Example job to import a table from local MySQL to HDFS using Gobblin
I have installed Gobblin in a Cloudera VM.
I want to run an example job to import a table from local MySQL to HDFS.
Could anyone help me with it?
Thanks.

Nand Kishore
1
vote
1 answer
HDFS look back configuration in Gobblin
I see that Hive-to-Hive data movement in Gobblin has a look-back configuration that lets us specify which partition dates we want to copy, using
gobblin.data.management.copy.hive.filter.LookbackPartitionFilterGenerator
Is there a similar look…

Gayathri Yanamandra
1
vote
1 answer
How to limit the number of files produced by Apache Gobblin's output?
I am currently using Apache Gobblin to read from a Kafka topic. I went over the docs to check whether there is a config to limit the number of files produced by Gobblin but couldn't find one.
Is it possible to limit this?
Thanks!

9uzman7
1
vote
1 answer
Gradle build issue: facing an issue while running gradle clean build for Gobblin setup
While running the Gradle build I am facing the issue below.
Caused by: org.gradle.api.plugins.UnknownPluginException: Plugin with id 'pegasus' not found.
Can we set up Gobblin on Windows or not? If yes, then which version of Gobblin and Gradle is suitable for…

Rajesh dash
1
vote
1 answer
Gobblin build failed with TaskExecutionException
I have cloned the Apache Gobblin repo from the master branch and followed the instructions mentioned here to build the code.
Build is failing with TaskExecutionException for one of the tasks. It seems this task is failing because of…

cybertron
1
vote
1 answer
GobblinCli is not getting loaded while running cli commands
I am trying to set up Gobblin on my Mac. When I run cli run, I get the error below.
Do we need to set up or configure anything before running gobblin cli commands?
$ bin/gobblin.sh cli run
ls:…

1stenjoydmoment
1
vote
1 answer
Gobblin JSON to Avro convert failed with not a Json Array error
I'm new to Gobblin and trying to read JSON Kafka messages, convert them to Avro, and then store them in HDFS. My current job file is as below:
job.name=GobblinKafkaQuickStart
job.group=GobblinKafka
job.description=Gobblin quick start job for…

GihanDB
1
vote
1 answer
Gobblin job metrics not publishing data to InfluxDB
I have configured a .pull file to produce and send metrics to InfluxDB for source, extractor and converter jobs. I tried with the example wikipedia…

Rahul Kalita