Questions tagged [gobblin]

Apache Gobblin is a distributed data integration framework. It simplifies common aspects of big data integration such as data ingestion, replication, organization and lifecycle management for both streaming and batch data ecosystems.

44 questions
3
votes
1 answer

Apache Nifi vs Gobblin

I am assessing a big-data project. We would need to pull lots of big data sets from various internet sources (FTP, API, etc.), do light transformations and light data quality / sanity checking (e.g. row and columnar inspections), and push it…
marc-dworkin
  • 336
  • 4
  • 15
3
votes
0 answers

What's the difference between Apache Gobblin and spring-cloud-dataflow, and how to choose?

According to the official documentation, Apache Gobblin is a universal data ingestion framework for extracting, transforming, and loading large volumes of data from a variety of data sources, e.g., databases, REST APIs, FTP/SFTP servers, filers, etc., onto…
user3172755
  • 137
  • 1
  • 10
2
votes
0 answers

Gobblin: Error: java.io.IOException: Failed to commit dataset state for some dataset(s) of job job_GobblinKafkaQuickStart

I am trying to ingest data from a Kafka topic to HDFS following https://gobblin.readthedocs.io/en/latest/case-studies/Kafka-HDFS-Ingestion/. Steps I'm following: start ZooKeeper $ zookeeper-server-start.bat…
Chhaya Vankhede
  • 316
  • 2
  • 14
2
votes
1 answer

I'm trying to install Apache Gobblin. How can I install it using Gradle?

I want to install Apache Gobblin on my Mac running Mac OS X. For this, I downloaded version 0.14.0 and followed the steps here. Install Gobblin: the first thing I did was this: tar -xvf incubator-gobblin-release-0.14.0.tar.gz and then: cd…
fatih
  • 1,285
  • 11
  • 27
2
votes
1 answer

Error while importing gobblin gradle project into IDE

I am getting this error when I try to import the Gobblin distribution into my IDE. I have tried both IntelliJ and Eclipse without any luck. Below are the errors I get when I try to import. In Eclipse the error…
2
votes
1 answer

Gobblin Kafka to HDFS: append to the same file

Is there any way to append new messages from Kafka to the same file in HDFS using Gobblin? Currently it creates a new file every time it reads from Kafka. If the Gobblin job runs every minute, for example, there will be plenty of files. Please help!
Kateryna Khotkevych
  • 1,248
  • 1
  • 12
  • 22
2
votes
1 answer

Spark - Avro Reads Schema but DataFrame Empty

I am using Gobblin to periodically extract relational data from Oracle, convert it to Avro and publish it to HDFS. My dfs directory structure looks like this: -tables | -t1 | -2016080712345 | -f1.avro | -2016070714345 | …
Brian
  • 7,098
  • 15
  • 56
  • 73
2
votes
0 answers

example job to import a table from local mysql to hdfs using gobblin

I have installed Gobblin in a Cloudera VM. I want to run an example job to import a table from local MySQL to HDFS. Could anyone help me with it? Thanks.
Nand Kishore
  • 101
  • 1
  • 7
1
vote
1 answer

HDFS look back configuration in Gobblin

I see that Hive-to-Hive data movement has a look back configuration in Gobblin where we can specify from which partition dates we want to copy, using gobblin.data.management.copy.hive.filter.LookbackPartitionFilterGenerator. Is there a similar look…
1
vote
1 answer

How to limit the number of files produced by Apache Gobblin's output?

I am currently using Apache Gobblin to read from a Kafka topic. I went over the docs to check if there is a config to limit the number of files produced by Gobblin but couldn't find it. Is it possible to limit this? Thanks!
9uzman7
  • 409
  • 8
  • 19
1
vote
1 answer

Gradle build issue: facing issue while running gradle clean build for Gobblin setup

While building with Gradle I am facing the issue below. Caused by: org.gradle.api.plugins.UnknownPluginException: Plugin with id 'pegasus' not found. Can we set up Gobblin on Windows or not? If yes, then which version of Gobblin and Gradle is suitable for…
1
vote
1 answer

Gobblin build failed with TaskExecutionException

I have cloned the Apache Gobblin repo from the master branch and followed the instructions mentioned here to build the code. The build is failing with TaskExecutionException for one of the tasks. It seems this task is failing because of…
cybertron
  • 23
  • 2
1
vote
1 answer

GobblinCli is not getting loaded while running cli commands

I am trying to set up Gobblin on my Mac. When I run cli run, I get the error below. Do we need to set up or configure anything before running Gobblin CLI commands? $ bin/gobblin.sh cli run ls:…
1stenjoydmoment
  • 229
  • 3
  • 14
1
vote
1 answer

Gobblin JSON to Avro convert failed with not a Json Array error

I'm new to Gobblin and trying to read a JSON Kafka message, convert it to Avro, and then store it in HDFS. My current job file is like below: job.name=GobblinKafkaQuickStart job.group=GobblinKafka job.description=Gobblin quick start job for…
GihanDB
  • 591
  • 2
  • 6
  • 23
1
vote
1 answer

Gobblin job metrics not publishing data to InfluxDB

I have configured a .pull file to produce and send metrics to InfluxDB for source, extractor and converter jobs. I tried with the example wikipedia…