Questions tagged [flume]

Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data. It has a simple and flexible architecture based on streaming data flows. It is robust and fault tolerant with tunable reliability mechanisms and many failover and recovery mechanisms. It uses a simple extensible data model that allows for online analytic application.

1136 questions
22
votes
7 answers

What's the difference between Flume and Sqoop?

Both Flume and Sqoop are meant for data movement, so what is the difference between them? Under what conditions should I use Flume or Sqoop?
Cacheing
  • 3,431
  • 20
  • 46
  • 65
19
votes
1 answer

flume vs kafka vs others

Maybe this question has been asked before, but I think it is worth considering again today given that these technologies have matured. We're looking to use one of Flume, Kafka, Scribe, or others to store streaming Facebook and Twitter profile…
pranavsharma
  • 1,085
  • 2
  • 10
  • 18
18
votes
3 answers

What is the most mature library for building a Data Analytics Pipeline in Java/Scala for Hadoop?

I found many options recently and am interested in how they compare, primarily by maturity and stability. Crunch - https://github.com/cloudera/crunch Scrunch - https://github.com/cloudera/crunch/tree/master/scrunch Cascading -…
yura
  • 14,489
  • 21
  • 77
  • 126
18
votes
5 answers

Rebalancing issue while reading messages in Kafka

I am trying to read messages from a Kafka topic, but I am unable to read them. The process gets killed after some time without reading any messages. Here is the rebalancing error I get: [2014-03-21 10:10:53,215] ERROR Error processing message,…
divinedragon
  • 5,105
  • 13
  • 50
  • 97
14
votes
2 answers

how to efficiently move data from Kafka to an Impala table?

Here are the steps to the current process: Flafka writes logs to a 'landing zone' on HDFS. A job, scheduled by Oozie, copies complete files from the landing zone to a staging area. The staging data is 'schema-ified' by a Hive table that uses the…
Alex Woolford
  • 4,433
  • 11
  • 47
  • 80
13
votes
6 answers

failing to load log4j2 while running fatjar

I am working on a project where I use log4j2 logging. While developing in IntelliJ, everything works fine and logging is done as expected. The log4j2.xml is linked through a Java property passed to the JVM on startup via IntelliJ settings. But once I…
atarno
  • 329
  • 1
  • 3
  • 14
12
votes
7 answers

JMeter - Could not find the TestPlan class

I have a simple Flume setup with an HTTP source and a sink that writes the POST request payload to a file. (This complete setup is on a Linux machine.) After that, my task is to do a performance test on this setup. So I decided to use JMeter (this is…
Himanshu
  • 1,433
  • 4
  • 24
  • 35
12
votes
3 answers

How to setup a HTTP Source for testing Flume setup?

I am a newbie to Flume and Hadoop. We are developing a BI module where we can store all the logs from different servers in HDFS. For this I am using Flume. I just started trying it out. I successfully created a node, but now I want to set up an HTTP…
Himanshu
  • 1,433
  • 4
  • 24
  • 35
11
votes
1 answer

How to configure Flume to listen a web api http petitions

I have built a web API application, which is published on an IIS server. I am trying to configure Apache Flume to listen to that web API and save the responses to HTTP requests in HDFS. This is the POST method that I need to listen to: [HttpPost] …
MelgoV
  • 661
  • 8
  • 21
11
votes
2 answers

Apache Flume vs Apache Flink difference

I need to read a stream of data from some source (in my case it's a UDP stream, but it shouldn't matter), transform each record, and write it to HDFS. Is there any difference between using Flume or Flink for this purpose? I know I can use…
Kateryna Khotkevych
  • 1,248
  • 1
  • 12
  • 22
11
votes
0 answers

Scribe, Flume and Chukwa - what are alternatives?

I would like to learn about alternatives to those projects, especially designed to aggregate data from logs from multiple nodes (>500) and designed for low memory/cpu usage. I'm familiar with scribe, flume and chukwa and I think that they use too…
wlk
  • 5,695
  • 6
  • 54
  • 72
9
votes
3 answers

real time log processing using apache spark streaming

I want to create a system where I can read logs in real time and use Apache Spark to process them. I am unsure whether I should use something like Kafka or Flume to pass the logs to Spark Streaming, or pass the logs using sockets. I have gone…
Y0gesh Gupta
  • 2,184
  • 5
  • 40
  • 56
9
votes
2 answers

Transferring files from remote node to HDFS with Flume

I have a bunch of binary files compressed in *gz format. These are generated on a remote node and must be transferred to HDFS located on one of the datacenter's servers. I'm exploring the option of sending the files with Flume; I explore the option…
9
votes
5 answers

How to install and configure apache flume?

I am new to Apache Flume. I need to install Flume on top of an HDFS cluster environment. I Googled it, and everyone says to use the Cloudera distribution, but I need to install and configure it from source. So can anyone please suggest me,…
venkat
  • 513
  • 2
  • 10
  • 16
8
votes
4 answers

Retrieving timestamp from hbase row

Using the HBase API (Get/Put) or the HBQL API, is it possible to retrieve the timestamp of a particular column?
Abhijeet Pathak
  • 1,948
  • 3
  • 20
  • 28