Questions tagged [apache-samza]

Apache Samza is a distributed stream processing framework.

It uses Apache Kafka for messaging, and Apache Hadoop YARN to provide fault tolerance, processor isolation, security, and resource management.

It natively supports stateful stream processing.

Apache Samza is a top-level project of the Apache Software Foundation.

82 questions

32 votes, 3 answers

Where do Apache Samza and Apache Storm differ in their use cases?

I've stumbled upon this article that purports to contrast Samza with Storm, but it seems to address only implementation details. Where do these two distributed computation engines differ in their use cases? What kind of job is each tool good for?
Louis Thibault

11 votes, 1 answer

Apache Storm vs Apache Samza vs Apache Spark

I have worked on Storm and Spark, but Samza is quite new. I do not understand why Samza was introduced when Storm already exists for real-time processing. Spark provides in-memory, near-real-time processing and has other very useful components as…
Amit Kumar

8 votes, 1 answer

Kafka Producer TimeoutException

I am running a Samza stream job that writes data to a Kafka topic. Kafka is running as a 3-node cluster. The Samza job is deployed on YARN. We are seeing a lot of these exceptions in the container logs: INFO [2018-10-16 11:14:19,410]…
Anuj jain

8 votes, 1 answer

Why does YARN job not transition to RUNNING state?

I've got a number of Samza jobs that I want to run. I can get the first one to run OK. However, the second job seems to sit in the ACCEPTED state and never transitions to the RUNNING state until I kill the first job. Here is the view from the YARN…
John

8 votes, 1 answer

Samza/Kafka Failed to Update Metadata

I am currently writing a Samza script that will just take data from a Kafka topic and output it to another Kafka topic. I have written a very basic StreamTask; however, upon execution I am running into an error. The error is…
Zerbraxi

4 votes, 1 answer

How to implement something similar to Storm DRPC in Samza?

I have a Samza job with a number of tasks, each of which holds some state in its embedded store. I want to expose this store for reading to the outside world via some kind of RPC mechanism. What would be the best solution for this? Here is one paragraph…
Vladimir Lebedev

3 votes, 1 answer

org.apache.beam.sdk.util.UserCodeException while executing Beam Pipeline using the Samza Runner

I am trying to run the WordCount demo from here with the Samza Runner. This is my build.gradle: plugins { id 'eclipse' id 'java' id 'application' // 'shadow' allows us to embed all the dependencies into a fat jar. id…
Robert156

3 votes, 1 answer

Can I sync/backup RocksDB over the network?

I have several machines processing large amounts of text data (hundreds of GB) that is indexed in RocksDB. The machines are for load balancing and operate on the same data. When I add new machines, I want to copy the database over the network from…
cidermole

3 votes, 2 answers

Apache Samza local storage: OrientDB / Neo4j graph instead of KV store

Apache Samza uses RocksDB as the storage engine for local storage. This allows for stateful stream processing, and here's a very good overview. My use case: I have multiple streams of events that I wish to process taken from a system such as Apache…
John

3 votes, 1 answer

How to query Samza KeyValueStore by key prefix?

Using the Samza KeyValueStore interface, how do I retrieve all documents with a common key prefix? The keys are Strings, and RocksDB will be the underlying store. Are there any issues with the approach below using the range…
Mike Buhot

3 votes, 1 answer

Designing a component that is both a producer and a consumer in Kafka

I am using Kafka and ZooKeeper as the main components of my data pipeline, which processes thousands of requests per second. I am using Samza as the real-time data processing tool for small transformations that I need to make on the data. My…

2 votes, 1 answer

Hello-Samza fails to compile

I executed mvn clean package as per the Hello-Samza documentation, and the build fails: [ERROR] Failed to execute goal on project hello-samza: Could not resolve dependencies for project org.apache.samza:hello-samza:jar:0.14.1-SNAPSHOT: Failed to…
user3853029

2 votes, 2 answers

Where does Samza on YARN place its KV state stores?

I need to find where Samza on YARN places its KV state stores. I suspect they are in the YARN local application directory, as with all YARN applications, but I believe this is configurable, since I did it a few months back (mapped the folder to memory) in a different…
Edi Bice

2 votes, 2 answers

hello-samza demo not compiling

I am trying to follow the hello-samza basic setup and cannot get past "Build a Samza Job Package". Since I am running off the latest, I tried running Gradle as specified: $ ./gradlew publishToMavenLocal FAILURE: Build failed with an exception. * What…
RockyMountainHigh

2 votes, 1 answer

Samza task not receiving messages on one partition

I have a puzzling issue with one of my Samza tasks. It works correctly except for messages on one partition. I have 9 partitions on the topic. If I send 1000 messages, I only receive about 890 of them. I have checked with kafka-console-consumer with…
jhericks