Questions tagged [lambda-architecture]

Lambda architecture is a data-processing architecture designed to handle massive quantities of data by taking advantage of both batch- and stream-processing methods.

This approach to architecture attempts to balance latency, throughput, and fault-tolerance by using batch processing to provide comprehensive and accurate precomputed views, while simultaneously using real-time stream processing to provide dynamic views. The two view outputs may be joined before presentation.
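
As a rough illustration of the description above, here is a minimal sketch of the two views being joined at query time. The page-view counting workload and the names `build_batch_view`, `update_realtime_view`, and `query` are hypothetical and chosen only for this example. Plain in-memory counters stand in for real batch, speed, and serving layers.

```python
from collections import Counter


def build_batch_view(master_events):
    """Batch layer: recompute a complete view from the full event log."""
    return Counter(event["page"] for event in master_events)


def update_realtime_view(realtime_view, event):
    """Speed layer: fold in one event the batch view has not covered yet."""
    realtime_view[event["page"]] += 1
    return realtime_view


def query(batch_view, realtime_view, page):
    """Serving side: join the two views when answering a query."""
    return batch_view.get(page, 0) + realtime_view.get(page, 0)


# Hypothetical data: events already processed by the last batch run...
master_events = [{"page": "/home"}, {"page": "/docs"}, {"page": "/home"}]
# ...and events that arrived after that batch run started.
recent_events = [{"page": "/home"}, {"page": "/pricing"}]

batch_view = build_batch_view(master_events)
realtime_view = Counter()
for event in recent_events:
    update_realtime_view(realtime_view, event)

print(query(batch_view, realtime_view, "/home"))  # 3
```

In a real deployment, the real-time view would typically be discarded or trimmed once a newer batch run has absorbed the same events. This sketch leaves that step out.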

34 questions
16
votes
3 answers

Lambda architecture - what is the origin of this name?

I've read Manning's Big Data Lambda Architecture (http://www.manning.com/marz/BD_meap_ch01.pdf) and am still not able to see why it's named 'Lambda'. Is it a kind of code name, or the name of a system this architecture is based on?
setec
9
votes
0 answers

What are the cons of a purely stream-based architecture compared with a Lambda architecture?

Disclaimer: I'm not an expert in real-time architectures; I'd only like to throw out a couple of personal considerations and see what others would suggest or point out. Let's imagine we'd like to design a real-time analytics system. Following, Lambda…
8
votes
1 answer

Should I put my events inside a queue after getting them from Azure Event Hub?

I'm currently developing an application hosted on Azure that uses Azure Event Hub. Basically I'm sending messages (or should I say, events) to the Event Hub from a Web API, and I have two listeners: a Stream Analytics task for real-time analysis a…
ken2k
7
votes
1 answer

Kappa architecture: when does the insert into the batch/analytic serving layer happen?

As you know, Kappa architecture is a kind of simplification of Lambda architecture. Kappa doesn't need a batch layer; instead, the speed layer has to guarantee computation precision and enough throughput (more parallelism/resources) on historical data…
VB_
7
votes
3 answers

Lambda Architecture with Apache Spark

I'm trying to implement a Lambda Architecture using the following tools: Apache Kafka to receive all the datapoints, Spark for batch processing (Big Data), Spark Streaming for real time (Fast Data) and Cassandra to store the results. Also, all the…
luis.alves
6
votes
1 answer

Real-time analysis of event logs with Elasticsearch

I'm gathering event logs every time a property of some device is changed. For this purpose I decided to use: Logstash - to which my IoT agent application sends logs in JSON format, Elasticsearch - for storing data (logs), Kibana - for data…
3
votes
1 answer

How does immutable data make eventual consistency trivial?

I have been reading Nathan Marz' article about how to beat the CAP theorem with the Lambda Architecture and don't understand how immutable data will make eventual consistency less complex. The following paragraph is taken from the article: The key…
3
votes
2 answers

What are the differences between kappa-architecture and lambda-architecture?

If the Kappa architecture does analysis on the stream directly instead of splitting the data into two streams, where is the data stored then? In a messaging system like Kafka? Or can it be in a database for recomputing? And is a separate batch layer…
3
votes
1 answer

HBase for a real-time application

I want to build a real-time application for predictive maintenance. I thought about using HBase with Phoenix. Phoenix provides a SQL layer on HBase. I read that HBase is good for Big Data, like 100 million rows or more. But my Application Data has at the…
Khan
2
votes
0 answers

Can we create local Docker IoT containers for a SMACK-like environment with DC/OS and push them to our AWS VPC - if so, how?

In planning out our Lambda architecture, for both real-time and batch processing, I see we will need several m3.xlarge instances (see CloudFormation SMACK stack template) using DC/OS. So as not to incur too much cost for a POC, is there an approach…
ElHaix
2
votes
2 answers

How to make Spark restart the job automatically after finishing?

I am building a lambda architecture and need Spark as the batch part of it to restart itself either at regular intervals or right after finishing, or have the restart be called by a Spark Streaming job. I've looked at things and I probably don't…
SpooXter
2
votes
2 answers

HBase or Cassandra?

In my lambda architecture, I am debating whether to use HDFS or Cassandra to store my immutable data. I need Cassandra to serve the online requests etc., so it is a mandatory part of the tech stack. Now, I do not want to introduce new tool…
Aravind Yarram
1
vote
2 answers

Where can I run terraform?

This is mostly a research question, as I can't seem to find out where I can run Terraform for my use case. I want to build a web front end in which I can enter configuration details and click a button, and the front end would tell Terraform to build the…
1
vote
1 answer

Lambda architecture on AWS and API Gateway

I'm using Lambda Architecture. The batch and speed layers are on AWS EMR. The serving layer is on AWS ECS: a simple and very thin REST server that aggregates the batch and speed layers' views and returns them to the client. The serving layer sits behind AWS ALB and AWS WAF. Ammend…
VB_
1
vote
1 answer

How to trigger Airflow jobs based on Flink streaming completion for partitions?

I have a Flink streaming job which reads from Kafka and writes into the appropriate partitions in the file system. For instance, the job is configured to use a bucketing sink which writes to /data/date=${date}/hour=${hour}. How to detect that the partition…