I am new to AWS and I don't quite understand the use case for AWS Kinesis. I know it is used for processing streaming data, but why not just use AWS Lambda to process the incoming data and eventually store it in a database? As a real-world scenario, imagine a web crawler that constantly crawls a website for specific activities and sends them in JSON format. Why should I use AWS Kinesis in this case when I can do all my data processing in a Lambda function? Is it because Kinesis can handle large amounts of data by automatically scaling its shards? Also, after using Kinesis to store my data in S3, I noticed that the data arrives in the S3 bucket with a delay of almost 3-4 minutes, which is not acceptable. Can anyone help me, please?

  • Answered something similar yesterday: https://stackoverflow.com/a/65459029/2442804 . To get an actually helpful answer you need to provide a lot more context: what kind of data is coming in, at what frequency, etc. By the way: Kinesis does **not** autoscale its shards, and it most certainly does not take a couple of minutes until an object is visible. – luk2302 Dec 28 '20 at 20:41
  • @luk2302 Thank you for your reply. I read the post you mentioned. My data arrives at a high frequency, and I need to process it at 5-minute intervals. The problem I am facing with Kinesis is that after each 5-minute interval, it takes some time for Kinesis to deliver the data to consuming services such as Lambda, which makes the data stale. Do you have any recommendations? – Michael Dec 28 '20 at 21:06
  • To get a good answer, you need to tell people the architecture of your system, and be very explicit about the time between events. For one thing, if you're using "Kinesis to store [your] data in S3", that probably means that you're using Kinesis _Firehose_, which is not the same as Kinesis _Data Streams._ – Parsifal Dec 28 '20 at 23:23
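
If the pipeline really is Kinesis Data Firehose, as the last comment suggests, one possible explanation for the delay is that Firehose buffers incoming records and only writes to S3 once a size or time threshold is reached. The following is a minimal sketch of that flush-on-size-or-interval idea in plain Python. It is a toy model, not the AWS API: `BufferedDelivery`, its fields, and the thresholds used with it are all hypothetical and only illustrate the mechanism.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BufferedDelivery:
    """Toy model of a size-or-interval buffer (hypothetical, not an AWS API)."""

    max_bytes: int        # flush once this many bytes are buffered
    max_seconds: float    # ...or once the oldest buffered record is this old
    _records: List[bytes] = field(default_factory=list)
    _bytes: int = 0
    _opened_at: float = 0.0
    delivered: List[List[bytes]] = field(default_factory=list)

    def put(self, record: bytes, now: float) -> None:
        """Buffer one record; `now` is a caller-supplied clock in seconds."""
        if not self._records:
            # The first record of a batch starts the interval timer.
            self._opened_at = now
        self._records.append(record)
        self._bytes += len(record)
        self.flush_if_due(now)

    def flush_if_due(self, now: float) -> bool:
        """Deliver the current batch if either threshold is met."""
        if not self._records:
            return False
        due = (
            self._bytes >= self.max_bytes
            or (now - self._opened_at) >= self.max_seconds
        )
        if due:
            self.delivered.append(self._records)
            self._records = []
            self._bytes = 0
        return due
```

In this model, a stream that never fills the size threshold is delivered only when the interval expires, so a single record put at time 0 with `max_seconds=300` is not visible downstream until `flush_if_due` is called at 300 or later. Whether that matches the delay observed here depends on the actual delivery stream configuration, which the question does not show.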

0 Answers