I want to read from a Kafka topic only the messages that were inserted yesterday. The topic holds more than 600 billion records, but I only need the new data. My current parser checks each message's timestamp and processes the message only if the timestamp is later than a cutoff date, but this takes a lot of time. For example:
import datetime
import json

# Cutoff: only messages stamped after this date are processed.
cutoff = datetime.datetime.strptime('2020-06-04 00:00:00', '%Y-%m-%d %H:%M:%S')

num_rows = 0
for msg in consumer:
    # msg.timestamp is milliseconds since the epoch
    # (same field as msg[3] on a kafka-python ConsumerRecord)
    dt_object = datetime.datetime.fromtimestamp(msg.timestamp / 1000)
    if dt_object > cutoff:
        print(dt_object.strftime('%Y-%m-%d %H:%M:%S'))
        num_rows += 1
        m = json.loads(msg.value)
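For reference, the filtering logic above can be pulled into a small self-contained helper, so that the date check is separate from the consumer loop. This is only a sketch of my current approach: the `consumer` object is assumed to be a kafka-python `KafkaConsumer`, and `is_new` is a helper name I made up.

```python
import datetime
import json

# Cutoff: only messages stamped after this moment count as "new".
CUTOFF = datetime.datetime(2020, 6, 4)

def is_new(timestamp_ms, cutoff=CUTOFF):
    """Return True if a Kafka record timestamp (ms since epoch) is after the cutoff."""
    return datetime.datetime.fromtimestamp(timestamp_ms / 1000) > cutoff

def consume_new(consumer):
    """Scan the whole topic and yield only records newer than the cutoff.

    `consumer` is assumed to be a kafka-python KafkaConsumer; each record
    carries a millisecond `timestamp` attribute.
    """
    for msg in consumer:
        if is_new(msg.timestamp):
            yield json.loads(msg.value)
```

Even written this way, the loop still iterates over every record in the topic and discards the old ones one by one, which is why it is so slow on 600 billion records.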