I am trying to process all records of a large file from S3 using Python, in batches of N lines. I need to fetch N lines per iteration, and each line contains a JSON object.
Here are some things I've already tried:
1) I tried the solution mentioned here: Streaming in / chunking csv's from S3 to Python, but it breaks my JSON structure when the data is read in byte chunks (I've added a sketch of the failure mode after this list).
2)
import boto3
s3 = boto3.client('s3')
obj = s3.get_object(Bucket=bucket_name, Key=fname)
data = obj['Body'].read().decode('utf-8').splitlines()
This takes too long for a large file with 100k+ lines. It returns a list of lines, which I then iterate over to pull N lines at a time from the data variable (see the batching sketch below).
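For reference, option 1 fails roughly like this. This is only a sketch of the failure mode, not the exact code from the linked answer: reading the body in fixed-size byte chunks means a chunk boundary can fall in the middle of a line, so the partial line is no longer valid JSON (the chunk size here is just an illustrative value):

import json
import boto3

s3 = boto3.client('s3')
obj = s3.get_object(Bucket=bucket_name, Key=fname)

# Fixed-size byte chunks: the last line of each chunk is usually cut off
# mid-object, so json.loads() raises JSONDecodeError on it.
for chunk in obj['Body'].iter_chunks(chunk_size=1024 * 1024):
    for line in chunk.decode('utf-8').splitlines():
        record = json.loads(line)  # fails whenever the line is a partial object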
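And this is roughly how I consume the data list from option 2 in batches of N; N and process_batch are just placeholders for my real batch size and per-batch logic:

import json

N = 1000  # placeholder batch size

for i in range(0, len(data), N):
    # Each line in the slice is one JSON object.
    batch = [json.loads(line) for line in data[i:i + N]]
    process_batch(batch)  # placeholder for the per-batch processing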