I have a large dataset in CSV format that has the following structure:
time,value,id,value2,value3
2002141150250586,23.034,101,35.93,34.39
.
.
2002141150250586,24.349,2,24.45,67.99
Upon investigating the file, I found that the data comes in batches of 100 datapoints sharing the same timestamp, in descending id order (from 101 down to 2).
I was initially able to read the first 100 datapoints using the following code:
import csv
import datetime
import itertools

def main():
    with open('myfile.csv', 'r', encoding='utf-8-sig') as csv_file:
        csv_reader = csv.DictReader(csv_file)
        # take only the first 100 rows of the reader
        for row in itertools.islice(csv_reader, 0, 100):
            ID = row['id']
            timestamp = datetime.datetime.strptime(row['time'], "%y%m%d%H%M%S%f")
            print(f'{ID}: {timestamp}')
This printed the first 100 rows, which I could verify via the ids (101 down to 2).
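From my reading of the docs, islice only consumes the rows it actually yields, so after the loop the reader should be left positioned at the start of the second batch. A quick sanity check on a plain iterator seems to confirm this:

import itertools

nums = iter(range(10))
first = list(itertools.islice(nums, 0, 4))  # [0, 1, 2, 3]
rest = list(nums)                           # [4, 5, 6, 7, 8, 9] - islice did not consume these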
How do I keep grabbing the subsequent batches (of 100 datapoints each) until EOF, given that the file is very large?
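For reference, here is a minimal sketch of what I have in mind, assuming every batch is exactly 100 rows (BATCH_SIZE is my own placeholder name) — is looping islice like this the right approach?

import csv
import datetime
import itertools

BATCH_SIZE = 100  # assumption: every timestamp has exactly 100 rows

def main():
    with open('myfile.csv', 'r', encoding='utf-8-sig') as csv_file:
        csv_reader = csv.DictReader(csv_file)
        while True:
            # islice consumes at most BATCH_SIZE rows from the reader,
            # so each pass picks up where the previous one left off
            batch = list(itertools.islice(csv_reader, BATCH_SIZE))
            if not batch:
                break  # nothing left: we have reached EOF
            for row in batch:
                ID = row['id']
                timestamp = datetime.datetime.strptime(row['time'], "%y%m%d%H%M%S%f")
                print(f'{ID}: {timestamp}')

if __name__ == '__main__':
    main()

This would keep only one 100-row batch in memory at a time, which I hope is acceptable for a file this size.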