I'm downloading CSV files from S3 and processing their content with Python 3.8.
I hit a memory error when downloading a large file, so I need to read a certain number of rows (say 10k), process them, then read the next 10k rows, and so on until the entire CSV is processed. So far I read the whole CSV at once and decode it into dictionaries that preserve the headers and the values of each row:
import csv
import boto3
s3 = boto3.client("s3")  # config.BUCKET_NAME and source_file are defined elsewhere in my code
data = s3.get_object(Bucket=config.BUCKET_NAME, Key=source_file)
contents = data['Body'].read().decode("utf-8")
csv_reader = csv.DictReader(contents.splitlines(True))
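To make the goal concrete, this is roughly the batching behaviour I'm after, sketched with itertools.islice on top of the reader above (process_batch is just a stand-in for my real processing; this still reads the whole body first, so it doesn't fix the memory problem by itself):

import itertools

def process_batch(rows):
    print(f"processing {len(rows)} rows")  # placeholder for the real work

while True:
    batch = list(itertools.islice(csv_reader, 10_000))  # up to 10k rows per batch
    if not batch:
        break
    process_batch(batch)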
I've been reading the documentation: download_fileobj can download an object in chunks and accepts a callback, but the chunks are raw bytes, and I need to split the content on row boundaries so that a row isn't cut in the middle.
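For what it's worth, this is the kind of thing I was imagining instead of the full read, assuming botocore's StreamingBody.iter_lines() works the way I think it does (it seems to yield one line of bytes at a time; I haven't checked how it copes with quoted fields that contain embedded newlines):

data = s3.get_object(Bucket=config.BUCKET_NAME, Key=source_file)
# decode each streamed line lazily instead of reading the whole body into memory
lines = (line.decode("utf-8") for line in data["Body"].iter_lines())
csv_reader = csv.DictReader(lines)
# then consume csv_reader in 10k-row batches as in the sketch above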
I'd prefer not to download the entire file to disk because I don't have much space and I'd have to delete the file after processing, so I'd rather do it directly in RAM, with some library, method, etc.
Ideas?