I am processing a huge CSV file (about 1 GB) in Java.
My application runs on a 2-core machine with 8 GB of memory.
I start the application with the following command:
java -Xms4g -Xmx6g -cp $CLASSPATH JobSchedulerService
The application starts a thread that downloads the CSV from S3 and processes it. It works fine for a while, but then throws an OutOfMemoryError about halfway through the file.
I am looking for a way to keep processing the CSV file while keeping memory usage low.
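To see what is actually filling the heap, I think I could also restart with the standard HotSpot heap-dump flags (the dump path below is just an example) and inspect the dump:

java -Xms4g -Xmx6g -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/job-oom.hprof -cp $CLASSPATH JobSchedulerService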
While processing the CSV I perform the following steps:
// Step 1: Download from S3
String bucketName = env.getProperty(AWS_S3_BUCKET_NAME);
AmazonS3 s3Client = new AmazonS3Client(credentialsProvider);
S3Object s3object = s3Client.getObject(new GetObjectRequest(bucketName, key));
InputStream inputStream = s3object.getObjectContent(); // this stream contains about 1 GB of data
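// As far as I understand, getObjectContent() returns an S3ObjectInputStream that streams the object
// straight from the HTTP response, so the download itself should not hold the full 1 GB in memory.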
// Step 2: Parse the CSV into Java objects and process each row
ObjectReader oReader = CSV_MAPPER.readerFor(InboundProcessing.class).with(CSV_SCHEMA);
try (FileOutputStream fos = new FileOutputStream(outputCSV, false);
     SequenceWriter sequenceWriter = CsvUtils.getCsvObjectWriter(InboundProcessingDto.class).writeValues(fos)) {
    MappingIterator<InboundProcessing> mi = oReader.readValues(inputStream);
    while (mi.hasNextValue()) {
        InboundProcessing inboundProcessing = mi.nextValue();
        inboundProcessingRepository.save(inboundProcessing); // Spring Data JPA save; almost 3M records, so 3M calls
        sequenceWriter.write(inboundProcessingDto); // DTO mapped from the entity (mapping omitted here); written to a local CSV that is uploaded to S3 in the next step
    }
} catch (Exception e) {
    throw new FBMException(e);
}
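My suspicion is that calling save() for almost 3M entities keeps all of them managed in the JPA persistence context, and that is what eats the heap. A batched variant I am considering looks roughly like this; it is only a sketch, and the EntityManager injection, the batch size, and the flush()/clear() calls are my assumptions rather than what the current code does:

import java.io.IOException;

import javax.persistence.EntityManager;
import javax.persistence.PersistenceContext;

import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

import com.fasterxml.jackson.databind.MappingIterator;

@Service
public class InboundProcessingBatchWriter {

    private static final int BATCH_SIZE = 1000; // assumed batch size

    @PersistenceContext
    private EntityManager entityManager;

    // Persists rows in fixed-size batches and clears the persistence context after each
    // batch so the managed entities can be garbage collected instead of piling up.
    @Transactional
    public void saveAll(MappingIterator<InboundProcessing> rows) throws IOException {
        int count = 0;
        while (rows.hasNextValue()) {
            entityManager.persist(rows.nextValue());
            if (++count % BATCH_SIZE == 0) {
                entityManager.flush(); // push the current batch to the database
                entityManager.clear(); // detach the saved entities
            }
        }
        entityManager.flush();
        entityManager.clear();
    }
}

I understand I would probably also want to enable JDBC batching (for example hibernate.jdbc.batch_size) so the flushes do not turn into millions of individual statements, but even without that the periodic clear() should keep heap usage flat. Would this be the right direction, or is something else holding on to the memory?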