I have to read a large CSV file with about 700,000 records and compare the CSV data to an API response. I was able to use OpenCSV and make the code work. However, the deserialization process is extremely slow. It takes about an hour just to deserialize the data. I have been using the following code to read and deserialize my CSV.

     List<ProjectVO> csvValue = new CsvToBeanBuilder<ProjectVO>(new FileReader("project.csv"))
       .withType(ProjectVO.class).build().parse();

Is there a more efficient way to do this?

My ProjectVO class looks like this:

.
.
.
@JsonIgnoreProperties(ignoreUnknown = true)
public class ProjectVO {
    @JsonProperty("actualCompletionDate")
    @CsvBindByName(column = "actualCompletionDate")
    private String actualCompletionDate;
.
.
.

I am comparing my CSV data with the JSON response roughly as follows:

assertEquals("The value for column 'actualCompletionDate' has the same data in both files for the ID: "
   + jsonValue.getId(), csvValue.getActualCompletionDate(), jsonValue.getActualCompletionDate());
ychaulagain
  • You should read using a streaming parser rather than loading the entire collection (e.g. a List) at once. – Anand Sowmithiran Dec 24 '22 at 04:42
  • What's the performance like when you do a "bare bones" experimental version with the absolute bare minimum to parse (ignoring all functional requirements, just to get a baseline benchmark)? – Mike Kim Dec 24 '22 at 04:42
  • Your first step should be profiling your code to figure out what's taking the time. – Loren Pechtel Dec 24 '22 at 04:42
  • Re Anand's suggestion, here's an example of what he's talking about https://stackoverflow.com/questions/39673372/read-streaming-data-from-csv-using-opencsv – Mike Kim Dec 24 '22 at 04:44
  • See @YAMM's answer at https://stackoverflow.com/questions/19486077/java-fastest-way-to-read-through-text-file-with-2-million-lines, which benchmarks reading a text file: `run: BufferedReader.readLine() into LinkedList, lines: 1000000, estimatedTime: 0.105118655`. Maybe you can read the file into a LinkedList or ArrayList, then use multiple threads to read from that list and pass the data to CsvToBeanBuilder. – life888888 Dec 24 '22 at 05:47
  • @life888888 tried it but got an error. Do you have any link that I can take as a reference? – ychaulagain Dec 28 '22 at 03:16
  • @ychaulagain, your code is wrong: `csvValue` is declared as a `List`, so you cannot call `csvValue.getActualCompletionDate()` on it. – life888888 Dec 29 '22 at 03:31
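
The suggestions in the comments above (stream the file instead of building a List, and measure a bare-bones baseline first) could be combined into something like the following. This is only a rough sketch that uses the JDK alone, not OpenCSV. The class name `CsvStreamBaseline`, the method `countMismatches`, the `id` column name, and the `apiDateById` map are all hypothetical, and the naive `split(",")` assumes the file has a header row and no quoted fields or embedded commas.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical baseline: stream the CSV line by line and compare one column
// against API values indexed by ID, without materializing a List of beans.
class CsvStreamBaseline {

    /**
     * Counts rows whose actualCompletionDate differs from the API value for
     * the same ID. Rows with an unknown ID or too few fields count as
     * mismatches. Assumes a header row and no quoted or embedded commas.
     */
    static long countMismatches(Path csv, Map<String, String> apiDateById) throws IOException {
        long mismatches = 0;
        try (BufferedReader reader = Files.newBufferedReader(csv, StandardCharsets.UTF_8)) {
            String header = reader.readLine();
            if (header == null) {
                return 0; // empty file
            }
            List<String> columns = Arrays.asList(header.split(",", -1));
            int idCol = columns.indexOf("id"); // assumed column name
            int dateCol = columns.indexOf("actualCompletionDate");
            if (idCol < 0 || dateCol < 0) {
                throw new IllegalArgumentException("Expected columns not found in header");
            }
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.isEmpty()) {
                    continue; // skip blank lines
                }
                String[] fields = line.split(",", -1);
                if (fields.length <= Math.max(idCol, dateCol)) {
                    mismatches++; // malformed row
                    continue;
                }
                String expected = apiDateById.get(fields[idCol]);
                if (!fields[dateCol].equals(expected)) {
                    mismatches++;
                }
            }
        }
        return mismatches;
    }

    public static void main(String[] args) throws IOException {
        Path csv = Paths.get(args.length > 0 ? args[0] : "project.csv");
        // Hypothetical: in real use this map would be filled from the API response.
        Map<String, String> apiDateById = new HashMap<>();
        long start = System.nanoTime();
        long mismatches = countMismatches(csv, apiDateById);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("mismatches=" + mismatches + ", elapsedMs=" + elapsedMs);
    }
}
```

If a baseline like this runs quickly on the real file, that would suggest the time is going somewhere other than raw file reading, which is where profiling would help narrow it down. Looking each row up in a map keyed by ID also avoids calling a getter on the List itself.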

0 Answers