1

We've been having problems with our app where it runs out of memory when producing a CSV file, specifically on big exports with more than 10k rows. We are using Spring Boot 2.0.8 and SuperCSV 2.4.0.

What would be the correct approach to handle these cases, so that our Spring MVC API does not crash with an OutOfMemoryError?

Would SuperCSV be the cause of this problem? I'd imagine it's not, but I'm asking just in case.

I have been reading about @Async. Would it be a good idea to use it on this method so the export runs on a separate thread?

Suppose I have the following method in a controller:

@RequestMapping(value = "/export", method = RequestMethod.GET)
public void downloadData(HttpServletRequest request,HttpServletResponse response) throws SQLException, ManualException, IOException, NoSuchMethodException, InvocationTargetException, IllegalAccessException {

    List<?> data = null;
    data = dataFetchService.getData();

    ICsvBeanWriter csvWriter = new CsvBeanWriter(response.getWriter(), CsvPreference.STANDARD_PREFERENCE);

    //these next lines handle the header
    String[] header = getHeaders(data.get(0).getClass());
    String[] headerLocale = new String[header.length];
    for (int i = 0; i < header.length; i++) {
        headerLocale[i] = localeService.getLabel(this.language, header[i]);
    }

    //fix for Excel not opening CSV files with "ID" in the first cell
    if (headerLocale[0].equals("ID")) {
        //adding a space before ID as ' ID' also helps
        headerLocale[0] = headerLocale[0].toLowerCase();
    }

    csvWriter.writeHeader(headerLocale);

    //the next lines handle the content
    for (Object line : data) {
        csvWriter.write(line, header);
    }

    csvWriter.close();
    response.getWriter().flush();
    response.getWriter().close();
}
Nimchip
  • The code 'data = dataFetchService.getData();' looks like it may consume a lot of memory. You need to get pages of data from the service and not just grab everything. What is the size of the list returned by dataFetchService.getData()? – rjdkolb Mar 20 '19 at 06:43
  • You're correct, it's a `findAll()` from the repository, but how would you manage paging when you want all the data written to the CSV? – Nimchip Mar 20 '19 at 14:08
  • You can return a cursor and resolve the entities on the fly? (A sketch of that approach follows these comments.) – pandaadb Mar 20 '19 at 15:45
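
For reference, a minimal sketch of that cursor idea, assuming Spring Data JPA with Hibernate and the Product entity shown in the answer below (the repository method, hint value and service class are illustrative, not from the question's actual code):

import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.stream.Stream;
import javax.persistence.QueryHint;
import org.springframework.data.jpa.repository.QueryHints;
import org.springframework.data.repository.CrudRepository;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;
import org.supercsv.io.ICsvBeanWriter;

// Repository: returning a Stream keeps a JDBC cursor open and resolves entities lazily.
interface ProductRepository extends CrudRepository<Product, Long> {

    // "org.hibernate.fetchSize" stops the driver from pulling the whole result set at once.
    @QueryHints(@QueryHint(name = "org.hibernate.fetchSize", value = "1000"))
    Stream<Product> findAllByOrderByIdAsc();
}

// Service: the stream must be consumed inside an open transaction and closed afterwards,
// otherwise the cursor and connection leak.
@Service
class CsvExportService {

    private final ProductRepository repo;

    CsvExportService(ProductRepository repo) {
        this.repo = repo;
    }

    @Transactional(readOnly = true)
    public void writeAllRows(ICsvBeanWriter csvWriter, String[] header) {
        try (Stream<Product> products = repo.findAllByOrderByIdAsc()) {
            products.forEach(product -> {
                try {
                    csvWriter.write(product, header); // one entity at a time, never the full list
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
    }
}

Note that the persistence context still accumulates managed entities while the stream is consumed, so for very large exports it can help to detach or clear entities periodically.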

2 Answers

1

The code:

data = dataFetchService.getData();

looks like it may consume a lot of memory. This list may be millions of records in size, and if many users export at the same time, this will result in memory issues.

Since the dataFetchService is backed by a Spring Data repository, you should get the number of records it will return and then fetch the data one Pageable at a time.

Example: if there are 20,000 rows in the table, you should fetch 1,000 rows at a time, 20 times, and build the CSV up incrementally.

You should also request your data in a defined order, or your CSV may come out in a random order.

Look at extending PagingAndSortingRepository in your repository:

Example Application

Product.java

import javax.persistence.Entity;
import javax.persistence.Id;
import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;

@Entity
@Data
@NoArgsConstructor
@AllArgsConstructor
public class Product {

    @Id
    private long id;
    private String name;
}

ProductRepository.java

import org.springframework.data.repository.PagingAndSortingRepository;

public interface ProductRepository extends PagingAndSortingRepository<Product, Long> {
}

MyRest.java

import java.io.IOException;
import java.util.List;
import javax.servlet.http.HttpServletResponse;
import lombok.RequiredArgsConstructor;
import org.springframework.data.domain.Page;
import org.springframework.data.domain.PageRequest;
import org.springframework.data.domain.Pageable;
import org.springframework.data.domain.Sort;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;
import org.supercsv.io.CsvBeanWriter;
import org.supercsv.io.ICsvBeanWriter;
import org.supercsv.prefs.CsvPreference;

@RestController
@RequiredArgsConstructor
public class MyRest {

    private final ProductRepository repo;

    private final int PAGESIZE = 1000;

    @RequestMapping("/")
    public String loadData() {
        for (int record = 0; record < 10_000; record += 1) {
            repo.save(new Product(record, "Product " + record));
        }
        return "Loaded Data";
    }

    @RequestMapping("/csv")
    public void downloadData(HttpServletResponse response) throws IOException {
        response.setContentType("text/csv");
        String[] header = {"id", "name"};
        ICsvBeanWriter csvWriter = new CsvBeanWriter(response.getWriter(), CsvPreference.STANDARD_PREFERENCE);

        csvWriter.writeHeader(header);

        long numberRecords = repo.count();
        // PageRequest.of expects a zero-based page index, not a row offset
        for (int page = 0; (long) page * PAGESIZE < numberRecords; page++) {
            Pageable sortedByName = PageRequest.of(page, PAGESIZE, Sort.by("name"));
            Page<Product> pageData = repo.findAll(sortedByName);
            writeToCsv(header, csvWriter, pageData.getContent());
        }
        csvWriter.close();
        response.getWriter().flush();
        response.getWriter().close();
    }

    private void writeToCsv(String[] header, ICsvBeanWriter csvWriter, List<Product> pageData) throws IOException {
        for (Product line : pageData) {
            csvWriter.write(line, header);
        }
    }

}

1) Load data by calling

curl http://localhost:8080

2) Download CSV

curl http://localhost:8080/csv
rjdkolb
  • The problem I have with this is speed. Doing 100 rows per page might be too small, I would say. I will run a series of tests and see if there's a notable improvement, also with different page sizes. – Nimchip Mar 20 '19 at 18:28
  • You may need to tune it to 250 rows or even 1000 rows, but I think you will be surprised: the code fetches chunks from the database while writing to a stream that the HTTP client consumes, so a user should not notice a difference. – rjdkolb Mar 20 '19 at 18:45
0

You should try fetching the data in chunks using setFetchSize, which only brings back a limited number of rows at a time by keeping a cursor open on the database side. This increases the number of network round trips, but since the download is streamed it does not matter much to the user, who receives the file continuously. I am also using the Servlet 3.0 async feature to free the container worker thread and hand this task over to a separate Spring-managed thread pool. I use this with a PostgreSQL database and it works like a charm. The MySQL and Oracle JDBC drivers also support it. I am using a raw JdbcTemplate for data access together with my own ResultSet-to-CSV converter plus an on-the-fly ZIP converter. For using this with a Spring Data repository, please check here.
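
A minimal sketch of that approach, assuming a product table with id and name columns and using StreamingResponseBody in place of hand-rolled Servlet 3.0 async code (the names and query are illustrative, not the answer's actual implementation):

StreamingCsvController.java

import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import javax.sql.DataSource;
import org.springframework.http.MediaType;
import org.springframework.http.ResponseEntity;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.jdbc.core.RowCallbackHandler;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.servlet.mvc.method.annotation.StreamingResponseBody;
import org.supercsv.io.CsvMapWriter;
import org.supercsv.io.ICsvMapWriter;
import org.supercsv.prefs.CsvPreference;

@RestController
public class StreamingCsvController {

    private final JdbcTemplate jdbcTemplate;

    public StreamingCsvController(DataSource dataSource) {
        // Dedicated JdbcTemplate so the fetch size does not affect other queries.
        // PostgreSQL only honours the fetch size when autoCommit is off (e.g. inside
        // a transaction); MySQL needs useCursorFetch=true (or a fetch size of
        // Integer.MIN_VALUE) to stream.
        this.jdbcTemplate = new JdbcTemplate(dataSource);
        this.jdbcTemplate.setFetchSize(1000);
    }

    @GetMapping("/export")
    public ResponseEntity<StreamingResponseBody> export() {
        String[] header = {"id", "name"};

        // StreamingResponseBody runs on an MVC async thread, freeing the container
        // worker thread while the response body is written.
        StreamingResponseBody body = out -> {
            try (ICsvMapWriter csvWriter = new CsvMapWriter(
                    new OutputStreamWriter(out, StandardCharsets.UTF_8),
                    CsvPreference.STANDARD_PREFERENCE)) {
                csvWriter.writeHeader(header);
                // RowCallbackHandler sees one row at a time; the driver buffers at most
                // fetch-size rows, never the whole table.
                jdbcTemplate.query("SELECT id, name FROM product ORDER BY name",
                        (RowCallbackHandler) rs -> {
                            Map<String, Object> row = new HashMap<>();
                            row.put("id", rs.getLong("id"));
                            row.put("name", rs.getString("name"));
                            try {
                                csvWriter.write(row, header);
                            } catch (IOException e) {
                                throw new UncheckedIOException(e);
                            }
                        });
            }
        };
        return ResponseEntity.ok()
                .contentType(MediaType.parseMediaType("text/csv"))
                .body(body);
    }
}

Memory use stays roughly flat regardless of the number of rows; the trade-off is that the database connection and cursor stay open for the duration of the download.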

Shailendra