
I want to convert a big CSV file (roughly 20,000 to 50,000 records) into a JSON array, but it takes nearly 1 minute to convert. Is there any way to achieve it in less than 5 seconds?

    ResourceBundle rb = ResourceBundle.getBundle("settings");
    String path = rb.getString("fileandfolder.Path");
    System.out.println(path + "ssdd");
    String csvPath = request.getParameter("DP") != null ? request
            .getParameter("DP").toString() : "";

    String orname = path + csvPath;
    File file = new File(orname);

    FileReader fin = new FileReader(file); // Read the file one character at a time
    BufferedReader bi = new BufferedReader(fin);

    int res;
    String csv = "";

    while ((res = fin.read()) != -1) {
        csv = csv + ((char) res); // Convert the int to a char and append it to csv
    }

    long start3 = System.nanoTime();
    JSONArray array = CDL.toJSONArray(csv);
    String Csvs = array.toString();

    long time3 = System.nanoTime() - start3;
    System.out
            .printf("Took %.3f seconds to convert to a %d MB file, rate: %.1f MB/s%n",
                    time3 / 1e9, file.length() >> 20, file.length()
                            * 1000.0 / time3);
James Z
Mayur

3 Answers


Try

StringBuilder sb = new StringBuilder();
while ((res = fin.read()) != -1) {
    sb.append((char) res); // Convert the int to a char and append it to the builder
}
String csv = sb.toString();

Concatenating strings using + is slow; you should use StringBuilder or StringBuffer instead.
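
One further tweak worth considering (an addition of mine, not part of the suggestion above): since the file size is already available via file.length() in your code, the builder can be pre-sized so it never has to grow while reading:

    // Assumes the whole file fits in memory and its length fits in an int
    StringBuilder sb = new StringBuilder((int) file.length());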

Unlink
  • He isn't timing reading in the data, he is only timing the conversion of the string to the array – mikea Jun 19 '14 at 12:05
  • That still won't help, the time he's checking is the process of taking the `String` into a `JSONArray`. – Jonathan Drapeau Jun 19 '14 at 12:06
  • @mikea before you down vote mine too, note OP is using `start2` for the timing, not `start3` – weston Jun 19 '14 at 12:07
  • @JonathanDrapeau Not true, `long start3 = System.nanoTime();` is not used in time calculation. – weston Jun 19 '14 at 12:10
  • @JonathanDrapeau unless he removed a line, I know it's above that loop for reading the file one character at a time. – weston Jun 19 '14 at 13:18

There are two glaring performance problems in your code, both of them in this snippet:

    while ((res = fin.read()) != -1) {
        csv = csv + ((char) res);
    }

First problem: fin is an unbuffered FileReader, so each read() call is actually doing a system call. Each syscall is hundreds or even thousands of instructions. And you are doing that for each and every character in the input file.

Remedy: Read from bi rather than from fin. (That's what you created it for ... presumably.)

Second problem: each time you execute csv = csv + ((char) res); you are creating a new String that is one character longer than the previous one. If you have N characters in your input file, you end up copying roughly N^2 characters to build the string.

Remedy: Instead of concatenating Strings, use a StringBuilder ... like this:

    StringBuilder sb = new StringBuilder();
    ....
        sb.append((char) res);
    ....
    String csv = sb.toString();
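
Putting both remedies together, a minimal sketch of the reading stage (reusing the file variable from the question) could look like this:

    // Sketch of both remedies combined: read through the BufferedReader, append to a StringBuilder
    StringBuilder sb = new StringBuilder();
    try (BufferedReader bi = new BufferedReader(new FileReader(file))) {
        int res;
        while ((res = bi.read()) != -1) {
            sb.append((char) res);
        }
    }
    String csv = sb.toString();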

At this point, it is not clear to me whether there is also a performance problem in converting the csv string to JSON; i.e. in this snippet:

    JSONArray array = CDL.toJSONArray(csv);
    String Csvs = array.toString();

Unfortunately, we don't know which JSONArray and CDL classes you are actually using. Hence, it is difficult to say why they are slow, or whether there is a faster way to do the conversion. (But I suspect that the biggest performance problems are in the earlier snippet.)
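
If those classes are org.json's (a guess based on the CDL.toJSONArray(csv) call in the question, not something the question confirms), the conversion is a single call and can be timed on its own, separately from the file reading:

    // Assumes org.json's CDL and JSONArray; times only the CSV-to-JSON conversion
    long start = System.nanoTime();
    JSONArray array = CDL.toJSONArray(csv); // the first CSV row is treated as the field names
    String json = array.toString();
    long elapsed = System.nanoTime() - start;
    System.out.printf("Conversion took %.3f seconds%n", elapsed / 1e9);
    // Depending on the org.json version, JSONException may need to be caught or declared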

Stephen C
  • Unless we find out where `start2` is initialized, that might not be the problem. – Jonathan Drapeau Jun 19 '14 at 12:12
  • Yea ... the timing code is rather mucked up too. But I am prepared to accept that the OP's code really does take *minutes* to process a large file. The performance problems are real. – Stephen C Jun 19 '14 at 12:17

This csv = csv + ((char) res) is very slow: you are reading one char at a time, then allocating a new string containing the old string plus the new char.

To load all text from a file into a string do this:

static String readFile(String path, Charset encoding) 
  throws IOException 
{
  byte[] encoded = Files.readAllBytes(Paths.get(path));
  return new String(encoded, encoding);
}

(from https://stackoverflow.com/a/326440/360211; note there is a cleaner way if using Java 7)

Use it like this instead of the loop:

String csv = readFile(orname, StandardCharsets.UTF_8);
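
For completeness, here is a minimal, self-contained sketch of the whole flow (read the file in one go, then convert); that the JSON classes are org.json's CDL and JSONArray is an assumption based on the question's code:

    // Minimal sketch; assumes org.json is the JSON library in use
    import java.io.IOException;
    import java.nio.charset.Charset;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import org.json.CDL;
    import org.json.JSONArray;

    public class CsvToJson {

        static String readFile(String path, Charset encoding) throws IOException {
            byte[] encoded = Files.readAllBytes(Paths.get(path));
            return new String(encoded, encoding);
        }

        public static void main(String[] args) throws Exception {
            String orname = args[0]; // path to the CSV file
            String csv = readFile(orname, StandardCharsets.UTF_8);
            JSONArray array = CDL.toJSONArray(csv); // first row becomes the field names
            System.out.println(array.toString());
        }
    }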
weston