I'm trying to optimize my memory usage when dealing with a large number (millions) of strings. I'm reading through a file of ~1.5 million lines that contains decimal numbers in columns. For example, a file may look like:
16916576.643 4 -12312674.246 4 39.785 4 16916584.123 3 -5937726.325 3
36.794 3
16399226.418 6 -4129008.232 6 43.280 6 16399225.374 4 -1891751.787 4
39.885 4
12415561.671 9 -33057782.339 9 52.412 9 12415567.518 8 -25595925.487 8
49.950 8
15523362.628 5 -12597312.619 5 40.579 5 15523369.553 5 -9739990.371 5
42.003 5
12369614.129 8 -28797729.913 8 50.068 8 0.000 0.000
0.000
....
Currently I'm using String.split("\\s+") to separate these numbers, then calling Double.parseDouble() on each element of the resulting String[], which looks something like:
String[] data = line.trim().split("\\s+"); // trim first: leading whitespace would otherwise produce an empty first token
double firstValue = Double.parseDouble(data[0]);
double secondValue = Double.parseDouble(data[1]);
double thirdValue = Double.parseDouble(data[2]);
This ends up creating a lot of String objects. I also may have whitespace at the beginning or end of the line, so I have to call trim() on the line before I split it, which creates yet another String object. The garbage collector has to dispose of all of these short-lived String objects, which results in slowdowns. Are there more memory-efficient constructs in Java for doing this? I was thinking about using a char[] instead of a String, but I'm not sure whether that would give a substantial improvement.
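
To make the char[] idea concrete, here is a rough sketch of what I had in mind (parseDoubles is just a name I made up, and the hand-rolled conversion only handles the plain [-]digits[.digits] values in my file, so it may not round identically to Double.parseDouble()):

// Rough sketch, not production code: parses every number on a line straight
// out of a char[] into a reusable double[], with no intermediate Strings.
// Assumes well-formed input like my file (no exponents, no NaN/Infinity)
// and that 'out' is large enough to hold all values on the line.
static int parseDoubles(char[] buf, int len, double[] out) {
    int count = 0;
    int i = 0;
    while (i < len) {
        // skip separators; also covers leading/trailing whitespace, so no trim() needed
        while (i < len && Character.isWhitespace(buf[i])) i++;
        if (i >= len) break;
        int start = i;
        boolean negative = buf[i] == '-';
        if (negative) i++;
        long intPart = 0;
        while (i < len && buf[i] >= '0' && buf[i] <= '9') {
            intPart = intPart * 10 + (buf[i++] - '0');
        }
        double value = intPart;
        if (i < len && buf[i] == '.') {
            i++;
            long frac = 0;
            double scale = 1.0;
            while (i < len && buf[i] >= '0' && buf[i] <= '9') {
                frac = frac * 10 + (buf[i++] - '0');
                scale *= 10.0;
            }
            value += frac / scale;
        }
        if (i == start) { i++; continue; } // unexpected character; just skip it
        out[count++] = negative ? -value : value;
    }
    return count; // how many doubles were written into out
}

For this to pay off I'd presumably also have to get each line into a reusable char[] in the first place (e.g. by filling the buffer from a Reader myself), since calling toCharArray() on a String returned by BufferedReader.readLine() would just trade one allocation for another.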