I am refactoring a small portion of an open source, large-scale configuration management system for my university.
We're using some open source machine learning tools such as Weka, and the part I've been assigned to refactor deals with data mining and rule construction.
The open source files we've been using from Liverpool and Japan work well, but we run into memory usage issues when we use the program on large-scale projects.
I've isolated the major memory hogs and come to the conclusion I need to figure out a different data structure to store and manipulate the data. As it stands now, the program is using what end up becoming very large multidimensional arrays of integers, objects, strings, etc.
Several methods simply reconfigure the setup of the associations after we derive rules for behaviors. In many cases, we are only adding or removing a single element, or flattening the multidimensional arrays.
I primarily program in C/C++, so I am not an expert on the data structures available in Java. I am looking to replace the static arrays with a dynamic structure that can be resized easily without having to create a second multidimensional array.
Right now we have to create an entirely new structure every time we add or remove rules, objects, or other miscellaneous data from the multidimensional array, and then immediately copy the old contents into the new array.
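To illustrate, the current pattern looks roughly like the sketch below. This is a simplified stand-in I wrote for this post, not the actual project code: the class name FixedArrayGrowth and the method addRowAndColumn are made up, and it assumes a rectangular array of ints.

```java
import java.util.Arrays;

// Hypothetical, simplified illustration of the allocate-and-copy pattern.
// The names here are invented for this post and are not from the real code base.
final class FixedArrayGrowth {

    // Returns a NEW array with one extra row and one extra column.
    // Every existing cell is copied across; the new cells default to 0.
    // Assumes the input is rectangular (all rows have the same length).
    static int[][] addRowAndColumn(int[][] old) {
        int rows = old.length;
        int cols = rows == 0 ? 0 : old[0].length;
        int[][] grown = new int[rows + 1][cols + 1];
        for (int r = 0; r < rows; r++) {
            System.arraycopy(old[r], 0, grown[r], 0, cols);
        }
        return grown;
    }

    public static void main(String[] args) {
        int[][] data = {{1, 2}, {3, 4}};
        int[][] bigger = addRowAndColumn(data);
        // Both arrays are live at this point, which is where the memory goes.
        System.out.println(Arrays.deepToString(bigger));
    }
}
```

Until the old array becomes unreachable and is collected, both copies occupy the heap, so each change to a large array briefly needs about twice its memory.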
I'd like to be able to keep using the same multidimensional structure and just add a new row or column to it. After that, I'd like to manipulate the data in place: saving a temporary value, overwriting previous values, shifting left or right, and so on.
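As a rough sketch of the behavior I'm after, something like the following wrapper around nested ArrayLists is what I have in mind. The class GrowableGrid and its method names are hypothetical, and I'm not sure this is the right choice. In particular, it stores Integer objects rather than ints, and that boxing overhead may matter given the memory concerns.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of a 2D structure that grows in place.
// Rows are ArrayLists, so adding a row or column does not reallocate the whole grid.
final class GrowableGrid {
    private final List<List<Integer>> rows = new ArrayList<>();
    private int cols = 0;

    // Appends a row of zeros matching the current column count.
    void addRow() {
        rows.add(new ArrayList<>(Collections.nCopies(cols, 0)));
    }

    // Appends a zero to every existing row.
    void addColumn() {
        for (List<Integer> row : rows) {
            row.add(0);
        }
        cols++;
    }

    int get(int r, int c) {
        return rows.get(r).get(c);
    }

    // Overwrites a cell in place.
    void set(int r, int c, int value) {
        rows.get(r).set(c, value);
    }

    // Shifts one row left by one position; the first element wraps to the end.
    void shiftRowLeft(int r) {
        Collections.rotate(rows.get(r), -1);
    }

    // Flattens the grid into a single list in row-major order.
    List<Integer> flatten() {
        List<Integer> flat = new ArrayList<>(rows.size() * cols);
        for (List<Integer> row : rows) {
            flat.addAll(row);
        }
        return flat;
    }
}
```

With this, adding a row touches only the new row and adding a column appends one element per row, instead of copying the whole grid into a second array.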
Can anyone think of any data structures in Java that would fit the bill?
On a related note, I have looked into explicit garbage collection, but as far as I can tell I can only suggest that the JVM collect by calling System.gc(), or influence its garbage collection behavior through JVM tuning. Is there a better or more effective way?
Regards, Edm