0

I am working on refactoring a small portion of an open source large-scale configuration management system for my University.

We're using some open source tools for machine learning like Weka, and the aspect I am assigned to refactor is dealing with data mining and constructing rules.

The open source files we've been using from Liverpool and Japan are working well, but there are some memory usage issues when we use the program on large scale projects.

I've isolated the major memory hogs and come to the conclusion I need to figure out a different data structure to store and manipulate the data. As it stands now, the program is using what end up becoming very large multidimensional arrays of integers, objects, strings, etc.

There are several methods that simply reconfigure the set up of the associations after we are deriving rules for behaviors. In many cases, we are only adding or subtracting a single element, or simply flattening the multidimensional arrays.

I primarily program in C/C++ in general, so I am not an expert on the data structures available in Java. What I am looking to replace the static arrays with is a dynamic structure that can be easily resized without having to create a second multidimensional array.

What is happening now is we are having to create an entirely new structure every time we add and remove rules, objects, or other miscellaneous data from the multidimensional array. Then we are immediately copying into the new array.

I'd like to be able to simply use the same multidimensional array and simply add a new row and column. Subsequently, I'd like to be able to manipulate the data in the structure by simply saving a temporary value and overwriting previous values, shifting left, right, etc.

Can anyone think of any data structures in Java that would fit the bill?

On a related note, I have looked into explicit garbage collection, but have found I can only really suggest the JVM collect by calling System.Gc(), or by manipulating the garbage collection behavior of the JVM by way of tuning. Is there a better or more effective way?

Regards, Edm

edm
  • 9
  • 1

5 Answers5

1

If you have a lot of nulls/zeroes/falses/empty-strings in your matrix, then you can save space by using a sparse matrix implementation. Matrix-toolkits has several sparse matrices that you can use / modify to suit your needs, or you can just use a hashmap with an {x, y} tuple as the key. (The hashmap also has the advantage that there are several external hashmap implementations available, e.g. BerkeleyDB, so that it's unlikely that you'll run out of memory.)

Community
  • 1
  • 1
Zim-Zam O'Pootertoot
  • 17,888
  • 4
  • 41
  • 69
1

To replace static arrays with a dynamic structure use an ArrayList that grows with data automatically. To have a two-dimensional data structure use a List of List as

List<List<Integer>> dataStore = new ArrayList<List<Integer>>();
dataStore.add(new ArrayList<Integer>());
dataStore.add(Arrays.asList(1, 2, 3, 4));

// Access [1][3] as
System.out.println(dataStore.get(1).get(3)); // prints 4

Since, you touched upon having control over garbage collection (which Java actually does a pretty good job of all by itself) it seems memory management is of paramount importance as this is what's causing the re-factoring in the first place.

You could look into the Flyweight GoF pattern that focuses on sharing of objects instead of repeating them to cut down on the memory footprint of the application. To enable sharing flyweight objects need to be made immutable.

Psuedo code:

// adding a new flyweight obj at [2][1]
fwObjStore.get(2).set(1, FWObjFactory.getInstance(fwKey));

public class FWObjFactory {
    private static Map<String, FWObject> fwMap = new HashMap<String, FWObject>();

    public static getInstance(String fwKey) {
        if (!fwMap.containsKey(fwKey)) {
            fwMap.put(fwKey, newFwFromKey(fwKey));
        }
        return fwMap.get(fwKey);
    }

    private static FWObject newFwFromKey(String fwKey) {
        // ...
    }
}
Ravi K Thapliyal
  • 51,095
  • 9
  • 76
  • 89
  • I agree, A list of lists will almost directly replace an array. If you want to be smart about it, encapsulate the lists into a Matrix class and add generics and call yourself happy. Or just go find one of the hundreds of implementations out there that already do just that. – Bill K Jun 26 '13 at 07:21
  • @BillK Yes, I think Apache Commons has `RealMatrix` implementations and Google's Guava has a generic `Optional` on the same lines. But, I did not have any experience with them to elaborate as part of my answer. I could add them as references though. Always smarter to not reinvent the wheel. – Ravi K Thapliyal Jun 26 '13 at 07:30
0

There's no multidimentional thing in Java.Java has array of arrays.

You can use ArrayList with type parameter as ArrayList

ArrayList<ArrayList<yourType>> myList = new ArrayList<ArrayList<yourType>>();

Also,don't worry about GC..It would collect as and when required..

Anirudha
  • 32,393
  • 7
  • 68
  • 89
  • Why would you refer to the Java SE 6 javadoc and not 7? Why do you use a variable of type ArrayList and not List? The "multidimensional thing" is.. well true. So the specification says too. Yet the specification keeps referring to "multidimensional arrays" throughout, and so do the developer community. The one major difference between Java and other languages who do have a "true" multidimensional array is that multidimensional arrays in Java need not have arrays of the same length at each level. – Martin Andersson Jun 26 '13 at 04:48
  • When you search for Javadocs it's pretty random if you will get 6 or 7 (or 5), they are all mostly identical so it rarely matters. – Bill K Jun 26 '13 at 07:15
0

I would look into using a "List of Lists". For example, you could declare something like

List<List<Object>> mArray = new ArrayList<List<Object>>();

Any time you need to add a new "row", you could do something like:

mArray.add (new ArrayList<Object>());

Check out the List interface to see what you can do with Lists in Java and which classes implement the interface (or roll your own!).

Keith
  • 518
  • 1
  • 6
  • 10
0

Why not use two Lists tangled together? Like so:

List<List<String>> rowColumns = new ArrayList<>();

// Add a row with two entries, or columns:
List<String> oneRow = Arrays.asList("Hello", "World!");
rowColumns.add(oneRow);

Also, consider using a Map with entries mapped to Lists.

Garbage Collection should generally never have to be dealt with explicitly in Java. Usually you want to look for memory leaks whenever one occur first. When that happens, look for background threads that don't die as supposed to or strong references in caches. If you want to read some about the latter issue, you can start here and here.

Martin Andersson
  • 18,072
  • 9
  • 87
  • 115