
Given data like this:

UserID, MovieType, Year
1, 2, 2000
1, 3, 2000 
1, 2, 2006
2, 3, 2010
2, 4, 2011
2, 3, 2002
1, 2, 2010

What is the best way to store it in Java so that I can sort it by the first column, then the second column, then the third?

UserID, MovieType, Year
1, 2, 2000
1, 2, 2006
1, 2, 2010
1, 3, 2000 
2, 3, 2002
2, 3, 2010
2, 4, 2011

And then group the rows by UserID and MovieType, counting the rows in each group:

UserID, MovieType, movies seen per year
1, 2, 3
1, 3, 1 
2, 3, 2
2, 4, 1
ekad
tnaser

2 Answers


You should make a class that contains the three pieces of data, then write an implementation of Comparator for it. For example, the data-containing class could have three getters: int getUserId(), int getMovieType() and int getYear(). You can then store the data objects in a List and sort that list with your comparator using Collections.sort(List<T> list, Comparator<? super T> c).

The comparator should do something like:

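// Compares by user ID, then movie type, then year.
// Integer.compare (Java 7+) avoids the overflow that subtraction can cause for large values.
public int compare(DataObject data1, DataObject data2) {
   int comparison = Integer.compare(data1.getUserId(), data2.getUserId());
   if (comparison == 0) {
       comparison = Integer.compare(data1.getMovieType(), data2.getMovieType());
       if (comparison == 0) {
           comparison = Integer.compare(data1.getYear(), data2.getYear());
       }
   }
   return comparison;
}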
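
To make the approach concrete, here is a minimal, self-contained sketch using the sample data from the question. The class name `Viewing`, the wrapper class `ViewingSortExample`, and the helper `sampleSorted()` are hypothetical names chosen for illustration, and the comparator is built with `Comparator.comparingInt`, which assumes Java 8 or later. It orders rows the same way as the hand-written `compare` method.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class ViewingSortExample {

    /** Hypothetical immutable row holding one (userId, movieType, year) record. */
    static final class Viewing {
        private final int userId;
        private final int movieType;
        private final int year;

        Viewing(int userId, int movieType, int year) {
            this.userId = userId;
            this.movieType = movieType;
            this.year = year;
        }

        int getUserId() { return userId; }
        int getMovieType() { return movieType; }
        int getYear() { return year; }

        @Override
        public String toString() {
            return userId + ", " + movieType + ", " + year;
        }
    }

    /** Orders by user ID, then movie type, then year (assumes Java 8+). */
    static final Comparator<Viewing> BY_USER_TYPE_YEAR =
            Comparator.comparingInt(Viewing::getUserId)
                    .thenComparingInt(Viewing::getMovieType)
                    .thenComparingInt(Viewing::getYear);

    /** Builds the question's sample rows and returns them sorted. */
    static List<Viewing> sampleSorted() {
        List<Viewing> rows = new ArrayList<>();
        rows.add(new Viewing(1, 2, 2000));
        rows.add(new Viewing(1, 3, 2000));
        rows.add(new Viewing(1, 2, 2006));
        rows.add(new Viewing(2, 3, 2010));
        rows.add(new Viewing(2, 4, 2011));
        rows.add(new Viewing(2, 3, 2002));
        rows.add(new Viewing(1, 2, 2010));
        rows.sort(BY_USER_TYPE_YEAR);
        return rows;
    }

    public static void main(String[] args) {
        // Prints the rows in the sorted order shown in the question.
        for (Viewing v : sampleSorted()) {
            System.out.println(v);
        }
    }
}
```

On older Java versions, the same list can be sorted with `Collections.sort(rows, comparator)` and a hand-written `Comparator` class.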
Fortunato
  • Actually, for the sorting you want to do there would be only one comparator, which compares the three pieces of data one at a time. See the edit in the answer... – Fortunato Jan 21 '12 at 21:56

For a very specific solution, you could have a Map<Integer, Map<Integer, Integer>>.

The first Map stores UserIDs to a map that stores MovieTypes to MoviesSeenPerYear.

If you use a TreeMap as the underlying type for both the outer and inner maps, the keys will automatically be kept in ascending numeric order.

This will not be very flexible, though - for example, it would be difficult if you wanted to re-sort by MovieType instead of UserId.
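
As a rough sketch of this nested-map idea, the following counts rows per (UserID, MovieType) pair for the sample data in the question. The class name `GroupCountExample`, the method `countByUserAndType`, and the choice to represent each row as an `int[]` are assumptions made only for this illustration; `computeIfAbsent` and `merge` assume Java 8 or later.

```java
import java.util.Map;
import java.util.TreeMap;

public class GroupCountExample {

    /** Sample rows from the question: {userId, movieType, year}. */
    static final int[][] SAMPLE = {
        {1, 2, 2000}, {1, 3, 2000}, {1, 2, 2006}, {2, 3, 2010},
        {2, 4, 2011}, {2, 3, 2002}, {1, 2, 2010}
    };

    /**
     * Maps userId -> (movieType -> number of rows).
     * Both levels are TreeMaps, so keys iterate in ascending order.
     */
    static Map<Integer, Map<Integer, Integer>> countByUserAndType(int[][] rows) {
        Map<Integer, Map<Integer, Integer>> counts = new TreeMap<>();
        for (int[] row : rows) {
            int userId = row[0];
            int movieType = row[1];
            // Create the inner map on first sight of a user, then bump the count.
            counts.computeIfAbsent(userId, k -> new TreeMap<>())
                  .merge(movieType, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Prints {1={2=3, 3=1}, 2={3=2, 4=1}}
        System.out.println(countByUserAndType(SAMPLE));
    }
}
```

Note that this structure keeps only the counts; the individual years are discarded, so it suits the grouped output but not the sorted listing.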


In response to your comment:

You will have 2 main limitations:

  1. The standard Java collections classes report their size as an int (the same type Java uses for array indexes), so they are limited to at most 2^31-1, or 2,147,483,647, entries - just over 2 billion.
  2. Memory limitations of your JVM / machine.

If you're looking at working with this much data, and would like more flexible sorting requirements, you'd be well advised to use an actual database - either one of the standard ones, or even a JVM-embedded one like H2 or Apache Derby.

ziesemer
  • I am new to Java and came across SortedSet set = new TreeSet(); is a Map the better option? – tnaser Jan 21 '12 at 23:47
  • @tnaser - A Set only stores items - without having keys/values to store associations. A Map has keys/values to store associations. If you want to use my solution, you need the TreeMap to store the key/value associations. However, both TreeMap and TreeSet allow a custom comparator to be provided to their constructors. You could actually use this with Fortunato's answer (+1 from me!). By using the Set instead of a List with this, you wouldn't need to call `Collections.sort` at all - the Set would automatically sort itself and keep itself sorted. – ziesemer Jan 22 '12 at 02:38
  • Are there any limitations on the size of a Set, Map, or List? For example, with millions of entries? – tnaser Jan 22 '12 at 04:42
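
The TreeSet suggestion from the comments can be sketched as follows. This is an illustration only: the class name `SortedSetExample`, the `build` helper, and the use of `int[]` rows are hypothetical. One caveat worth knowing is that a TreeSet treats elements that compare as equal as duplicates, so identical rows are stored only once.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.TreeSet;

public class SortedSetExample {

    /** Compares {userId, movieType, year} rows column by column. */
    static final Comparator<int[]> BY_ALL_COLUMNS = (a, b) -> {
        int c = Integer.compare(a[0], b[0]);
        if (c == 0) c = Integer.compare(a[1], b[1]);
        if (c == 0) c = Integer.compare(a[2], b[2]);
        return c;
    };

    /** Adds every row to a TreeSet that keeps itself sorted by the comparator. */
    static TreeSet<int[]> build(int[][] rows) {
        TreeSet<int[]> set = new TreeSet<>(BY_ALL_COLUMNS);
        for (int[] row : rows) {
            set.add(row); // returns false and changes nothing for a duplicate row
        }
        return set;
    }

    public static void main(String[] args) {
        int[][] rows = { {2, 3, 2010}, {1, 2, 2006}, {1, 2, 2000}, {1, 2, 2000} };
        TreeSet<int[]> set = build(rows);
        System.out.println(set.size());                    // prints 3: the repeated row is dropped
        System.out.println(Arrays.toString(set.first()));  // prints [1, 2, 2000]
    }
}
```

If duplicate rows must be preserved, a List sorted with the comparator is the safer choice.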