
I am writing a program to analyze some spreadsheet data. There are two columns: start time and duration (both variables are Doubles). The spreadsheet is not sorted. I need to sort the columns together by start time (that is, the durations have to stay with their matching start times). There are a few thousand rows, and analysis will happen periodically so I don't want to keep sorting the entire collection over and over again as more data gets added.

A TreeMap using start time as the key and duration as the value seemed perfect because it inserts each entry into the correct position as it is read in, and keeps the two pieces of data together as it goes.

And it did work perfectly for 90% of my data. Unfortunately, I realized tonight that sometimes two events will have the same start time. Since a TreeMap doesn't keep duplicate keys, I lose a row when the new data overwrites the old one.
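To make the data loss concrete, here is a minimal sketch of the behavior described above. The class name `DuplicateKeyDemo`, the `load` helper, and the sample values are made up for illustration; only `TreeMap.put` is the real API.

```java
import java.util.TreeMap;

// Hypothetical demo class; the sample data is invented for illustration.
class DuplicateKeyDemo {
    // Loads parallel arrays of start times and durations into a TreeMap.
    static TreeMap<Double, Double> load(double[] starts, double[] durations) {
        TreeMap<Double, Double> events = new TreeMap<>();
        for (int i = 0; i < starts.length; i++) {
            // put() replaces the existing value when the key is already present
            events.put(starts[i], durations[i]);
        }
        return events;
    }

    public static void main(String[] args) {
        double[] starts = {3.0, 1.0, 3.0};      // two events start at 3.0
        double[] durations = {0.5, 2.0, 4.0};
        TreeMap<Double, Double> events = load(starts, durations);
        System.out.println(events.size()); // prints 2: one of the three rows is gone
        System.out.println(events);        // prints {1.0=2.0, 3.0=4.0}
    }
}
```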

There are many posts about this (see this and this and sort of this) and I see two suggestions keep coming up:

  1. a custom comparator to 'trick' the TreeMap into allowing duplicates.
  2. using something like TreeMap<Double, List<Double>> to store multiple values for a key.

The first suggestion is the easiest for me to implement, but I have read comments saying that it breaks the contract of the TreeMap and isn't a good idea. The second suggestion can be done, but it will make the analysis more complicated because I'll have to iterate through each list as I iterate through the keys, instead of simply iterating through the keys alone.
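For reference, the second suggestion might look roughly like the sketch below. The class `EventMultimap` and its methods are hypothetical names, and it assumes Java 8 or later for `computeIfAbsent`. The nested iteration is confined to one `rows()` method, so the analysis code can still loop over a flat list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical wrapper around TreeMap<Double, List<Double>>.
class EventMultimap {
    private final TreeMap<Double, List<Double>> byStart = new TreeMap<>();

    // Appends the duration to the list for this start time, creating the list if needed.
    void add(double start, double duration) {
        byStart.computeIfAbsent(start, k -> new ArrayList<>()).add(duration);
    }

    // Flattens the map into {start, duration} rows in ascending start order.
    // Rows sharing a start time keep the order in which they were added.
    List<double[]> rows() {
        List<double[]> out = new ArrayList<>();
        for (Map.Entry<Double, List<Double>> e : byStart.entrySet()) {
            for (double duration : e.getValue()) {
                out.add(new double[] {e.getKey(), duration});
            }
        }
        return out;
    }

    public static void main(String[] args) {
        EventMultimap events = new EventMultimap();
        events.add(3.0, 0.5);
        events.add(1.0, 2.0);
        events.add(3.0, 4.0); // same start time as the first event; both are kept
        for (double[] row : events.rows()) {
            System.out.println(row[0] + " " + row[1]);
        }
    }
}
```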

What I need is a way to keep two lists sorted together and allow duplicate entries. I'm hoping someone can suggest the best way to do this. Thanks so much for your help.
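One possible shape for "two lists sorted together with duplicates" is a single list of small event objects that is kept sorted on insert. The sketch below is only an illustration under that assumption: `SortedEvents` and `Event` are hypothetical names, and the insertion point is found with a hand-written binary search so that equal start times stay in arrival order.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical container that keeps events ordered by start time and allows duplicates.
class SortedEvents {
    // Holds one row so the start time and duration cannot drift apart.
    static final class Event {
        final double start;
        final double duration;

        Event(double start, double duration) {
            this.start = start;
            this.duration = duration;
        }
    }

    private final List<Event> events = new ArrayList<>();

    // Inserts after any existing events whose start time is <= the new one.
    void add(double start, double duration) {
        int lo = 0;
        int hi = events.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (events.get(mid).start <= start) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        events.add(lo, new Event(start, duration));
    }

    // Read-only view for the analysis code; already in ascending start order.
    List<Event> view() {
        return Collections.unmodifiableList(events);
    }

    public static void main(String[] args) {
        SortedEvents sorted = new SortedEvents();
        sorted.add(3.0, 0.5);
        sorted.add(1.0, 2.0);
        sorted.add(3.0, 4.0); // duplicate start time is kept
        sorted.add(2.0, 1.0);
        for (Event e : sorted.view()) {
            System.out.println(e.start + " " + e.duration);
        }
    }
}
```

Inserting into the middle of an `ArrayList` shifts the later elements, so this is not asymptotically better than a tree, but an insert at the end needs no shifting and the analysis becomes a single flat loop.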

Michael
  • I'd recommend the second way, as iterating the key list will probably not add much complexity to your code and you can still rely on standard JDK classes. – Alex Aug 05 '15 at 06:50
  • A list of values (a class that holds both start and duration) that you sort before presenting it, does it really have to be sorted all the time? – Kennet Aug 05 '15 at 07:02
  • It's actually not spreadsheet data, it's data that's being added by a user in real time. It needs to be sorted before the analysis is done, and the user triggers the analysis whenever they want. While most of the time the data will be added in order (as the user scrolls forward through the dataset and selects individual events), it's possible for the user to scroll back and select older events. Rather than resorting every time the user wants to run analysis (and there could be 20,000 events recorded) it seemed smarter to put them in order as they're added. – Michael Aug 05 '15 at 23:50
