4

I'm currently looking through two very large lists of Peak Objects, by overriding the equals method and looping through the two lists, comparing every peak to every other peak. Is there a more efficient way of doing this? My lists can be ~10,000 elements, which means up to 10000 * 10000 comparisons.

The code for my peak object:

public class Peak extends Object{

private final SimpleIntegerProperty peakStart;
private final SimpleIntegerProperty peakEnd;
private final SimpleIntegerProperty peakMaxima;
private final SimpleIntegerProperty peakHeight;
private final SimpleIntegerProperty peakWidth;
private final SimpleStringProperty rname;

public Peak(int peakStart, int peakEnd, int peakMaxima, int peakHeight, String rname) {
    this.peakStart = new SimpleIntegerProperty(peakStart);
    this.peakEnd = new SimpleIntegerProperty(peakEnd);
    this.peakMaxima = new SimpleIntegerProperty(peakMaxima);
    this.peakHeight = new SimpleIntegerProperty(peakHeight);
    this.peakWidth = new SimpleIntegerProperty(peakEnd - peakStart);
    this.rname = new SimpleStringProperty(rname);
}

public String getRname() {
    return rname.get();
}

public SimpleStringProperty rnameProperty() {
    return rname;
}

public int getPeakWidth() {
    return peakWidth.get();
}

public int getPeakHeight() {
    return peakHeight.get();
}

public int getPeakStart() {
    return peakStart.get();
}

public int getPeakEnd() {
    return peakEnd.get();
}

public int getPeakMaxima() {
    return peakMaxima.get();
}

@Override
public String toString() {
    return "Peak{" +
            "peakStart= " + peakStart.get() +
            ", peakEnd= " + peakEnd.get() +
            ", peakHeight= " + peakHeight.get() +
            ", rname= " + rname.get() +
            '}';
}

@Override
public boolean equals(Object o) {
    if (this == o) return true;
    if (o == null || getClass() != o.getClass()) return false;

    Peak peak = (Peak) o;

    if (!peakMaxima.equals(peak.peakMaxima)) return false;
    return rname.equals(peak.rname);
}

@Override
public int hashCode() {
    int result = peakMaxima.hashCode();
    result = 31 * result + rname.hashCode();
    return result;
}
}

And my loop for comparing the objects is here.

 List<Peak> interestingPeaks = new ArrayList<>();

            if(peakListOne != null && peakListTwo != null){
                for(Peak peak : peakListOne){
                    for(Peak peak2 : peakListTwo){
                        if(peak.equals(peak2)){ //number one, check the rnames match
                            if((peak2.getPeakHeight() / peak.getPeakHeight() >= 9) || (peak.getPeakHeight() / peak2.getPeakHeight() >= 9)){
                                    interestingPeaks.add(peak);
                            }
                        }
                    }
                }
            }

            return interestingPeaks;

The code is basically matching the position of the maxima, and the rname , which is just a String. Then appending the peak to the interestingPeaks list if the height of one is a factor of 9x larger than the other.

Sam
  • 1,234
  • 3
  • 17
  • 32
  • Did you try using hashes? – flx Apr 16 '18 at 11:24
  • If you sort your lists, you can compare a lot smarter. – Peter Bruins Apr 16 '18 at 11:26
  • There are conditions that we don't know about that could make your algorithm simpler or more difficult. Can the lists be sorted or does the order matter? Are peak repetitions allowed within a single list (maybe it could be a HashSet)? There are many algorithms that could help you there but you have to see if the possible approaches match your particular case. – mingos Apr 16 '18 at 11:32
  • The lists are not sorted on input, but there's no reason why they couldn't be sorted as part of the method. – Sam Apr 16 '18 at 11:35

1 Answers1

4

Appreciate that if the two lists were sorted by maxima and name, you could simply make a single linear pass down both lists, and compare items side by side. If the two lists were in fact completely equal, then you would never find a pair from the two lists which were not equal.

List<Peak> p1;
List<Peak> p2;

p1.sort((p1, p2) -> {
    int comp = Integer.compare(p1.getPeakMaxima(), p2.getPeakMaxima());
    return comp != 0 ? comp : p1.getRname().compareTo(p2.getRname());
});

// and also sort the second list

Now we can just walk down both lists and check for a comparison failure:

for (int i=0; i < p1.size(); ++i) {
    if (!p1.get(i).equals(p2.get(i))) {
        System.out.println("peaks are not equal");
        break;
    }
}

This reduces an O(N^2) operation to one which is O(N*lgN), which is the penalty for doing both sorts (the final walk down the list is O(N), and would be negligible with either approach).

Tim Biegeleisen
  • 502,043
  • 27
  • 286
  • 360
  • O(n^2) reduced to O(n). That's quite an optimisation - if and only if the lists are allowed to be reordered. The OP never mentioned that though, so while a good answer, we can only hope it'll be of use ;) – mingos Apr 16 '18 at 11:35
  • @mingos I answered incompletely. I am updating now. – Tim Biegeleisen Apr 16 '18 at 11:35
  • I can absolutely sort the lists. This will work perfectly. thanks. – Sam Apr 16 '18 at 11:36
  • @Sam Did you want to flag every non matching peak, or did you just want to assert that all the peaks either are, or are not, the same? If the latter, you are done and my answer should work. It the former, my answer is a good start, but we need to do more work. – Tim Biegeleisen Apr 16 '18 at 11:37
  • @TimBiegeleisen I only need to know where the peakMaximas and the rnames match, so your example will definitely work. Thanks again. – Sam Apr 16 '18 at 11:39
  • I am not sure to understand. This answer assumes that the two lists have to contain exactly the same element to perform the matching. In the OP code I don't see this requirement. Besides this also assumes that the lists have the same size. I would supposed that we should perform a nested loop as in the original code and that the break should have performed in the inner loop. – davidxxx Apr 16 '18 at 11:53