0

Here is my problem : I have a list of custom object which contain a String called label. This list is big but too big, around 1000 objects. I would like to do an alphabetical sorting using label.

The thing is, some label contains character like for example É, (, e or E as first character. So I had to use the function deAccent() found here to sort it independently of the accent or other thing like that. With the use of this function the list ['Gab','eaaa','Éaa'] is sorted like that ['eaaa','Éaa','Gab'] instead of ['eaaa','Gab','Éaa']. Because when we use the compareTo method, É is after G. Here is what I have :

private List<Formula> sortFormulaList(List<Formula> formulaList) {
    // Sort all label alphabetically
    if (formulaList.size() > 0) {
        Collections.sort(formulaList, (formula1, formula2) ->
                deAccent(formula1.getLabel()).toLowerCase().compareTo(deAccent(formula2.getLabel().toLowerCase())));
    }
    return formulaList;
}

private String deAccent(String str) {
    String nfdNormalizedString = Normalizer.normalize(str, Normalizer.Form.NFD);
    Pattern pattern = Pattern.compile("\\p{InCombiningDiacriticalMarks}+");
    return pattern.matcher(nfdNormalizedString).replaceAll("");
}

If I don't use the deAccent() it is fast enough for my purpose but when I use it take like 1 to 3 seconds to being sort.

Any idea on how I could make such a sort ? Or make this one faster

Yohann L.
  • 1,262
  • 13
  • 27
  • what are you trying to do in the deAccent method.Perhaps it can be optimized – Venkata Narayana Aug 29 '18 at 16:21
  • 4
    You can, for example, precompile the pattern instead of compiling it every invocation. – Glains Aug 29 '18 at 16:24
  • @VenkataNarayanaMalireddy I edited my post to make it more clear – Yohann L. Aug 29 '18 at 16:31
  • 3
    The comparator will be called O(n log n) times on average. Therefore if you preprocess the data such that deAccent is called only once per object you get a nice speedup. – Henry Aug 29 '18 at 16:31
  • Thanks @Glains, by doing a precompilation of the pattern it's faster, almost like without deAccent() – Yohann L. Aug 29 '18 at 16:35
  • What do you mean by preprocess the data @Henry ? – Yohann L. Aug 29 '18 at 16:37
  • @YohannLereclus store the de-accented string value with each object. – Henry Aug 29 '18 at 16:39
  • Oh okay I understand. Thanks a lot guys ! – Yohann L. Aug 29 '18 at 16:45
  • Within `Formula`, instead of adding `String getDeAccetendLabel()`, consider adding something like `static Comparator getLabelComparator()`. This will limit the exposure of the unaccented label class field. – Andrew S Aug 29 '18 at 17:10
  • I'm not sure I understood @AndrewS, I'm not really familiar with the `Comparator<>` I should define a `Comparator labelComparator` in my Formula object ? But how do I instantiate it ? – Yohann L. Aug 29 '18 at 17:40

1 Answers1

1

Consider @Henry's excellent suggestion and Formula might look like this:

public class Formula {
    private final String label;
    private final String deAccentedLabel;

    public Formula(String label) {
        this.label = label;
        this.deAccentedLabel = deAccent(label);
    }

    public String getLabel() {
        return label;
    }

    public String getDeAccentedLabel() {
        return comparableLabel;
    }


    private String deAccent(String str) {
        String nfdNormalizedString = Normalizer.normalize(str, Normalizer.Form.NFD);
        Pattern pattern = Pattern.compile("\\p{InCombiningDiacriticalMarks}+");
        return pattern.matcher(nfdNormalizedString).replaceAll("");
    }

}

Then it can be used like this:

Collections.sort(formulaList, (formula1, formula2) -> formula1.getDeAccentedLabel().toLowerCase().compareTo(formula2.getDeAccentedLabel().toLowerCase());

However, this exposes deAccentedLabel by adding the public getDeAccentedLabel() method.

What I was suggesting in the comment is to hide deAccentedLabel to keep the public interface of Formula as clean as possible. So to sort, Formula provides the Comparator instead of other classes having to build it. Formula would look something like this:

public class Formula {
    private final String label;
    private final String comparableLabel;

    public Formula(String label) {
        this.label = label;
        this.comparableLabel = deAccent(label).toLowerCase();
    }

    public String getLabel() {
        return label;
    }

    private String deAccent(String str) {
        String nfdNormalizedString = Normalizer.normalize(str, Normalizer.Form.NFD);
        Pattern pattern = Pattern.compile("\\p{InCombiningDiacriticalMarks}+");
        return pattern.matcher(nfdNormalizedString).replaceAll("");
    }

    public static Comparator<Formula> getLabelComparator() {
        return (formula1, formula2) -> formula1.comparableLabel.compareTo(formula2.comparableLabel);
    }

}

and used like this:

Collections.sort(formulaList, Formula.getLabelComparator());
Andrew S
  • 2,509
  • 1
  • 12
  • 14