
I have two lists of custom objects, both of type List<LogEntry>. The objects in one have the properties typeOfException, date and stackTrace; the objects in the other contain only typeOfException and stackTrace. I would like to remove duplicate log entries based on their typeOfException and stackTrace. I define two stack traces as the same if their first 'at line' is the same, i.e.

[25/05/21 10:28:41:481 BST] - IllegalStateException some text here
at com.google MyClass(Line 50)
[28/05/21 10:28:41:481 BST] - IllegalStateException some more text here
at com.google MyClass(Line 50)

are seen as duplicates but

[25/05/21 10:28:41:481 BST] - IllegalStateException some text here
at com.google MyClass(Line 50)
[28/05/21 10:28:41:481 BST] - IllegalStateException some more text here
at com.google MyClass(Line 50000)

would be seen as unique.
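A minimal sketch of this duplicate rule, assuming each stack trace is stored as one multi-line String whose frames start with "at " after optional indentation. The names `StackTraceKey`, `firstAtLine` and `dedupKey` are hypothetical, not from the question. The key is a `List`, so two keys are equal when both the exception type and the first "at" line match:

```java
import java.util.Arrays;
import java.util.List;

class StackTraceKey {

    // Hypothetical helper: returns the first line of a stack trace that starts
    // with "at " (ignoring leading whitespace), or "" if there is no such line.
    static String firstAtLine(String stackTrace) {
        for (String line : stackTrace.split("\\R")) { // \R matches any line break
            String trimmed = line.trim();
            if (trimmed.startsWith("at ")) {
                return trimmed;
            }
        }
        return "";
    }

    // Duplicate key: exception type plus first "at" line.
    // Arrays.asList gives value-based equals/hashCode, so it works in a HashSet.
    static List<String> dedupKey(String typeOfException, String stackTrace) {
        return Arrays.asList(typeOfException, firstAtLine(stackTrace));
    }

    public static void main(String[] args) {
        List<String> k1 = dedupKey("IllegalStateException",
                "at com.google MyClass(Line 50)\nat com.google Other(Line 7)");
        List<String> k2 = dedupKey("IllegalStateException",
                "at com.google MyClass(Line 50)");
        List<String> k3 = dedupKey("IllegalStateException",
                "at com.google MyClass(Line 50000)");
        System.out.println(k1.equals(k2)); // true: same type, same first "at" line
        System.out.println(k1.equals(k3)); // false: first "at" line differs
    }
}
```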

I have a List<LogEntry> called logEntries, which contains the date, typeOfException and stackTrace. I have another List<LogEntry> called logEntriesToCheckForDupes, whose LogEntry objects contain only the typeOfException and the top 'at line' of the stackTrace (note: all the properties are Strings).

The code I have so far is

HashSet<Object> uniqueStackTraces = new HashSet<>();
// add() returns false for a key already in the set, so removeIf drops every later duplicate
logEntryObjectsToCheckForDupes.removeIf(c -> !uniqueStackTraces.add(Arrays.asList(c.getTypeOfexception(), c.getStackTrace())));

I think this works, but I'm not entirely convinced, because the list goes from 887 exceptions to only 14. Is there some method or logic to find the index of each unique entry? That way, rather than creating a new HashSet, I could store a list of unique indexes and build a List<LogEntry> from logEntries containing every object at a unique index.
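One way to get the indexes asked about here, sketched under the assumption that logEntriesToCheckForDupes was built from logEntries in the same order, so index i refers to the same log entry in both lists. `UniqueIndices`, `firstUniqueIndices` and the stand-in `LogEntry` class are hypothetical. The loop keeps the index of the first occurrence of each (typeOfException, stackTrace) pair, then maps those indexes back to the full entries:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class UniqueIndices {

    // Hypothetical stand-in for the asker's class; all properties are Strings.
    static class LogEntry {
        final String date, typeOfException, stackTrace;

        LogEntry(String date, String typeOfException, String stackTrace) {
            this.date = date;
            this.typeOfException = typeOfException;
            this.stackTrace = stackTrace;
        }

        @Override
        public String toString() {
            return date + " " + typeOfException + " " + stackTrace;
        }
    }

    // Index of the first occurrence of each unique (typeOfException, stackTrace) pair.
    static List<Integer> firstUniqueIndices(List<LogEntry> toCheck) {
        Set<List<String>> seen = new HashSet<>();
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < toCheck.size(); i++) {
            LogEntry e = toCheck.get(i);
            // Set.add returns true only the first time a key is seen
            if (seen.add(Arrays.asList(e.typeOfException, e.stackTrace))) {
                indices.add(i);
            }
        }
        return indices;
    }

    public static void main(String[] args) {
        List<LogEntry> logEntries = Arrays.asList(
                new LogEntry("25/05/21", "IllegalStateException", "at com.google MyClass(Line 50)"),
                new LogEntry("28/05/21", "IllegalStateException", "at com.google MyClass(Line 50)"),
                new LogEntry("28/05/21", "IllegalStateException", "at com.google MyClass(Line 50000)"));

        // Date-less projection built in the same order as logEntries
        List<LogEntry> toCheck = new ArrayList<>();
        for (LogEntry e : logEntries) {
            toCheck.add(new LogEntry(null, e.typeOfException, e.stackTrace));
        }

        List<LogEntry> unique = new ArrayList<>();
        for (int i : firstUniqueIndices(toCheck)) {
            unique.add(logEntries.get(i)); // full entry, date included
        }
        unique.forEach(System.out::println);
    }
}
```

Because the key can be computed from the full entries directly, the same seen-set loop could also run over logEntries itself and skip the indexes entirely. Keeping the first occurrence of each key is a choice made in this sketch, not a requirement.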

I'm quite perplexed and not sure my code is actually working as intended, so any suggestions or input are much appreciated. The question is similar to (Removing duplicates from the list of objects based on more than one property in java 8), and I used some logic from there.

  • Have you written any test cases? It could be that your code is right. Perhaps loop through the list, print out each object, manually delete the duplicates in a text editor and compare the result with your answer. Having taken a quick look, the answers in the linked question seem good to me. – Gavin Jun 08 '21 at 09:29
  • I haven't written any test cases yet. I spent a while looking through the actual log file in Notepad and the result doesn't seem unrealistic; I think seeing the initial list size drop from almost 900 to 14 was just a shock! Mainly, I now need some way to find the index of these unique log entries so I can get the full log entry objects with date, exception and stack trace, or some other path to get these. – Connor Gill Jun 08 '21 at 09:32
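Following the suggestion to write a test case, a minimal sketch that runs the question's removeIf + HashSet technique on a small hand-made input and cross-checks the result against Stream.distinct(). The entries are simplified to hypothetical [typeOfException, topAtLine] pairs, and `DedupCheck`/`dedup` are illustrative names. Both approaches rely on List equality, so they should agree on the count:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class DedupCheck {

    // Same technique as in the question: removeIf drops an element whenever
    // HashSet.add reports that its key was already seen.
    static List<List<String>> dedup(List<List<String>> entries) {
        List<List<String>> copy = new ArrayList<>(entries); // leave the input untouched
        Set<List<String>> seen = new HashSet<>();
        copy.removeIf(e -> !seen.add(e));
        return copy;
    }

    public static void main(String[] args) {
        List<List<String>> entries = Arrays.asList(
                Arrays.asList("IllegalStateException", "at com.google MyClass(Line 50)"),
                Arrays.asList("IllegalStateException", "at com.google MyClass(Line 50)"),
                Arrays.asList("IllegalStateException", "at com.google MyClass(Line 50000)"),
                Arrays.asList("NullPointerException", "at com.google MyClass(Line 50)"));

        List<List<String>> result = dedup(entries);
        // Cross-check: the result size must equal the number of distinct pairs
        long distinct = entries.stream().distinct().count();
        System.out.println(result.size() + " " + distinct); // 3 3
        if (result.size() != distinct) {
            throw new AssertionError("removeIf dedup disagrees with distinct()");
        }
    }
}
```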

1 Answer


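Group and aggregate: concatenate both lists, group the entries by (typeOfException, stackTrace), and when two entries share a key, keep the one with the latest non-null date: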

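import static java.util.stream.Collectors.toList;
import static java.util.stream.Collectors.toMap;

import java.util.Collection;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;
import java.util.stream.IntStream;
import java.util.stream.Stream;

import lombok.AllArgsConstructor;

// main and LogEntry below are members of the enclosing class
public static void main(String[] args) {

    List<LogEntry> list1 = IntStream.range(0, 100).mapToObj(i -> LogEntry.random(true)).collect(toList());
    List<LogEntry> list2 = IntStream.range(0, 100).mapToObj(i -> LogEntry.random(false)).collect(toList());

    // join the two lists, removing duplicates and keeping the latest date
    Collection<LogEntry> result = Stream.concat(list1.stream(), list2.stream())
            .collect(toMap(
                    // the key (a Tuple<>-like type would be better than concatenating strings)
                    x -> x.typeOfException + ":" + x.stackTrace,
                    x -> x,
                    // on a key collision, keep the entry with the larger non-null date
                    (a, b) -> a.date == null ? b : b.date == null ? a : a.date < b.date ? b : a))
            .values();

    result.forEach(e -> System.out.printf("%s, %s, %d%n", e.typeOfException, e.stackTrace, e.date));
}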

@AllArgsConstructor // Lombok: generates LogEntry(String, String, Integer)
static class LogEntry {
    public String typeOfException;
    public String stackTrace;
    public Integer date;

    public static LogEntry random(boolean withDates) {
        ThreadLocalRandom rnd = ThreadLocalRandom.current();
        return new LogEntry("E" + rnd.nextInt(3), "S" + rnd.nextInt(3), withDates ? rnd.nextInt() : null);
    }
}
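The key comment in the code above suggests a tuple type instead of string concatenation (a concatenated key could, in principle, collide if a field itself contains ":"). A minimal sketch of that idea, assuming Java 16+ for records; `RecordKeyMerge`, `Key` and the sample data are hypothetical, and the merge function is the same one used above:

```java
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class RecordKeyMerge {

    // Hypothetical simplified entry; date is nullable, as in list2 above.
    record LogEntry(String typeOfException, String stackTrace, Integer date) {}

    // Composite key: records derive equals/hashCode from their components,
    // so no string concatenation is needed.
    record Key(String typeOfException, String stackTrace) {}

    static Collection<LogEntry> merge(List<LogEntry> list1, List<LogEntry> list2) {
        Map<Key, LogEntry> byKey = Stream.concat(list1.stream(), list2.stream())
                .collect(Collectors.toMap(
                        e -> new Key(e.typeOfException(), e.stackTrace()),
                        Function.identity(),
                        // keep the entry with the larger non-null date
                        (a, b) -> a.date() == null ? b
                                : b.date() == null ? a
                                : a.date() < b.date() ? b : a));
        return byKey.values();
    }

    public static void main(String[] args) {
        List<LogEntry> list1 = List.of(new LogEntry("E0", "S0", 5), new LogEntry("E0", "S1", 3));
        List<LogEntry> list2 = List.of(new LogEntry("E0", "S0", null), new LogEntry("E1", "S0", null));
        // HashMap iteration order is unspecified, so the print order may vary
        merge(list1, list2).forEach(System.out::println);
    }
}
```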

with output

E2, S1, 1974693605
E1, S0, 2085047733
E2, S0, 1766963016
E0, S2, 2106321704
E0, S1, 1752799219
E1, S2, 2123681998
E1, S1, 1522756354
E0, S0, 1578552430
E2, S2, 1969494110

If only a few entries have dates, groups that contain only undated entries appear with a null date:

List<LogEntry> list1 = IntStream.range(0, 4).mapToObj(i -> LogEntry.random(true)).collect(toList());
List<LogEntry> list2 = IntStream.range(0, 100).mapToObj(i -> LogEntry.random(false)).collect(toList());

with output

E2, S1, null
E1, S0, null
E2, S0, null
E0, S2, 2123867824
E1, S2, null
E0, S1, 13858484
E2, S2, null
E1, S1, 1347419477
E0, S0, -135848900
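The nested ternary in the merge function can also be expressed with standard comparators. A minimal sketch, assuming a stand-in `LogEntry` with a nullable Integer date; `NullSafeMax` and `LATEST` are hypothetical names. `Comparator.nullsFirst` ranks null below every date, so `BinaryOperator.maxBy` picks a dated entry whenever the group has one, and the result is null only when every entry in the group is undated, which is what the second output shows:

```java
import java.util.Comparator;
import java.util.function.BinaryOperator;

class NullSafeMax {

    // Hypothetical minimal entry with a nullable date, mirroring the answer's LogEntry.
    static final class LogEntry {
        final String typeOfException;
        final String stackTrace;
        final Integer date;

        LogEntry(String typeOfException, String stackTrace, Integer date) {
            this.typeOfException = typeOfException;
            this.stackTrace = stackTrace;
            this.date = date;
        }
    }

    // null counts as the smallest date, so maxBy prefers any dated entry;
    // on a tie, maxBy returns its first argument.
    static final BinaryOperator<LogEntry> LATEST = BinaryOperator.maxBy(
            Comparator.comparing((LogEntry e) -> e.date,
                    Comparator.nullsFirst(Comparator.naturalOrder())));

    public static void main(String[] args) {
        LogEntry dated = new LogEntry("E0", "S0", 42);
        LogEntry undated = new LogEntry("E0", "S0", null);
        System.out.println(LATEST.apply(undated, dated).date);   // 42
        System.out.println(LATEST.apply(dated, undated).date);   // 42
        System.out.println(LATEST.apply(undated, undated).date); // null
    }
}
```

`LATEST` could be passed as the third argument of `toMap` in place of the ternary lambda.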
josejuan
  • This might be a silly question, so apologies: list1 and list2 are going to be my two lists in this case, aren't they? You've just used them as an example to show an output? – Connor Gill Jun 08 '21 at 11:37
  • Yes @ConnorGill, given your two lists, the result you expect is in `result = ...`. – josejuan Jun 08 '21 at 14:22