I am reading data from an excel file using apache poi and transforming it into a list of object. But now I want to extract any duplicates based on certain rules into another list of that object and also get the non-duplicate list.
Condition to check for a duplicate
- name
- phone number
- gst number
Any of these properties can result in a duplicate. which mean or not an and
Party Class
public class Party {
private String name;
private Long number;
private String email;
private String address;
private BigDecimal openingBalance;
private LocalDateTime openingDate;
private String gstNumber;
// Getter Setter Skipped
}
Let's say this is the list returned by the logic to excel data so far
var firstParty = new Party();
firstParty.setName("Valid Party");
firstParty.setAddress("Valid");
firstParty.setEmail("Valid");
firstParty.setGstNumber("Valid");
firstParty.setNumber(1234567890L);
firstParty.setOpeningBalance(BigDecimal.ZERO);
firstParty.setOpeningDate(DateUtil.getDDMMDateFromString("01/01/2020"));
var secondParty = new Party();
secondParty.setName("Valid Party");
secondParty.setAddress("Valid Address");
secondParty.setEmail("Valid Email");
secondParty.setGstNumber("Valid GST");
secondParty.setNumber(7593612247L);
secondParty.setOpeningBalance(BigDecimal.ZERO);
secondParty.setOpeningDate(DateUtil.getDDMMDateFromString("01/01/2020"));
var thirdParty = new Party();
thirdParty.setName("Valid Party 1");
thirdParty.setAddress("address");
thirdParty.setEmail("email");
thirdParty.setGstNumber("gst");
thirdParty.setNumber(7593612888L);
thirdParty.setOpeningBalance(BigDecimal.ZERO);
secondParty.setOpeningDate(DateUtil.getDDMMDateFromString("01/01/2020"));
var validParties = List.of(firstParty, secondParty, thirdParty);
What I have attempted so far :-
var partyNameOccurrenceMap = validParties.parallelStream()
.map(Party::getName)
.collect(Collectors.groupingBy(Function.identity(), HashMap::new, Collectors.counting()));
var partyNameOccurrenceMapCopy = SerializationUtils.clone(partyNameOccurrenceMap);
var duplicateParties = validParties.stream()
.filter(party-> {
var occurrence = partyNameOccurrenceMap.get(party.getName());
if (occurrence > 1) {
partyNameOccurrenceMap.put(party.getName(), occurrence - 1);
return true;
}
return false;
})
.toList();
var nonDuplicateParties = validParties.stream()
.filter(party -> {
var occurrence = partyNameOccurrenceMapCopy.get(party.getName());
if (occurrence > 1) {
partyNameOccurrenceMapCopy.put(party.getName(), occurrence - 1);
return false;
}
return true;
})
.toList();
The above code only checks for party name but we also need to check for email, phone number and gst number.
The code written above works just fine but the readability, conciseness and the performance might be an issue as the data set is large enough like 10k rows in excel file