0

I am trying to find a best way to filter out some items from a list based on certain rules. For example we have

public class Person{
    String name;
    String sex;
    String dob;
    String contactNo;
    Person(String name, String sex, String dob, String contactNo) {
        this.name = name;
        this.sex = sex;
        this.dob = dob;
        this.contactNo = contactNo;
    }
}

List<Person> persons = Arrays.asList(new Person("Bob", "male", "19800101", "12345"),                
        new Person("John", "male", "19810101", "12345"),
        new Person("Tom", "male", "19820101", "12345"),
        new Person("Helen", "female", "19800101", "12345"),
        new Person("Jack", "male", "19830101", "12345"),
        new Person("Suan", "female", "19850101", "12345"));

I want to remove the pair of male and female which have the same dob and contactNo (Remove Bob and Helen in above example). I implemented this as below using a nested loop which worked but looks ugly. Is there a better to achieve this please? Can I implement predicate to do this?

public void filterPersons() {       
    List<Person> filtered = new ArrayList<Person>();

    for (Person p: persons) {
        boolean pairFound = false;
        for (Person t: persons) {
            if ((p.sex.equals("male") && t.sex.equals("female")) || (p.sex.equals("female") && t.sex.equals("male"))) {
                if (p.dob.equals(t.dob) && p.contactNo.equals(t.contactNo)) {                       
                    pairFound = true;
                    break;
                }
            }
        }
        if (!pairFound) {filtered.add(p);}          
    }

    System.out.println("filtered size is: " + filtered.size());
    for (Person p: filtered) {
        System.out.println(p.name);
    }
}

Many thanks.

I've rewritten the above method something like below which looks better imho:

public void testFilter() {      
    Predicate<Person> isPairFound = new Predicate<Person>() {
        @Override public boolean apply(Person p) {              
            boolean pairFound = false;
            for (Person t: persons) {
                if ((p.sex.equals("male") && t.sex.equals("female")) || 
                        (p.sex.equals("female") && t.sex.equals("male"))) {
                    if (p.dob.equals(t.dob) && p.contactNo.equals(t.contactNo)) {                       
                        pairFound = true;
                        break;
                    }
                }
            }
            return pairFound;
        }
    };

    Iterable<Person> filtered = Iterables.filter(persons, isPairFound);     
    for (Person p: filtered) {
        System.out.println(p.name);
    }
}
Can Lu
  • 736
  • 2
  • 8
  • 17
  • Override the `equals()` and `hashCode()` method probably?! – Rahul Nov 13 '13 at 07:24
  • Why do you need `(p.sex.equals("male") && t.sex.equals("female")) || (p.sex.equals("female") && t.sex.equals("male"))` ? Surely, you only need to check one half of this, because `p` and `t` will both iterate across the entire collection. – Dawood ibn Kareem Nov 13 '13 at 07:26
  • @DavidWallace even better : `!p.sex.equals(t.sex)` – Silviu Burcea Nov 13 '13 at 07:29
  • Hi David, this is a bad example probably, in practice, we could have more types than male and female, maybe I should have called it something else. Let's say educationLevel, i.e. highschool, uni, master and doctor. I only want to remove the pair have education level highschool and uni with same dob and contactNo. Thanks – Can Lu Nov 13 '13 at 07:31
  • thanks Silviu Burcea. yes i can do it but still the code is ugly. I want to get rid of this nested loop and flag completely. – Can Lu Nov 13 '13 at 07:38
  • 1
    Guava can do some nice stuff with collections, http://marxsoftware.blogspot.no/2011/10/filtering-and-transforming-java.html – arynaq Nov 13 '13 at 07:40

3 Answers3

1

I don't think the nested for loops are particularly ugly. You are looking for matches between items in your list based on, effectively, arbitrary criteria so you need to compare every entry with every other entry.

One improvement you could consider is to separate the iterating code from the comparison logic. This is where you were heading with the Predicate. To do this you would need a Predicate that takes two objects instead of one.

public interface PredicateComparator<T> {
    boolean compare(T o1, T o2);
}

Your code would now look something like this

public void filterPersons() {

    PredicateComparator<Person> predicate = new PredicateComparator<Person>() {
        public boolean compare(Person o1, Person o2) {
            // comparison logic in here
        }

    };

    List<Person> filtered = new ArrayList<Person>();
    for (Person p : persons) {
        for (Person t : persons) {
            if (predicate.compare(p, t)) {
                filtered.add(p);
            }
        }
    }

    System.out.println("filtered size is: " + filtered.size());
    for (Person p: filtered) {
        System.out.println(p.name);
    }
} 
pillingworth
  • 3,238
  • 2
  • 24
  • 49
  • Thank you pauli, it's the answer I was looking for. Much appreciated. – Can Lu Nov 20 '13 at 09:57
  • Thats ok. The other answers take a different approach - they index the Person data into a Map/Set and then use a single loop with lookups into the map/set. You may find that this performs better (single loop vs nested loop). You could use a standard Predicate in this case initialised with the map/set index. – pillingworth Nov 21 '13 at 09:12
0

You can use hashmap to remove duplicates. Each entry in the map would denote

(DOB+ContactNo -> persons index in the original list)

The function

public void filterPersons() {       
    List<Person> filtered = new ArrayList<Person>(persons); // COPY the whole list
    HashMap<String,Integer> map = new HashMap<String,Integer>();
    int count=-1;

    for (Person p: persons) {
        count++;
        String g = p.sex; 
        String g_opp = g.equals("male")? "female":"male";

        if(!map.contains(p.dob+p.contactNo+g_opp))
        {
            // if not exists, add to map
            map.put(p.dob+p.contactNo,count+g);
        }
        else
        {
            // if duplicate found in map, remove both people from list
            filtered.remove(count);
            filtered.remove(map.get(p.dob+p.contactNo+g));

            // now filtered has 2 less elements, update count
            count -= 2;
        }
   }
}   
Ankit Rustagi
  • 5,539
  • 12
  • 39
  • 70
0

Is there only one way to determine identity between two persons? If so, it would be best to encapsulate this by overriding 'equals' and 'hashcode'.

After you do this you can take one of the following approaches:

  1. If you are creating a collection of Person instances and want to make sure that when adding the same person more than once will preserver only a single instance in the collection- use the Set interface as the underlying collection (and probably the HashSet implementation). With equals and hashcode properly in place, the set will not allow duplicates.

  2. If you are given a collection (meaning you have no control of it's creation and therefore can't use the above approach to validate that it is constructed without duplicates) and want to filter out duplicate instances you can simply feed it to the constructor of a HashSet, like this:

    Collection<Integer> containsRepeatingNumbers = Arrays.asList(1,2,3,4,3,3,3,3);
    Set<Integer> alldistincts = new HashSet<>(containsRepeatingNumbers);
    System.out.println(alldistincts);   //[1, 2, 3, 4]
    

BTW, if you anticipate multiple criteria for identity in the future, you can use the strategy propsed here

Community
  • 1
  • 1
Vitaliy
  • 8,044
  • 7
  • 38
  • 66
  • Thanks Vitaliy, I could encapsulate this into equals and hashcode as suggested, however a. I don't own the code (Person class). b. The comparison criteria is pretty arbitrary, could change or introduce more criterias in the future. – Can Lu Nov 20 '13 at 10:08