3

I am attempting to remove duplicate records found in a List<List<float[]>>. I tried a collection that does not allow duplicates (a HashSet), but I could not figure out how to convert my data to it properly. To loop through all of my elements I do the following:

List<List<float[]>> tmp; 

for(int i=0; i<tmp.get(0).size();i++){
    System.out.println(java.util.Arrays.toString(tmp.get(0).get(i)));
}

I want to remove the duplicates from the inner list, that is, from the elements found at tmp.get(0).get(index). For example, given:

tmp.get(0).get(1) =[-70.89,42.12]

tmp.get(0).get(2) =[-70.89,42.12]

I would like to remove tmp.get(0).get(2)

My current implementation works when there is only one duplicate, but not when there are several:

for(int i=0; i<old.get(0).size();i++){
    if(i == old.get(0).size()-1){
        System.out.println("max size");
        return old;
    }
    else if(Arrays.toString(old.get(0).get(i)).equalsIgnoreCase(Arrays.toString(old.get(0).get(i+1)))){
        old.get(0).remove(i);
        i++;
    } else {
        i++;
    }
}
user2524908
  • Do you want to remove duplicates from the whole list or from the lists contained inside? – arshajii Sep 19 '13 at 14:58
  • You want to remove duplicate `List`s? – Sotirios Delimanolis Sep 19 '13 at 14:58
  • Related: [equals vs Arrays.equals in Java](http://stackoverflow.com/questions/8777257/equals-vs-arrays-equals-in-java) – Paul Bellora Sep 19 '13 at 14:59
  • Sorry, I want to remove them from the list inside. So all elements found at tmp.get(0).get(Here to remove) – user2524908 Sep 19 '13 at 15:00
  • You mention `ArrayList` in the tags. Are you aware of the method [`ArrayList.contains(Object)`](http://docs.oracle.com/javase/7/docs/api/java/util/ArrayList.html#contains%28java.lang.Object%29)? – Andrew Thompson Sep 19 '13 at 15:00
  • If you are removing individual elements from the internal arrays, should the arrays be resized (new arrays created)? – John B Sep 19 '13 at 15:00
  • How about you provide some example input and output to make it clear what this should do? – millimoose Sep 19 '13 at 15:01
  • So, if there exist two ARRAYS that have the same elements, one of the ARRAYS should be removed. However, if two arrays share a single common value, the value should not be removed? Please clarify. – John B Sep 19 '13 at 15:02

3 Answers

3

If I understood correctly, you are looking for Set<List<float[]>>.

Silviu Burcea
  • I've updated the question with some additional implementation. – user2524908 Sep 19 '13 at 15:06
  • Yeah, I gave that a try; I didn't even think to use a set. I am still getting duplicates, however :(. What my code is doing is reading from a CSV which has a JSON column. I am using GSON to parse the JSON into a public Set<List<float[]>> coordinates = new HashSet<>();. When I output them using List<float[]> tmp = p.coordinates.iterator().next(); for(int i=0; i – user2524908 Sep 19 '13 at 15:23
  • The problem is that an array's equals is the same as arr1 == arr2. However, you have 2 different references with the same values. My guess: extend ArrayList and override the equals method to use Arrays.equals(arr1, arr2) (you will also need to override hashCode, because you will use a HashSet). So you will have Set<MyArrayList> set = new HashSet<MyArrayList>();. Smart IDEs will generate the hashCode for you. – Silviu Burcea Sep 19 '13 at 15:32
  • I'm sorry, but I am not that experienced with Java. What is MyArrayList referencing? Is it my variable List<List<float[]>> tmp;? – user2524908 Sep 19 '13 at 15:35
  • MyArrayList is your custom ArrayList (MyArrayList extends ArrayList). You must override the equals method to use Arrays.equals when you need to check if 2 arrays have the same values. Also, because you are using a HashSet, you need to override hashCode. I can do it for you, but a little bit later. – Silviu Burcea Sep 19 '13 at 15:41
  • -1 for extending `ArrayList`, which is essentially always a bad idea, but instead you should almost certainly be creating a custom class to wrap a `float[]`. – Louis Wasserman Sep 19 '13 at 16:59
  • I can't see why it is a bad idea; can you explain? I am here to learn as well. – Silviu Burcea Sep 19 '13 at 17:31
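The wrapper-class idea raised in the comments above could look roughly like the sketch below. The names FloatArrayKey and InnerListDeduper are hypothetical and not from the thread; the sketch assumes duplicates should be removed within each inner list separately, keeping the first occurrence, and that two arrays count as duplicates when Arrays.equals says so.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

// Hypothetical wrapper: gives a float[] value-based equals and hashCode.
final class FloatArrayKey {
    private final float[] values;

    FloatArrayKey(float[] values) {
        // Defensive copy, so mutating the original array later cannot change the hash.
        this.values = values.clone();
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof FloatArrayKey
                && Arrays.equals(values, ((FloatArrayKey) other).values);
    }

    @Override
    public int hashCode() {
        return Arrays.hashCode(values);
    }
}

// Hypothetical helper that uses the wrapper as a HashSet key.
final class InnerListDeduper {
    // Removes later duplicates from each inner list in place.
    static void removeDuplicates(List<List<float[]>> lists) {
        for (List<float[]> inner : lists) {
            Set<FloatArrayKey> seen = new HashSet<>(); // fresh set per inner list
            Iterator<float[]> it = inner.iterator();
            while (it.hasNext()) {
                // add() returns false when an equal key is already present.
                if (!seen.add(new FloatArrayKey(it.next()))) {
                    it.remove();
                }
            }
        }
    }
}
```

Calling InnerListDeduper.removeDuplicates(tmp) would then leave one copy of each coordinate pair in tmp.get(0). Wrapping the array rather than subclassing ArrayList keeps the equality rule in one small class.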
1

How about:

List<List<Float[]>> outterList;
Set<Float[]> mySet = new HashSet<Float[]>();
for (List<Float[]> innerList : outterList){
    Iterator<Float[]> iterator = innerList.iterator();
    while(iterator.hasNext()){
        Float[] array = iterator.next();
        boolean added = mySet.add(array);
        if (!added)
           iterator.remove();
    }
}

To make the comparison, try converting to BigDecimal via new BigDecimal(double, MathContext)

Update: The test below fails. There seems to be an issue with comparing arrays in a HashSet.

@Test
public void testArrays() {
    Set<String[]> set = new HashSet<String[]>();
    set.add(new String[] { "12.3f", "33.4f" });
    Assert.assertFalse(set.add(new String[] { "12.3f", "33.4f" }));
}
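A minimal sketch of why that assertion fails: arrays inherit equals and hashCode from Object, so a HashSet treats two distinct array objects as different elements even when their contents match. Lists, by contrast, compare element by element. The class and method names here are illustrative only.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class ArrayEqualityDemo {
    // Arrays use identity-based equals, so the second add succeeds.
    static boolean secondArrayAdded() {
        Set<String[]> set = new HashSet<>();
        set.add(new String[] { "12.3f", "33.4f" });
        return set.add(new String[] { "12.3f", "33.4f" });
    }

    // Lists use value-based equals and hashCode, so the second add is rejected.
    static boolean secondListAdded() {
        Set<List<String>> set = new HashSet<>();
        set.add(Arrays.asList("12.3f", "33.4f"));
        return set.add(Arrays.asList("12.3f", "33.4f"));
    }

    public static void main(String[] args) {
        System.out.println(secondArrayAdded()); // true
        System.out.println(secondListAdded());  // false
    }
}
```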

Update

So arrays work differently. Here you go:

This uses Guava's Predicate and Iterables.any(). This solution is less efficient than using a Set since it has to iterate the List each time but it does work if performance is not an issue.

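import com.google.common.base.Predicate;
import com.google.common.collect.Iterables;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Builds a predicate that matches any array with the same elements, in the same order, as 'array'.
private static <T> Predicate<T[]> equals(final T[] array) {
    return new Predicate<T[]>() {

        @Override
        public boolean apply(T[] arg0) {
            return Arrays.equals(array, arg0);
        }
    };
}

// Removes, in place, every array that duplicates one already seen in any inner list.
public static <T> List<List<T[]>> ProcessList(List<List<T[]>> old) {
    // A List despite its name: arrays have no value-based hashCode, so a HashSet would not help.
    List<T[]> mySet = new ArrayList<T[]>();
    for (List<T[]> innerList : old) {
        Iterator<T[]> iterator = innerList.iterator();
        while (iterator.hasNext()) {
            T[] array = iterator.next();
            Predicate<T[]> contains = equals(array);

            if (Iterables.any(mySet, contains)) {
                iterator.remove(); // already seen: drop this duplicate
            } else {
                mySet.add(array);
            }
        }
    }
    // for (int i = 0; i < old.get(0).size(); i++) {
    // System.out.println(java.util.Arrays.toString(old.get(0).get(i)));
    // }
    return old;
}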

This test:

@Test
public void testListsFloat() {
    List<List<Float[]>> outter = new ArrayList<>();

    List<Float[]> inner1 = new ArrayList<>();
    inner1.add(new Float[] { 12.3f, 33.4f });
    inner1.add(new Float[] { 12.2f, 33.2f });
    inner1.add(new Float[] { 12.3f, 33.4f });

    List<Float[]> inner2 = new ArrayList<>();
    inner2.add(new Float[] { 12.1f, 33.1f });
    inner2.add(new Float[] { 12.2f, 33.2f });
    inner2.add(new Float[] { 12.3f, 33.4f });

    outter.add(inner1);
    outter.add(inner2);

    outter = ProcessList(outter);
    for (List<Float[]> list : outter) {
        for (Float[] array : list) {
            System.out.println(Arrays.toString(array));
        }
    }
}

resulted in this output:

[12.3, 33.4]
[12.2, 33.2]
[12.1, 33.1]
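If the repeated list scan ever becomes too slow, one possible alternative is to keep a HashSet of List views of the arrays, since lists hash and compare by value. This is only a sketch under that assumption; SetBasedDedup and processListWithSet are hypothetical names, and the method keeps the same semantics as ProcessList (duplicates are tracked across all inner lists). It works for object arrays such as Float[], not for primitive float[].

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

final class SetBasedDedup {
    // Removes, in place, every array whose contents were already seen in any inner list.
    static <T> List<List<T[]>> processListWithSet(List<List<T[]>> old) {
        Set<List<T>> seen = new HashSet<>();
        for (List<T[]> innerList : old) {
            Iterator<T[]> iterator = innerList.iterator();
            while (iterator.hasNext()) {
                // Copy the array into a list so the key compares by value
                // and is unaffected by later changes to the array.
                List<T> key = new ArrayList<>(Arrays.asList(iterator.next()));
                if (!seen.add(key)) {
                    iterator.remove();
                }
            }
        }
        return old;
    }
}
```

Each lookup is then a hash probe instead of a scan of everything seen so far, at the cost of one small list allocation per array.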

John B
  • Thanks for the reply. How would I return a List from that? I tried outterList.set(0, (List) mySet); but that doesn't seem to be the correct implementation. – user2524908 Sep 19 '13 at 15:42
  • return outterList. It has now been modified via the `remove` calls to the iterators. – John B Sep 19 '13 at 16:01
  • Hm, that seems to still contain duplicates. public static List<List<float[]>> ProcessList(List<List<float[]>> old){ Set<float[]> mySet = new HashSet<float[]>(); for (List<float[]> innerList : old){ Iterator<float[]> iterator = innerList.iterator(); while(iterator.hasNext()){ float[] array = iterator.next(); boolean added = mySet.add(array); if (!added) iterator.remove(); } } for(int i=0; i – user2524908 Sep 19 '13 at 16:13
  • Please provide the output that suggests it didn't work. What is your definition of duplicates? This will only remove duplicate arrays (two arrays with the exact same elements in the same order). – John B Sep 19 '13 at 16:15
  • Ohhhh, you may also be running into issues with comparing Float values. Floats are not exact so that might be part of the issue. Please clarify. – John B Sep 19 '13 at 16:16
  • Maybe my array should be a String instead of float. The duplicate values are as follows: "[-72.27, 42.05] [-72.27, 42.05] [-72.27, 42.05] [-72.27, 42.05] [-72.27, 42.05] [-72.26, 42.05] [-72.26, 42.05]" – user2524908 Sep 19 '13 at 16:17
  • Yeah, I think your issue might be that `Float` comparison is always sketchy because they could differ at the 105th decimal. I suggest converting to `String` with a set precision or `BigDecimal`. Then comparison should work better. – John B Sep 19 '13 at 16:27
  • Yeah, I tried String; however, it yielded the same result. There are still duplicates. – user2524908 Sep 19 '13 at 16:30
  • Then we are missing something or not understanding each other because this should work. Can you provide a unit test that executes the method and demonstrates the behavior? – John B Sep 19 '13 at 16:36
  • let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/37675/discussion-between-user2524908-and-john-b) – user2524908 Sep 19 '13 at 16:48
  • Sorry, chat is blocked from my work. However, I have posted the solution that should work. – John B Sep 19 '13 at 16:55
  • Thanks for the help. That solution will certainly work, because this is a one-time script to remove duplicates from these millions of JSONs I have. Thanks. – user2524908 Sep 19 '13 at 17:24
0

You may use a LinkedHashSet, which rejects duplicates as you fill it and keeps the elements in insertion order. Although it is an implementation of Set, you can copy it into a List if you need List methods.
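A minimal sketch of that idea, assuming each coordinate pair is stored as a List<Float> instead of a float[] (a LinkedHashSet of raw arrays would still keep duplicates, because arrays compare by identity). LinkedHashSetDemo and distinctInOrder are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class LinkedHashSetDemo {
    // Returns the distinct points in their original order; the input is not modified.
    static List<List<Float>> distinctInOrder(List<List<Float>> points) {
        // The set drops repeats and remembers the order of first insertion.
        Set<List<Float>> unique = new LinkedHashSet<>(points);
        // Copy back into a List for index-based access.
        return new ArrayList<>(unique);
    }

    public static void main(String[] args) {
        List<List<Float>> points = Arrays.asList(
                Arrays.asList(-70.89f, 42.12f),
                Arrays.asList(-70.89f, 42.12f),
                Arrays.asList(-72.26f, 42.05f));
        System.out.println(distinctInOrder(points).size()); // 2
    }
}
```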

Scadge