
I have read this post on correctly loading texture coordinates, and it all works correctly, but the problem lies in speed.

Basically, the idea is to look for a vertex processed earlier that has exactly the same attribute values as the current one being processed; if such a vertex exists, use that vertex's index for your index buffer and move on. It is a very simple concept and implementation; here is how I have done it:

class Vertex
{
 //The index values read from a file, either from the "f" elements in OBJ files or the <p> element for meshes in COLLADA files
 private final int
 vertexIndex,
 texCoordIndex,
 normalIndex,
 colorIndex;

 //The actual values for each attribute used in the mesh
 private final Vector3f
 vertex=new Vector3f(),
 normal=new Vector3f(),
 color=new Vector3f();
 private final Vector2f texCoord=new Vector2f();

 @Override
 public boolean equals(Object obj)//The key method used for finding duplicate vertices in the list
 {
  if(!(obj instanceof Vertex))
  {
   return false;
  }
  Vertex v=(Vertex)obj;

  //Check if every attribute of both is the same
  return    this.vertexIndex==v.vertexIndex
         && this.texCoordIndex==v.texCoordIndex
         && this.normalIndex==v.normalIndex
         && this.colorIndex==v.colorIndex;
 }
}

Finally, we have an ArrayList of these:

ArrayList<Vertex> vertices=new ArrayList<>();

And for every vertex read from the file, the idea is simple:

Vertex newVertex=readFromFile();

int prev=vertices.indexOf(newVertex);
if(prev!=-1)//We have found a similar vertex from before, so reuse it
{
 indexBuffer.add(prev); //the indexBuffer will use the vertex at that position
}
else
{
 vertices.add(newVertex); //Add the new vertex
 indexBuffer.add(vertices.size()-1);//The new vertex index is simply the index of the last element in the list
}

And while this yields correct results, the problem is performance, because for every nth vertex added we have to do a "LINEAR SEARCH!!!" over the n-1 vertices added before it to find a duplicate. This is terrible: it took me 7 seconds to load the Stanford dragon model, but if I completely discard the lookup and just live with the duplicates it takes only 1.5 seconds.

An optimization I thought of, since I am using Java, was to harness the power of parallel streams (Java 14) to look for duplicates like this:

Optional<Vertex> possibleDuplicate=vertices.stream()
                                           .parallel()
                                           .filter(prev->prev.equals(newVertex))
                                           .findFirst();

But this was an even worse idea, as it now took 12 seconds to load. A possible reason could be that setting up and coordinating the parallel tasks for every single newVertex lookup is a huge overhead.

He mentions in the post that he used binary search on the sorted vertices to look for duplicates faster, but I have some questions about that:

Based on what attribute do I sort the vertices when each vertex has multiple attributes?

One way to do a binary search on the ArrayList is to use Collections.binarySearch from the collections framework, but how do I tell the comparator whether one Vertex is less than or greater than the other?
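The only thing I can think of is a comparator that imposes some arbitrary but consistent ordering over all four indices, something like this (just a sketch; it assumes getters like getVertexIndex() that my class doesn't currently have):

Comparator<Vertex> byIndices=
        Comparator.comparingInt(Vertex::getVertexIndex)
                  .thenComparingInt(Vertex::getTexCoordIndex)
                  .thenComparingInt(Vertex::getNormalIndex)
                  .thenComparingInt(Vertex::getColorIndex);

//Collections.binarySearch only works if the list is already sorted with this same comparator
int pos=Collections.binarySearch(vertices, newVertex, byIndices);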

It has gotten so slow for large models that I have to give the user the choice of skipping duplicate elimination with a flag.

Is there a better way?

Sync it

1 Answer


Searching through the list of vertices is going to be incredibly slow: each lookup is a linear scan over all previously added vertices, so building the whole model is effectively O(n²).

Instead use some sort of hashing mechanism to look up an existing vertex - here is a fragment of the code I implemented to construct a model from an OBJ file that contains duplicate vertices:

public static class IndexedBuilder extends Builder {
    private final List<Integer> index = new ArrayList<>();
    private final Map<Vertex, Integer> map = new HashMap<>();

    @Override
    public IndexedBuilder add(Vertex vertex) {
        // Lookup existing vertex index
        final Integer prev = map.get(vertex);

        // Add new vertices
        if(prev == null) {
            // Add new vertex
            final int next = map.size();    // next index = number of unique vertices added so far
            index.add(next);
            map.put(vertex, next);
            super.add(vertex);
        }
        else {
            // Existing vertex
            index.add(prev);
        }

        return this;
    }
}

The map is essentially a table of vertices with their associated index.

The only work to perform per vertex is the calculation of the hash code, which will be much, much faster than searching (and is considerably simpler).
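Applied to the data structures in your question, the same idea would look something along these lines (just a sketch using your vertices list, indexBuffer and readFromFile() rather than my builder):

Map<Vertex, Integer> seen = new HashMap<>();

Vertex newVertex = readFromFile();
Integer prev = seen.get(newVertex);
if(prev == null) {
    // First time we see this vertex - its index is the current number of unique vertices
    int next = vertices.size();
    vertices.add(newVertex);
    seen.put(newVertex, next);
    indexBuffer.add(next);
}
else {
    // Duplicate - reuse the index of the existing vertex
    indexBuffer.add(prev);
}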

EDIT: Obviously this requires you to have implemented a decent hash-code implementation on your vertex class, something like this:

class Point {
    public final float x, y, z;

    @Override
    public int hashCode() {
        return Objects.hash(x, y, z);
    }
}

// Similar for normals & texture coordinates

class Vertex {
    private final Point point;
    private final Vector normal;
    private final TextureCoordinate coords;

    @Override
    public int hashCode() {
        return Objects.hash(point, normal, coords);
    }
}
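Note that for the HashMap lookup to find a match the vertex class also needs an equals implementation that is consistent with the hash code (as you already have in your question), along these lines:

class Vertex {
    // ...

    @Override
    public boolean equals(Object obj) {
        if(this == obj) return true;
        if(!(obj instanceof Vertex)) return false;
        Vertex that = (Vertex) obj;
        // Compare exactly the same components used in hashCode()
        return point.equals(that.point)
            && normal.equals(that.normal)
            && coords.equals(that.coords);
    }
}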
stridecolossus
  • Excellent suggestion, but like you said the hash function might as well be another question in itself. How do I take 4 integer values and compute a unique hash which differs not only on the values but also on the order in which the values were supplied? For example, 1,2,3,4 and 1,3,2,4 must output different hashes, because the 1st vertex references position 1 and normal 2 while the 2nd vertex references position 1 and normal 3 (talking about only the first 2 attributes). Know of any good algorithm? – Sync it Dec 29 '20 at 15:01
  • Assuming your vertex implementation is made up of points, normals, texture coordinates, etc. then each can compute a hash code using https://docs.oracle.com/javase/8/docs/api/java/util/Objects.html#hash-java.lang.Object...- and the vertex does the same with each 'component' in that vertex. This built-in function delegates to a method that implicitly takes the 'order' into account. Added a small edit to the answer to illustrate. – stridecolossus Dec 29 '20 at 15:06
  • It would be faster if we used just the integer index values of each attribute read from the file, but I will experiment with both and let you know. – Sync it Dec 30 '20 at 04:47
  • The reason why the indices are _calculated_ (rather than using the actual index values from the file) in the code I posted is that in almost all the OBJ models I tried, the _index_ values were the source of the duplication! Often they were simply just an incremental value which frankly could probably be ignored. But I'd be interested to hear what you come up with from your experiments. – stridecolossus Dec 30 '20 at 11:52