A vertex is not just 3 floats; that's why the answer says that. A vertex is all of the data associated with a single point of the output primitive. That includes the position (3 floats), but it also includes any texture coordinates, normals, per-vertex colors, or other vertex attributes you are interested in.
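To make that concrete, here is a minimal sketch of what one vertex might look like in memory. The `Vertex` struct, its field names, and the choice of attributes (a normal and one set of texture coordinates) are assumptions for illustration only; your own vertex format may hold more or less.

```cpp
#include <cstddef>  // std::size_t, offsetof

// Hypothetical vertex layout: the position plus two extra attributes.
struct Vertex {
    float position[3];  // x, y, z
    float normal[3];    // per-vertex normal
    float texCoord[2];  // u, v
};

// With this layout one vertex is 8 floats, not 3.
constexpr std::size_t kFloatsPerVertex = sizeof(Vertex) / sizeof(float);

// Byte offsets of each attribute inside a vertex, i.e. the values you
// would typically pass when describing the attribute layout to OpenGL.
constexpr std::size_t kPositionOffset = offsetof(Vertex, position);
constexpr std::size_t kNormalOffset   = offsetof(Vertex, normal);
constexpr std::size_t kTexCoordOffset = offsetof(Vertex, texCoord);

static_assert(kFloatsPerVertex == 8, "position + normal + texCoord");
```

Two vertices with the same position but a different normal or texture coordinate are different vertices under this definition, which is what matters for the cube.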
The purpose of the index buffer is to avoid repeating as much data. Each face has 4 independent vertices (a corner of the cube cannot be shared between faces, because its normal or texture coordinates differ from face to face), but each face is also 2 triangles which share 2 vertices. If you didn't use indexed rendering, then you would need to either draw each face with a separate GL_TRIANGLE_STRIP draw call (so drawing a full cube requires 6 separate draw calls), or provide 6 vertices per face, 2 of which are exact copies of vertex data you already supplied.
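The arithmetic for the cube can be sketched as follows. This assumes a 32-byte vertex (8 floats, for example position, normal, and texture coordinates), 16-bit indices, and that each face's 4 vertices are stored in strip order; the names `kQuadPattern` and `cubeIndex` are made up for this example.

```cpp
#include <cstddef>  // std::size_t
#include <cstdint>  // std::uint16_t

constexpr int kFaces = 6;
constexpr int kVerticesPerFace = 4;  // unique vertices per face
constexpr int kIndicesPerFace = 6;   // 2 triangles * 3 corners

// Assumed vertex size: 8 floats per vertex.
constexpr std::size_t kVertexBytes = 8 * sizeof(float);

// Two triangles per quad, reusing vertices 1 and 2 of the face.
constexpr std::uint16_t kQuadPattern[kIndicesPerFace] = {0, 1, 2, 2, 1, 3};

// i-th entry of the cube's index buffer (0 <= i < 36).
constexpr std::uint16_t cubeIndex(int i) {
    return static_cast<std::uint16_t>(
        (i / kIndicesPerFace) * kVerticesPerFace + kQuadPattern[i % kIndicesPerFace]);
}

// Indexed: 24 vertices plus 36 small indices.
constexpr std::size_t kIndexedBytes =
    kFaces * kVerticesPerFace * kVertexBytes +
    kFaces * kIndicesPerFace * sizeof(std::uint16_t);

// Non-indexed GL_TRIANGLES: 36 full vertices, 12 of them duplicates.
constexpr std::size_t kNonIndexedBytes = kFaces * kIndicesPerFace * kVertexBytes;

static_assert(kIndexedBytes < kNonIndexedBytes, "indexing saves memory here");
```

Under these assumptions (and a 4-byte `float`) the indexed cube takes 840 bytes against 1152 bytes for the non-indexed one, and the whole thing can be drawn with a single `glDrawElements(GL_TRIANGLES, 36, GL_UNSIGNED_SHORT, ...)` call. The gap widens as the vertex format grows, since an index stays 2 bytes no matter how large a vertex is.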