Google ARCore Domain Model by Example

Question

I'm trying to read and make sense of Google ARCore's domain model, particularly the Android SDK packages. Currently this SDK is in "preview" mode and so there are no tutorials, blogs, articles, etc. available on understanding how to use this API. Even Google itself suggests just reading the source code, source code comments and Javadocs to understand how to use the API. Problem is: if you're not already a computer vision expert, the domain model will feel a little alien & unfamiliar to you.

Specifically I'm interested in understanding the fundamental differences between, and proper usages of, the following classes: Frame, Anchor, Pose and PointCloud.

According to Anchor's javadoc:

"Describes a fixed location and orientation in the real world. To stay at a fixed location in physical space, the numerical description of this position will update as ARCore's understanding of the space improves. Use getPose() to get the current numerical location of this anchor. This location may change any time update() is called, but will never spontaneously change."

So Anchors have a Pose. Sounds like you "drop an Anchor" onto something that's visible in the camera, and then ARCore tracks that Anchor and constantly updates its Pose to reflect the nature of its onscreen coordinates maybe?

And from Pose's javadoc:

"Represents an immutable rigid transformation from one coordinate frame to another. As provided from all ARCore APIs, Poses always describe the transformation from object's local coordinate frame to the world coordinate frame (see below)...These changes mean that every frame should be considered to be in a completely unique world coordinate frame."

So it sounds like a Pose is something that is only unique to the "current frame" of the camera and that each time the frame is updated, all poses for all anchors are recalculated maybe? If not, then what's the relationship between an Anchor, its Pose, the current frame and the world coordinate frame? And what's a Pose really, anyways? Is a "Pose" just a way of storing matrix/point data so that you can convert an Anchor from the current frame to the world frame? Or something else?

Finally, I see a strong correlation between Frames, Poses and Anchors, but then there's PointCloud. The only class I can see inside com.google.ar.core that uses these is the Frame. PointClouds appear to be (x,y,z)-coordinates with a 4th property representing ARCore's "confidence" that the x/y/z components are actually correct. So if an Anchor has a Pose, I would have imagined that a Pose would also have a PointCloud representing the Anchor's coordinates & confidence in those coordinates. But Pose does not have a PointCloud, and so I must be completely misunderstanding the concepts that these two classes model.

The question

I've posed several different questions above, but they all boil down to a single, concise, answerable question:

What is the difference in the concepts behind Frame, Anchor, Pose and PointCloud and when do you use each of them (and for what purposes)?

Ian M · Accepted Answer · 2017-09-03T22:34:12.410

8

A Pose is a structured transformation. It is a fixed numerical transformation from one coordinate system (typically object local) to another (typically world).
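To make "numerical transformation" concrete, here is a minimal sketch of a rigid transform (a rotation followed by a translation) in plain Java. `RigidTransformSketch` and its `aboutY` helper are hypothetical names invented for this illustration; this is not ARCore's `Pose` class, although `Pose` offers similar functionality through methods such as `transformPoint()`. The point of the sketch is that, once a rotation is involved, going from local to world coordinates is not the same fixed offset for every point.

```java
import java.util.Arrays;

/** Hypothetical stand-in for a rigid transform; NOT ARCore's Pose class. */
final class RigidTransformSketch {
    private final float[] t; // translation (tx, ty, tz)
    private final float[] q; // unit rotation quaternion (x, y, z, w)

    RigidTransformSketch(float[] translation, float[] rotationQuaternion) {
        this.t = translation.clone();
        this.q = rotationQuaternion.clone();
    }

    /** Builds a transform that rotates by angleRad about +Y, then translates. */
    static RigidTransformSketch aboutY(double angleRad, float tx, float ty, float tz) {
        float s = (float) Math.sin(angleRad / 2);
        float c = (float) Math.cos(angleRad / 2);
        return new RigidTransformSketch(new float[] {tx, ty, tz}, new float[] {0f, s, 0f, c});
    }

    /** Maps a local-space point to world space: rotate first, then translate. */
    float[] transformPoint(float x, float y, float z) {
        float qx = q[0], qy = q[1], qz = q[2], qw = q[3];
        // Quaternion rotation: v' = v + qw * u + cross(q.xyz, u), with u = 2 * cross(q.xyz, v)
        float ux = 2 * (qy * z - qz * y);
        float uy = 2 * (qz * x - qx * z);
        float uz = 2 * (qx * y - qy * x);
        float rx = x + qw * ux + (qy * uz - qz * uy);
        float ry = y + qw * uy + (qz * ux - qx * uz);
        float rz = z + qw * uz + (qx * uy - qy * ux);
        return new float[] {rx + t[0], ry + t[1], rz + t[2]};
    }

    public static void main(String[] args) {
        // A model turned 90 degrees about the vertical axis and placed 10 units along +X.
        RigidTransformSketch pose = aboutY(Math.PI / 2, 10f, 0f, 0f);
        // The local origin lands at the translation itself, roughly (10, 0, 0).
        System.out.println(Arrays.toString(pose.transformPoint(0f, 0f, 0f)));
        // A point one unit along local +X lands at roughly (10, 0, -1), not (11, 0, 0).
        System.out.println(Arrays.toString(pose.transformPoint(1f, 0f, 0f)));
    }
}
```

With no rotation the transform would reduce to adding the same offset to every point; the rotation is what makes each local point move by a different amount.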

An Anchor represents a physically fixed location in the world. Its getPose() will update as the understanding of the world changes. For example, imagine you have a building with a hallway around the outside. If you walk all the way around that hallway, sensor drift results in you not winding up at the same coordinates you started at. However, ARCore can detect (using visual features) that it is in the same space it started in. When this happens, it distorts the world so that your current location and original location line up. As part of this distortion, the location of anchors will be adjusted as well so that they stay in the same physical place.

Because of this distortion, a Pose relative to the world should be considered valid only for the duration of the frame during which it was returned. As soon as you call update() the next time, the world may have been reshaped and that pose could be useless. If you need to keep a location longer than a frame, create an Anchor. Just make sure to call removeAnchors() on anchors that you're no longer using, as there is an ongoing cost for each live anchor.

A Frame captures the current state at an instant, together with the changes that occurred between two calls to update().

PointClouds are sets of 3D visual feature points detected in the world. They are in their own local coordinate system, which can be accessed from Frame.getPointCloudPose(). Developers looking to have better spatial understanding than the plane detection provides can try using the point clouds to learn more about the structure of the 3D world.
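As a rough illustration of moving point cloud points from their local coordinate system into world space, here is a plain-Java sketch. It assumes the points are packed four floats per point (x, y, z, confidence), as described in the question, and that the pose is available as a 4x4 column-major matrix (the OpenGL convention). `PointCloudToWorldSketch` and `toWorld` are hypothetical names for this example, not part of the ARCore API.

```java
import java.util.Arrays;

/** Hypothetical helper; not part of the ARCore API. */
final class PointCloudToWorldSketch {
    private static final int FLOATS_PER_POINT = 4; // x, y, z, confidence (assumed layout)

    /**
     * Applies a 4x4 column-major transform to every point in a packed array.
     * The confidence value of each point is copied through unchanged.
     */
    static float[] toWorld(float[] m, float[] localPoints) {
        float[] world = new float[localPoints.length];
        for (int i = 0; i + FLOATS_PER_POINT <= localPoints.length; i += FLOATS_PER_POINT) {
            float x = localPoints[i], y = localPoints[i + 1], z = localPoints[i + 2];
            // Column-major layout: element (row r, column c) lives at index c * 4 + r.
            world[i]     = m[0] * x + m[4] * y + m[8]  * z + m[12];
            world[i + 1] = m[1] * x + m[5] * y + m[9]  * z + m[13];
            world[i + 2] = m[2] * x + m[6] * y + m[10] * z + m[14];
            world[i + 3] = localPoints[i + 3];
        }
        return world;
    }

    public static void main(String[] args) {
        // Example transform: no rotation, translation by (1, 2, 3).
        float[] matrix = {
            1, 0, 0, 0,
            0, 1, 0, 0,
            0, 0, 1, 0,
            1, 2, 3, 1
        };
        float[] points = {0f, 0f, 0f, 0.9f, 1f, 1f, 1f, 0.5f};
        System.out.println(Arrays.toString(toWorld(matrix, points)));
    }
}
```

Returning a new array rather than transforming in place keeps the original local-space points available, which matters if the pose changes on the next update.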

Does that help?

edited Sep 03 '17 at 22:34

answered Sep 03 '17 at 17:46

Ian M


That certainly helps a lot Ian (+1)! But I do have a few followup questions, if you don't mind. **(1)** When you say "*A `Pose` is a structured transformation. It is a fixed numerical transformation from one coordinate system (typically object local) to another (typically world)...*" can you give me a simple example of what you mean by "numerical transformation"? Do you mean that the local (x,y,z)-coordinate for something might be (50,45,100), but its world coordinate might be (2000,600,3000). And so to translate from local -> world the "pose" might be (1950,555,2900)? – smeeb Sep 03 '17 at 18:40
^^^Meaning to **transform** from local -> world you'd need to add (50+1950, 45+555, 100+2900) = (2000,600,3000)? If so wouldn't the transformation be the same for all local coordinates?! And if not, can you give me a realistic example of what a `Pose` is (and how you go from local to world coordinates)? – smeeb Sep 03 '17 at 18:42
**(2)** When you say "*A Frame captures the current and changed at an instant...*" I'm not following you 100%. I assume 1 `Frame` instance corresponds to a single camera/video frame? If so, then how does a `Frame` represent "change" as you say? – smeeb Sep 03 '17 at 18:43
**(3)** If a `Pose` already contains local coordinates, then of what use is a `PointCloud`, if it also contains the same local coordinates? I guess I'm just not understanding the difference between these two and how they relate back to the `Frame`. How can a `Frame` have `getPointCloud()`, `getPose()` and a `getPointCloudPose()` method?! How are these three things different and when would one use them? **Thanks again so much!** – smeeb Sep 03 '17 at 18:47
1

@smeeb 1. Think about drawing a 3D model at a location in the world. The model has its own local coordinate system in which the 3D vertices of the polygonal model are defined, with the model likely centered/standing at 0,0,0 and facing along one of the axes. To render this in the world, you need to change the direction it's facing (rotate about the origin) and move it to the virtual object's location (translate). The `Pose` contains that information. – Ian M Sep 03 '17 at 21:17
1

@smeeb 2. The changes I refer to are `getUpdatedAnchors()`, `getUpdatedPlanes()`, and `isDisplayRotationChanged()`. These each capture either items that changed or the occurrence of a change event since the previous `update()`. – Ian M Sep 03 '17 at 21:19
1

@smeeb 3. Every `Pose` returned by ARCore describes a transformation from some local coordinate frame to the current world coordinate frame. `Frame.getPose()` transforms camera coordinates (0,0,0 at the camera, -Z along the direction the camera looks, +X and +Y being display-right and display-up respectively) to the current world coordinate frame (+Y up inertial). `getPointCloudPose()` instead returns the transformation from the point cloud's coordinate frame (unconstrained) to the same current world coordinate frame. This tells you how to transform point cloud points into world space. – Ian M Sep 03 '17 at 21:24
1

Thanks so much @Ian M (+1 for all 3). I have one final (I **promise**) followup question here, if you'll tolerate me: although I'm following *most* of what you're saying in your answer to #3 above, I'm still not seeing "the forest through the trees" on that one. Any chance you can give me 3 simple concrete (real world) use cases for when one would use: `Frame#getPointCloud()`, `Frame#getPose()` and `Frame#getPointCloudPose()`? Thanks again for all the help here! – smeeb Sep 04 '17 at 12:42
Example usage: `frame.getPose()`: getting the camera position (see docs for `getViewMatrix()` https://developers.google.com/ar/reference/java/com/google/ar/core/Frame.html#getViewMatrix(float[],%20int) ). `frame.getPointCloud()` + `frame.getPointCloudPose()`: getting the points to visualize the point cloud, and the transformation to get those points in the world coordinate frame. Most apps won't need to call the first one directly, since the primary use of the camera pose is building the view matrix. See the HelloAR sample code for examples of the point cloud functions. – Ian M Sep 04 '17 at 15:33
Looks like this was written before the official release, but there's no longer a frame.getPointCloudPose() method and frame.acquirePointCloud() seems to get the point cloud in world coordinates? So followup question is if you have a floor plane that's been converted into an anchor, can you measure heights of point cloud points by using the point cloud point y compared to the floor.pose.ty()? – kenyee Nov 13 '18 at 15:11
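On the height question above, here is a hedged sketch of the arithmetic only. It assumes the point cloud really is in world coordinates, that world +Y is up, that the floor plane is horizontal, and that points are packed as (x, y, z, confidence). `HeightAboveFloorSketch`, `heightsAboveFloor` and the `minConfidence` threshold are hypothetical names for this example; none of this is confirmed ARCore behavior.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper; assumes world-space points packed as x, y, z, confidence. */
final class HeightAboveFloorSketch {
    /**
     * Returns the height above the floor for each point whose confidence is at
     * least minConfidence. floorY is the world-space Y of the floor, for example
     * the ty() of the floor anchor's pose.
     */
    static List<Float> heightsAboveFloor(float[] worldPoints, float floorY, float minConfidence) {
        List<Float> heights = new ArrayList<>();
        for (int i = 0; i + 4 <= worldPoints.length; i += 4) {
            float y = worldPoints[i + 1];
            float confidence = worldPoints[i + 3];
            if (confidence >= minConfidence) {
                heights.add(y - floorY); // valid only while +Y is up and the floor is level
            }
        }
        return heights;
    }

    public static void main(String[] args) {
        float[] points = {
            0f, 1.5f, 0f, 0.9f,   // confident point above the floor
            0f, 0.2f, 0f, 0.1f,   // low-confidence point, skipped
            1f, -0.5f, 2f, 0.8f   // confident point near the floor
        };
        System.out.println(heightsAboveFloor(points, -1.0f, 0.5f));
    }
}
```

Because world coordinates can be reshaped on each update, the floor Y and the points should be read during the same frame before comparing them.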

score 0 · Answer 2 · answered Apr 17 '19 at 18:03

Using the following link you can find an answer about Frame, Anchor and Pose:

ARCore – Session, Frame, Camera and Pose.

Additionally, here's some info on what a Point Cloud is:

A Point Cloud is a visual cloud of points (usually yellow) in world space that represent reliable positions for 3D tracking on real-world objects. A Point Cloud looks like this:

[Image: point cloud visualization]

And here's what Google says about Point Cloud:

PointCloud contains a set of observed 3D points and confidence values. This class implements Closeable and usually should be used in a Java try-with-resources or Kotlin use block, for example:

To get a PointCloud use the following code:

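import com.google.ar.core.Frame;
import com.google.ar.core.PointCloud;

// session is an already configured and resumed com.google.ar.core.Session
Frame frame = session.update();

// try-with-resources releases the point cloud when the block exits
try (PointCloud pointCloud = frame.acquirePointCloud()) {
    // Access point cloud data here, e.g. via pointCloud.getPoints()
}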
