TensorFlow neural network with 3D mesh as input

Question

I'm trying to build a neural network that takes the vertex positions of a 3D mesh as input and outputs the coordinates of two points on the inside.

For testing purposes, I have a dataset in which each sample is a geometry with 20 points plus two points on the inside.

Every file of the dataset contains the vertex coordinates as a rank-2 array with shape [3,20] for the objs and shape [3,3] for the resulting points.

I've built a linear model, but the accuracy is always very low (0.16), no matter whether I train it with 1,000, 100,000 or 500,000.

import tensorflow as tf
import numpy as np

objList    = np.load('../testFullTensors/objsArray_00.npy')
guideList  = np.load('../testFullTensors/drvsArray_00.npy')


x  = tf.placeholder(tf.float32, shape=[None, 60])
y_ = tf.placeholder(tf.float32, shape=[None, 6])

W = tf.Variable(tf.zeros([60,6],tf.float32))
b = tf.Variable(tf.zeros([6],tf.float32))

y = tf.matmul(x,W) + b

cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y))

train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    train_step.run(feed_dict={x: objList, y_: guideList})
    correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    sess.run(tf.global_variables_initializer())
    print(accuracy.eval(session=sess, feed_dict={x: objList, y_: guideList}))

Should I build a different kind of network?

Thanks E

Tensorflow has built in functions to work with 2d and 3d data sets using convolution. When you flatten your data to be 1d (like you have) you lose the meaningful geometry. Consider the difference between https://www.tensorflow.org/get_started/mnist/beginners and https://www.tensorflow.org/get_started/mnist/pros . Convolution is a powerful technique and I think would greatly help you with this problem. — Anton Codes, May 09 '17 at 13:35
are the 2 points deterministic, are they 2 meaningful points (such as always behind the left then right eye of an animal) or are they just 2 points within the mesh? — Anton Codes, May 09 '17 at 13:40
Yes, a convolutional neural network was my first choice, especially because the final idea is to have a humanoid mesh as input. I tried a simple linear model after reading this: http://stackoverflow.com/questions/34500641/3d-coordinates-as-the-output-of-a-neural-network/34500800#34500800 . Anyway, I'll go back to using a CNN. I just wonder whether it is fine to use 3D coordinates as input or whether I need to voxelize the mesh. The two points are meaningful: they should be used as a 'guide' to place other objs. If I use a CNN with vertices, how should I shape the tensor? — manu, May 10 '17 at 11:47
You say that the 2 points are a **guide to place other objects**. Do you believe that a human, given the same data, a spreadsheet and a lot of time, would be able to consistently get the same 2 points correct, in the order you specified them? I ask this because I have doubts. Think of AlphaGo and consider its architecture. What the AlphaGo NN predicted was the likelihood of **every** move being a good move or not, and that was then fed to a **Monte Carlo tree search**. The NN **didn't** predict the next move, it scored **all** moves. You may have much better success creating a scoring network. — Anton Codes, May 10 '17 at 13:14
I'm sure that a human would be able to consistently get the same 2 points, because in my job I have to place a point for each bone joint inside a humanoid mesh, so potentially I already have a big database containing human meshes and their joint positions. I just need to figure out how to make the network do it itself :) (in the example of an arm, you would need to place the points on the shoulder, elbow and wrist) — manu, May 11 '17 at 02:18
That's great insight into your data, thanks! I think knowing the purpose of the two points would help someone solve the problem. Honestly, when I first read the question I thought you just wanted 2 random points and that it wasn't doable. — Anton Codes, May 11 '17 at 13:55

Anton Codes · Accepted Answer · 2017-05-11T14:59:18.207

First, thanks for the clarification of the question in the comments, it really helps understand the problem.

The problem as I understand it is (at least similar to) the following: given a bounding set of 3D points on the outside of an arm, identify

A. the point in 3D that is on the humerus and closest to the body
B. the point in 3D that is on the humerus and furthest from the body

What we need is a model that is expressive enough to do this. Let us first consider how this problem is easiest for a human. If a human were given a 3D model that they could look at and rotate, it would be a visual problem and they would probably get it right away.

If it were a list of 60 numbers, and they were not told what those numbers meant and had to produce 6 numbers as an answer, it might not be possible.

We know that TensorFlow is good at image recognition, so let's turn the problem into an image recognition problem.

Let's just start with the MNIST network and talk about what it would take to change it to our problem!

Convert your input to voxels such that each training example will be one 3D image of size [m,m,m] where m is the resolution you need (start with 30 or so for initial testing and maybe go as high as 128). Initialize your 3D matrix with 0's. Then for each of the 20 data points change the corresponding voxel to 1 (or a probability).

That is your input, and since you have lots of training examples you will have a tensor of shape [batch,m,m,m].

Do the same for your expected output.
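The voxelization step described above could look roughly like the NumPy sketch below. The function name `voxelize`, the bounding-box corners `lo` and `hi`, and the random stand-in data are assumptions for illustration, not part of the original answer. A single bounding box shared by the whole dataset is assumed so that inputs and expected outputs land in the same grid.

```python
import numpy as np

def voxelize(points, m, lo, hi):
    """Mark the voxels containing `points` in an [m, m, m] occupancy grid.

    points: array of shape [n, 3] (transpose a [3, n] array first).
    lo, hi: hypothetical bounding-box corners shared by the whole dataset.
    """
    points = np.asarray(points, dtype=np.float64)
    lo = np.asarray(lo, dtype=np.float64)
    hi = np.asarray(hi, dtype=np.float64)
    # Scale coordinates to voxel indices; clip so points on the upper faces stay in range.
    idx = np.floor((points - lo) / (hi - lo) * m).astype(int)
    idx = np.clip(idx, 0, m - 1)
    grid = np.zeros((m, m, m), dtype=np.float32)
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = 1.0
    return grid

# Made-up data standing in for one mesh stored as [3, 20].
rng = np.random.RandomState(0)
mesh = rng.uniform(-1.0, 1.0, size=(3, 20))
grid = voxelize(mesh.T, m=30, lo=[-1, -1, -1], hi=[1, 1, 1])

# Stacking the per-example grids gives the [batch, m, m, m] tensor.
batch = np.stack([grid, grid])
```

The same function can be applied to the target points to build the expected-output grids. Several vertices may fall into one voxel, so the grid can contain fewer ones than there are points.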

Send that through layers of convolution (start with 2 or 3 for testing) such that your output size is [batch,m,m,m].
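To keep the output at [batch,m,m,m], each convolution has to preserve the spatial size, which is what zero "same" padding does. The sketch below is a deliberately naive single-channel NumPy version meant only to show the shapes; `conv3d_same` and the averaging kernel are illustrative assumptions. In TensorFlow the equivalent would be a 3D convolution such as `tf.nn.conv3d` with `padding='SAME'`, which expects an extra channel dimension on the input.

```python
import numpy as np

def conv3d_same(volume, kernel):
    """Naive single-channel 3D convolution (cross-correlation) with zero 'same' padding."""
    k = kernel.shape[0]  # assumes a cubic kernel with odd side length
    pad = k // 2
    padded = np.pad(volume, pad, mode="constant")
    out = np.zeros(volume.shape, dtype=np.float64)
    m0, m1, m2 = volume.shape
    for i in range(m0):
        for j in range(m1):
            for l in range(m2):
                # Weighted sum of the k x k x k neighbourhood centred on (i, j, l).
                out[i, j, l] = np.sum(padded[i:i + k, j:j + k, l:l + k] * kernel)
    return out

# One occupied voxel pushed through two stacked layers with an averaging kernel.
volume = np.zeros((8, 8, 8))
volume[4, 4, 4] = 1.0
kernel = np.ones((3, 3, 3)) / 27.0
layer1 = conv3d_same(volume, kernel)
layer2 = conv3d_same(layer1, kernel)
```

Each stacked layer widens the region of the input that can influence a given output voxel, while the grid size stays fixed at m in every dimension.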

Use back propagation to train your output layer to predict your expected output.
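The answer does not name a loss function. One possible choice, shown here purely as an assumption, is softmax cross-entropy over all m×m×m voxels, treating each target grid as a one-hot distribution with a single marked voxel. The NumPy sketch below only computes the loss value; in TensorFlow, `tf.nn.softmax_cross_entropy_with_logits` applied to the flattened grids computes the same quantity per example.

```python
import numpy as np

def voxel_softmax_cross_entropy(logits, target):
    """Mean softmax cross-entropy over a batch of [batch, m, m, m] grids.

    Assumes each target grid has exactly one voxel set to 1.
    """
    batch = logits.shape[0]
    flat_logits = logits.reshape(batch, -1).astype(np.float64)
    flat_target = target.reshape(batch, -1).astype(np.float64)
    # Subtract the per-example max for numerical stability before exponentiating.
    shifted = flat_logits - flat_logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-(flat_target * log_probs).sum(axis=1).mean())

# Made-up example: two 4x4x4 targets, compared against uninformative and good predictions.
target = np.zeros((2, 4, 4, 4))
target[0, 1, 2, 3] = 1.0
target[1, 0, 0, 0] = 1.0
uniform_loss = voxel_softmax_cross_entropy(np.zeros((2, 4, 4, 4)), target)
confident_loss = voxel_softmax_cross_entropy(10.0 * target, target)
```

With all-zero logits every voxel is equally likely, so the loss equals the log of the number of voxels. Logits that peak on the target voxel give a lower value.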

Finally you will have a network that doesn't return a 3D coordinate of the Humerus but instead returns a probability graph of where it is in 3D space. You can scan the output for the highest probabilities and read off the coordinates.
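Reading the coordinates off the output grid can be sketched as follows. The name `peak_to_coordinate` and the bounding-box corners `lo` and `hi` are assumptions; they have to match whatever box was used to voxelize the input.

```python
import numpy as np

def peak_to_coordinate(prob_grid, lo, hi):
    """Return the centre of the highest-scoring voxel in world coordinates."""
    m = prob_grid.shape[0]  # assumes a cubic [m, m, m] grid
    idx = np.unravel_index(np.argmax(prob_grid), prob_grid.shape)
    lo = np.asarray(lo, dtype=np.float64)
    hi = np.asarray(hi, dtype=np.float64)
    # The voxel centre sits (index + 0.5) voxel widths from the lower corner.
    return lo + (np.asarray(idx) + 0.5) * (hi - lo) / m

# Made-up network output with a single clear peak.
prob_grid = np.zeros((4, 4, 4))
prob_grid[2, 1, 3] = 0.9
coord = peak_to_coordinate(prob_grid, lo=[-1, -1, -1], hi=[1, 1, 1])
```

The recovered position can only be as precise as one voxel width, (hi - lo) / m, which is one reason to raise m once the basic pipeline works.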

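This is very similar to how AlphaGo plays Go: its policy network looks at the board as an image and scores every position rather than outputting a single move.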

Suggested improvement: train one network to predict A and a separate network to predict B.

The height (and width) of each convolution (at a minimum) will be **m** / **conv_layers** + 1 — Anton Codes, May 11 '17 at 14:25
The idea of using a GAN is good, but converting the input to voxels may not be a good idea. The data is just huge, and converting all mesh files into standard-size voxel grids is not as easy as it seems. — Jason Ching, Jun 07 '19 at 06:56
@JasonChing No GAN was suggested. The reference to AlphaGo was meant to be about the policy network, looking at an image and evaluating the most likely region of play (or in this case the humerus) — Anton Codes, Dec 02 '19 at 18:19
