
I have a dataset with latitude and longitude columns, and I want to create a cross feature for the Euclidean distance between origin and destination:

origin_lat,origin_lon,dest_lat,dest_lon
41.857183858,-87.620334624,42.001571027,-87.695012589

I already read each of these as a separate tf.float32 tensor (via tf.feature_column.numeric_column).

This is a similar cross column that I already create:

# Creating a boolean flag

capital_indicator = features['capital_gain'] > features['capital_loss']
features['capital_indicator'] = tf.cast(capital_indicator, dtype=tf.int32)

I would like to have something like this:

 euclid_distance = distance((['origin_lat', 'origin_lon']), (['dest_lat', 'dest_lon']))
gogasca

1 Answer


For Euclidean distance we can just use the formula:

$$d = \sqrt{(\text{origin\_lat} - \text{dest\_lat})^2 + (\text{origin\_lon} - \text{dest\_lon})^2}$$

which we can 'translate' into TensorFlow code as follows:

distance = tf.sqrt(
    tf.pow(origin_lat - dest_lat, 2) + tf.pow(origin_lon - dest_lon, 2)
)
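
To attach this to the feature dictionary in the same way as the boolean flag in the question, a minimal sketch might look like the following. It assumes TensorFlow 2.x with eager execution and that `features` is a dict of float tensors keyed by the four column names from the question; the helper name `add_euclid_distance` and the key `'euclid_distance'` are made up for illustration.

```python
import tensorflow as tf


def add_euclid_distance(features):
    """Add a derived 'euclid_distance' column to a dict of feature tensors.

    Hypothetical helper: the result is measured in degrees, not kilometres.
    """
    d_lat = features['origin_lat'] - features['dest_lat']
    d_lon = features['origin_lon'] - features['dest_lon']
    features['euclid_distance'] = tf.sqrt(tf.square(d_lat) + tf.square(d_lon))
    return features


# The sample row from the question, as a batch of size 1.
features = {
    'origin_lat': tf.constant([41.857183858]),
    'origin_lon': tf.constant([-87.620334624]),
    'dest_lat': tf.constant([42.001571027]),
    'dest_lon': tf.constant([-87.695012589]),
}
features = add_euclid_distance(features)
```

Because the inputs are in degrees, the new column is in degrees as well, so it is only meaningful as a relative measure.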

I should mention in passing that Euclidean distance is only exact for coordinates on a flat surface, which latitude and longitude on the Earth are not.

Because of the Earth's curvature, Euclidean distance on latitudes and longitudes is only an approximation, and one whose accuracy decreases as the distances grow and as the points move away from the equator. You may well be happy to accept this approximation in your model. If not, have a look at the Haversine formula, which is a little more complex but should still be doable in TensorFlow.
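
As a rough illustration of the Haversine route, here is a minimal sketch, again assuming TensorFlow 2.x with eager execution. The function name `haversine_km` and the mean Earth radius of 6371 km are my own assumptions rather than anything from the question, and the inputs are expected in degrees.

```python
import math

import tensorflow as tf

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(origin_lat, origin_lon, dest_lat, dest_lon):
    """Great-circle distance in km between points given in degrees.

    Hypothetical helper; works elementwise on float tensors.
    """
    deg2rad = math.pi / 180.0
    lat1 = origin_lat * deg2rad
    lat2 = dest_lat * deg2rad
    d_lat = lat2 - lat1
    d_lon = (dest_lon - origin_lon) * deg2rad

    # Haversine term: sin^2(dlat/2) + cos(lat1) * cos(lat2) * sin^2(dlon/2)
    a = (tf.square(tf.sin(d_lat / 2.0))
         + tf.cos(lat1) * tf.cos(lat2) * tf.square(tf.sin(d_lon / 2.0)))
    # Guard against rounding pushing the value slightly outside [0, 1].
    a = tf.clip_by_value(a, 0.0, 1.0)
    return 2.0 * EARTH_RADIUS_KM * tf.asin(tf.sqrt(a))


# The sample row from the question, as a batch of size 1.
distance_km = haversine_km(
    tf.constant([41.857183858]), tf.constant([-87.620334624]),
    tf.constant([42.001571027]), tf.constant([-87.695012589]),
)
```

Unlike the Euclidean version, this returns a distance in kilometres, so values are comparable across different latitudes.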

Stewart_R