1

I have a list of dictionaries. The key is item_cd and value is location_coordinates. I would like to find the euclidian distance of these locations and create a separate similar list based on the proximity of their locations.

Steps

  • The initial distance will be calculated from the origin i.e. (0,0).
  • The closest item will be added to a separate list suppose it is {5036885955: [90.0, 61.73]} in this case.
  • Now I would like to find the closest order from the centroid of the previous orders. In this case, the centroid will be [90.0, 61.73] of order {5036885955: [90.0, 61.73]} as there is only one coordinate at the moment. The next closest order from the centroid of the previous order will be {5036885984: [86.0, 73.03]}
  • I would like to repeat the above steps until the list is empty

Input :

 [{5036885850: [92.0, 88.73]}, {5036885955: [90.0, 61.73]}, {5036885984: [86.0, 73.03]}, {5036885998: [102.0, 77.54]}]

What I have tried :

def calculate_centroid(self, locations: list) -> list:
        """
        get the centroid of the  order
        param: Location object
        return: centroid of an order
        """
        x, y = [p[0] for p in locations], [p[1] for p in locations]
        centroid = [round(sum(x) / len(locations), 2), round(sum(y) / len(locations), 2)]
        return centroid 

def get_closest_order_to_origin(self, order_centroid_list: list) -> list:
    """
     get the next closest order 
    """
        initial_order = [[0, 0]]
        next_order = []
        centroids = []
        
    for order in order_centroid_list:

        for shipping_req_key, centroid in order.items():
            distance = math.sqrt(((initial_order[0][0] - centroid[0]) ** 2) + ((initial_order[0][1] - centroid[1]) ** 2))

I am not sure how should I proceed further. I would appreciate your feedback

Sam
  • 352
  • 2
  • 4
  • 22

2 Answers2

4

You can use numpy.linalg.norm() to calculate the Euclidean distance between two numpy array. So you can declare those position arrays as a numpy array and apply this function to calculate distance.

Let's see this step by step.

  1. Declared initial current_pos as origin (0,0). Extracted the points sequentially in the list named pos_list. Declared result_list as empty list to fetch results.
  2. From the pos_list each positions can be taken and Euclidean distance with current_pos is calculated and stored in the dist_list.
  3. The min(dist_list) gives the minimum distance min_dist. The corresponding index of min_dist can be fetched from dist_list and the relevant position and entry of the input data can be identified and processed (i.e. removing from data and appending to result_list)
  4. The whole process continues until the data becomes empty.

The whole implementation:

import numpy as np

data = [{5036885850: [92.0, 88.73]}, {5036885955: [90.0, 61.73]}, {5036885984: [86.0, 73.03]}, {5036885998: [102.0, 77.54]}]

pos_list = [list(item.values())[0] for item in data]

current_pos = np.array([0,0])

result_list = []


while (len(data) != 0):
    dist_list = [np.linalg.norm(current_pos - np.array(pos)) for pos in pos_list]
    min_dist = min(dist_list)
    item_index = dist_list.index(min_dist)
    to_be_moved = data[item_index]
    result_list.append(to_be_moved)
 #   current_pos = pos_list[item_index]
    pos_list.remove(pos_list[item_index])
    data.remove(to_be_moved)


print(result_list)

You're requirement was not clear in the question, so I'm handling two cases.

First case: Arrange the points in such an order such that the distances of each points from the origin is increasing. The above code works for this scenario and this matches your expected output.

Second case: Arrange the points a,b,c,d,... in such a way so that a is nearest to origin, b is nearest to a, c is nearest to d and so on. To see this case in action just revert the commented line.

NOTE: If any type of sort or positional modification is done, this solution will break.


UPDATE

According to your last added condition, you need to calculate the centroid of all the points in the resulting order, then find the order that is closest to the centroid.

You can pass the list of points that has been already added to the result find the centroid with the following function:

def get_centroid(lst):
    arr = np.array(lst)
    length = arr.shape[0]
    if length == 0:
        return 0,0
    else:
        sum_x = np.sum(arr[:, 0])
        sum_y = np.sum(arr[:, 1])
        return sum_x/float(length), sum_y/float(length)

to do so, keep a track of the point that is added to the final result and removed from the pos_list in another list called pos_removed_list:

pos_removed_list.append(pos_list[item_index])

and replace the commented line:

#   current_pos = pos_list[item_index]

with

current_pos = np.array(get_centroid(pos_removed_list))

and you're ready to go!

**Full Code: **

import numpy as np

def get_centroid(lst):
    arr = np.array(lst)
    length = arr.shape[0]
    if length == 0:
        return 0,0
    else:
        sum_x = np.sum(arr[:, 0])
        sum_y = np.sum(arr[:, 1])
        return sum_x/float(length), sum_y/float(length)
        

data = [{5036885850: [92.0, 88.73]}, {5036885955: [90.0, 61.73]}, {5036885984: [86.0, 73.03]}, {5036885998: [102.0, 77.54]}]

pos_list = [list(item.values())[0] for item in data]

pos_removed_list = []

current_pos = np.array((0,0))

result_list = []

print('initial centroid: ' + str(current_pos))

while (len(data) != 0):
    dist_list = [np.linalg.norm(current_pos - np.array(pos)) for pos in pos_list]
    min_dist = min(dist_list)
    item_index = dist_list.index(min_dist)
    to_be_moved = data[item_index]
    print('moved: ' + str(to_be_moved))
    result_list.append(to_be_moved)
    pos_removed_list.append(pos_list[item_index])
    pos_list.remove(pos_list[item_index])
    current_pos = np.array(get_centroid(pos_removed_list))
    print('current centroid: ' + str(current_pos))
    data.remove(to_be_moved)


print('\nfinal result: ' + str(result_list))

Your step by step output will be:

initial centroid: [0 0]
moved: {5036885955: [90.0, 61.73]}
current centroid: [90.   61.73]
moved: {5036885984: [86.0, 73.03]}
current centroid: [88.   67.38]
moved: {5036885998: [102.0, 77.54]}
current centroid: [92.66666667 70.76666667]
moved: {5036885850: [92.0, 88.73]}
current centroid: [92.5    75.2575]

final result: [{5036885955: [90.0, 61.73]}, {5036885984: [86.0, 73.03]}, {5036885998: [102.0, 77.54]}, {5036885850: [92.0, 88.73]}]

Let me know if this is what you were looking for

devReddit
  • 2,696
  • 1
  • 5
  • 20
  • This is what exactly I was looking for. Can we add one more condition to this? I have updated the third point in the description. – Sam Aug 15 '21 at 14:05
  • If I would like to find the closest order from the centroid of the previous orders, how should I be dealing with it? – Sam Aug 15 '21 at 15:10
  • 1
    Questions: 1. How does your data look like? 2. Can `{5036885850: [92.0, 88.73]}` have multiple points in it? For example, `{5036885850: [[92.0, 88.73], [89.0, 78.2]]}` 3. Centroid will be calculated from which points? Is it from all the points of all orders in the current result list? Or just points under the last entry in the current result list? IF the answer of my point 2 is NO, which means there's only 1 point in the order like `{5036885850: [92.0, 88.73]}`, then my current solution will work if you uncomment the commented line. Otherwise, I'll provide another solution. – devReddit Aug 17 '21 at 04:57
  • `Q1`. Data looks the same as mentioned in the post. `Q2`. `{5036885850: [92.0, 88.73]}` `5036885850` is the `item_cd` and [92.0, 88.73] is the location of the item in the warehouse. therefore `{5036885850: [92.0, 88.73]` does not have multiple points. `Q3` First we calculate the centroid and then find the closest distance from this centroid to the rest of the orders in the data. The resulting closest order will be added to `result_list`. – Sam Aug 18 '21 at 07:56
  • 1
    @Sam, so you're expecting the centroid of all the points as the initial point instead of (0,0)? – devReddit Aug 18 '21 at 08:14
  • Yes, you are right. Should I add more information? – Sam Aug 18 '21 at 08:21
  • 1
    @Sam what is the expected output in this case? – devReddit Aug 18 '21 at 08:37
  • output is similar to the previous case – Sam Aug 18 '21 at 09:03
  • `1.` Find the closest(euclidean distance) order centroid to the origin. In this example, it's- {5036885955: [90.0, 61.73]} `2`. Find the next closest order (euclidean distance) to the centroid of previous orders (centroid of centroids) → `centroid((90.0, 61.73))` `3.`Find the distance between the rest of the orders from the centroid of previous orders - i.e. from`centroid((90.0, 61.73))` Select the closest order as the next candidate in the batch. In this case, it is - `{5036885984: [86.0, 73.03]}` – Sam Aug 18 '21 at 09:19
  • 1
    The next centroid will be `[88. 67.38]`, right? – devReddit Aug 18 '21 at 09:29
  • 1
    @Sam I've updated my answer, see if that helps – devReddit Aug 18 '21 at 09:38
  • 2
    I'm testing the code. I will accept the answer soon. Thanks a lot. Sorry for the late reply. – Sam Aug 20 '21 at 07:42
  • can I add some stopping conditions here? Like if I want to create a batch of 20 orders. Once I have the batch of 20 orders , I will start again from the `origin(0,0)` and repeat the process – Sam Aug 24 '21 at 05:57
  • Should I create a separate post for this question? – Sam Aug 24 '21 at 05:59
  • you can find new post here: https://stackoverflow.com/questions/68904489/create-a-batch-list-of-length-20-each-from-a-given-list – Sam Aug 24 '21 at 08:52
  • I have created a separate post for this one :https://stackoverflow.com/questions/68904489/create-a-batch-list-of-length-20-each-from-a-given-list – Sam Aug 24 '21 at 09:19
  • 1
    looking into it – devReddit Aug 24 '21 at 12:17
1

Using the key argument to the min function can help keep our code efficient and clean. In addition we can help keep track of everything with a few functions that use type aliases to annotate their inputs and outputs. We can calculate the new centroid in constant time from the old centroid as long as we keep track of how many points are involved in calculating the old centroid. Finally we can minimize distance squared to the centroid instead of distance and save ourselves calculating square roots.

from typing import Callable, cast, Dict, List, Tuple

Point = Tuple[float, float]
Order = Dict[int, List[float]]

orders: List[Order] = [
    {5036885850: [92.0, 88.73]},
    {5036885955: [90.0, 61.73]},
    {5036885984: [86.0, 73.03]},
    {5036885998: [102.0, 77.54]}
]


def new_centroid(new_point: Point, old_centroid: Point, old_centroid_weight: float) -> Point:
    new_x, new_y = new_point
    old_x, old_y = old_centroid
    new_point_weight = 1 / (old_centroid_weight + 1)
    return (
        new_point_weight * new_x + (1 - new_point_weight) * old_x,
        new_point_weight * new_y + (1 - new_point_weight) * old_y,
    )


def distance_squared_to(point: Point) -> Callable[[Order], float]:
    def distance_squared_to_point(order: dict) -> float:
        order_point = get_point(order)
        from_x, from_y = point
        order_x, order_y = order_point
        return (from_x - order_x) ** 2 + (from_y - order_y) ** 2
    return distance_squared_to_point


def get_point(order: Order) -> Point:
    return cast(Point, list(order.values()).pop())


sorted_orders = []
centroid = (0.0, 0.0)
centroid_weight = 0

while orders:
    next_order = min(orders, key=distance_squared_to(centroid))
    orders.remove(next_order)
    sorted_orders.append(next_order)
    next_point = get_point(next_order)
    centroid = new_centroid(next_point, centroid, centroid_weight)
    centroid_weight += 1

print(sorted_orders)

A couple notes here. Unfortunately, Python doesn't have a really nice idiom for removing the minimum element from a collection. We have to find the min (order n) and then call the remove on that element (order n again). We could use something like numpy.argmin but that's probably overkill and a pretty heavy dependency for this.

The second note is that our Orders are kinda hard to work with. To get the location, we have to cast the values as a list and pop an element (or just take the first). Given an order we can't index into it without introspecting to find the key. It might be easier if the orders were just a tuple where the first element was the order number and the second was the location. Any number of other forms could also be helpful (named tuple, data class, dictionary with keys item_cd and location_coordinates, ...). Location coordinates could also be just a tuple instead of a list, but that's more of a type hinting issue than anything else.

Kyle Parsons
  • 1,475
  • 6
  • 14
  • `any number of other forms could also be helpful (named tuple, data class, dictionary with keys item_cd and location_coordinates, ...)` -> This makes sense. I should have created the orders as tuples – Sam Aug 24 '21 at 01:32
  • `centroid_weight += 1` Sorry Why do we need this? – Sam Aug 24 '21 at 01:35
  • can I add some stopping conditions here? Like if I want to create a batch of 20 orders. Once I have the batch of 20 orders, I will start again from the origin(0,0) and repeat the process. Should I create a separate post for that? – Sam Aug 24 '21 at 07:16
  • 1
    You need to track the `centroid_weight` because each new point has a diminishing impact on the total centroid. If you have 1 point already and then add one more the new centroid is going to be half way between the old centroid and the new point. If you have 20 points already, then when you add a new point, the new centroid is going to be much closer to the old centroid than new point. So we need to keep track of how many points are involved in the centroid at any point in order to update the centroid in constant time. – Kyle Parsons Aug 24 '21 at 13:30
  • 1
    And if you wanted to batch things you could always have `while len(sorted_orders) < 20:` or `for _ in range(20):` – Kyle Parsons Aug 24 '21 at 13:35