How do I assign a label to array rows?

Question

I have an array A of size 600x6, where each row is a vector, and I want to calculate the distance from each row to every other row of the array. Calculating the distance (BD distance) is easy, and I can compute all the distances once and store them in a matrix D (600x600). However, inside my code I only have the values of a row, not its index, so I cannot use D to look up the distance quickly and have to calculate it again.

My question: is there a way to assign a label or index to each row of A? For example, if I have A1 and A2, I would know right away that I have to extract D1,2 as the distance. I am not very familiar with Python. Could you please tell me how I can do this without calculating the distance each time? As you can see in the following code, the centroid changes in later steps of the code, so I have to calculate the BD distance again, which is time-consuming. If I could save the index of the centroid, I could extract the distance from my distance matrix very quickly.

import numpy as np

# Note: rate, bd_rate and bd_PSNR are not defined in this snippet.
def kmeans_BD(psnr_bitrate, K, centroid):
    m = psnr_bitrate.shape[0]  # number of samples
    n = psnr_bitrate.shape[1]  # number of bitrates

    # BD[i, k] holds the distance between sample i and centroid k
    BD = np.zeros((m, K))
    wr = 0.5  # weight of BD_rate
    wq = 0.5  # weight of BD_Q
    n_itr = 10
    # compute the distance from every sample to every centroid
    for itr in range(n_itr):
        for k in range(K):
            for i in range(m):
                BD_R = bd_rate(rate, centroid[k, :], rate, psnr_bitrate[i, :])
                if BD_R == -2:
                    BD_R = np.inf
                BD_Q = bd_PSNR(rate, centroid[k, :], rate, psnr_bitrate[i, :])
                if BD_Q == -2:
                    BD_Q = np.inf
                BD[i, k] = np.abs(wr * BD_R + wq * BD_Q)

centroid is not a new array. It is selected from psnr_bitrate: each time, based on the distance, a group of rows is selected as the new centroids. My problem is that I do not know how to find the index of a centroid in the main psnr_bitrate array. — david, Jun 27 '22 at 10:57

Claudio · Answer 1 · 2022-06-28T17:27:11.923

This answer has been updated to incorporate the helpful remarks made in the comments about problems with the code provided earlier.

The getIndex() function is the core of the solution requested in the question and should now work with a variety of array types (Python list, numpy ndarray, sympy Array, ...). Given the value of an array item, it uses different methods to get the item's index. If no method specialized for the data type is available, the index is found using a loop with Python's all() function.

To demonstrate the functionality, the code comes with a getDistance() function and example array data. The assert statements check that the code works as expected:

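def getDistance(vector_1, vector_2, vector_matrix_A, distance_matrix_D):
    """Look up the precomputed distance between two rows of vector_matrix_A."""
    try:
        distance = distance_matrix_D[
            getIndex(vector_matrix_A, vector_1)][
            getIndex(vector_matrix_A, vector_2)]
        return distance
    except Exception:
        # e.g. a vector was not found, so no valid index is available
        print("getDistance() exception, returning None")
        return None

def getIndex(vectorArray, vector, verbose=True):
    """Return the index of the first row of vectorArray that equals vector."""
    # Plain Python lists: use the built-in list.index()
    if isinstance(vectorArray, list) and isinstance(vector, list):
        if verbose: print('list.index()')
        return vectorArray.index(vector)
    # numpy arrays: compare all rows at once
    try:
        import numpy
        if isinstance(vectorArray, numpy.ndarray) and isinstance(vector, numpy.ndarray):
            indx, = numpy.where(numpy.all(vectorArray == vector, axis=1))
            if verbose: print('numpy.where()')
            return indx[0]
    except Exception:
        pass  # numpy is missing or no row matched; fall through to the generic loop
    # Any other array type: compare row by row
    for indx, item in enumerate(vectorArray):
        try:
            if vector == item:
                if verbose: print('if vector == item')
                return indx
        except Exception:
            # '==' gave no single truth value, so compare element by element
            if all(vector[i] == item[i] for i in range(len(vector))):
                if verbose: print('if all()')
                return indx
    return None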

A = [ [i*item for i in (range(1,4))] for item in range(1,7)]
assert A == [[1, 2, 3], [2, 4, 6], [3, 6, 9], [4, 8, 12], [5, 10, 15], [6, 12, 18]]
D = []
for row in range(6):
    column = []
    for colval in range(1+6*row,7+6*row):
        column.append(colval)
    D.append(column)
assert D == [
              [ 1,  2,  3,  4,  5,  6], 
              [ 7,  8,  9, 10, 11, 12], 
              [13, 14, 15, 16, 17, 18], 
              [19, 20, 21, 22, 23, 24], 
              [25, 26, 27, 28, 29, 30], 
              [31, 32, 33, 34, 35, 36],
            ]
vector_3 = A[3]
vector_5 = A[5]
assert getDistance(   vector_3,    vector_5,    A, D) == 24

import numpy
np_A        = numpy.array(A)
np_vector_3 = numpy.array(vector_3) 
np_vector_5 = numpy.array(vector_5) 
assert getDistance(np_vector_3, np_vector_5, np_A, D) == 24

import sympy
sp_A        = sympy.Array(A)
sp_vector_3 = sympy.Array(vector_3) 
sp_vector_5 = sympy.Array(vector_5) 
assert getDistance(sp_vector_3, sp_vector_5, sp_A, D) == 24
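
Since the centroids in the question are always rows of the data array, another option is to keep track of row indices instead of row values, so no value-to-index search is needed at all. The following is only a minimal sketch under assumptions: `pairwise_distances`, `assign_labels` and `euclidean` are hypothetical names, and the Euclidean distance is a stand-in for the BD distance, whose functions are not shown in the question. It assumes the convention that `D[i, j]` is the distance of row `i` measured against row `j`.

```python
import numpy as np

def pairwise_distances(A, dist):
    """Fill an (m, m) matrix with dist(A[i], A[j]) for every pair of rows."""
    m = A.shape[0]
    D = np.zeros((m, m))
    for i in range(m):
        for j in range(m):
            D[i, j] = dist(A[i], A[j])
    return D

def assign_labels(D, centroid_idx):
    """For each row, return the position (0..K-1) of its nearest centroid."""
    # D[:, centroid_idx] has shape (m, K): the distances from every row
    # to each centroid, read straight from the precomputed matrix.
    return np.argmin(D[:, centroid_idx], axis=1)

def euclidean(u, v):
    # Stand-in distance (assumption); the real code would combine
    # bd_rate and bd_PSNR here instead.
    return float(np.linalg.norm(u - v))

A = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
D = pairwise_distances(A, euclidean)   # computed once, before iterating

centroid_idx = np.array([0, 2])        # centroids are rows 0 and 2 of A
labels = assign_labels(D, centroid_idx)
# labels -> [0, 0, 1, 1]
```

When new centroids are chosen in a later iteration, only `centroid_idx` has to be updated. The centroid values remain available as `A[centroid_idx]` and the distances as `D[:, centroid_idx]`, so nothing is recalculated.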

Does it work for arrays too? psnr_bitrate is an array, and the index function does not work for it. — david, Jun 28 '22 at 07:09
Is it possible to get the index without using a for loop? It can be time-consuming. — david, Jun 28 '22 at 07:10
The getIndex function produces this error: ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all() — david, Jun 28 '22 at 07:19
For an array we have to change it to:

def getDistance(vector_1, vector_2, vector_matrix_A, distance_matrix_D):
    try:
        distance = distance_matrix_D[getIndex(vector_matrix_A, vector_1),getIndex(vector_matrix_A, vector_2)]
        return distance
    except:
        return None

def getIndex(vectorArray, vector):
    indx,=np.where(np.all(vectorArray==vector,axis=1))
    return indx

— david, Jun 28 '22 at 12:05
See my updated answer for a summary of our exchange. Note that in the numpy `where(...)` solution the return statement should be `return indx[0]`, because `indx, = ...` unpacks an array of matching row numbers from the result, not the single integer required for indexing array/list content. — Claudio, Jun 28 '22 at 17:18
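
On the comment above about avoiding a loop for every lookup: one possible approach is to build a dictionary from row contents to row number once, after which each lookup is a single dictionary access. This is a hedged sketch, not part of the answer's code: `build_row_index` and `find_row` are hypothetical helper names, and the approach assumes that the queried vector is an exact copy of a row (same values and same dtype), because the raw bytes are used as the key.

```python
import numpy as np

def build_row_index(A):
    """Map the raw bytes of each row of A to its row number."""
    lookup = {}
    for i, row in enumerate(A):
        # setdefault keeps the first occurrence if rows are duplicated
        lookup.setdefault(row.tobytes(), i)
    return lookup

def find_row(lookup, vector, dtype):
    """Return the row number of vector, or None if it is not a row of A."""
    key = np.asarray(vector, dtype=dtype).tobytes()
    return lookup.get(key)

A = np.array([[1, 2, 3], [2, 4, 6], [3, 6, 9]])
lookup = build_row_index(A)                       # built once

idx = find_row(lookup, A[1], A.dtype)             # -> 1
missing = find_row(lookup, [9, 9, 9], A.dtype)    # -> None
```

The dictionary still takes one pass over the array to build, but that pass happens once rather than for every distance lookup.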
