
I have an array A of size 600x6 in which each row is a vector, and I want to calculate the distance of each row from all the other rows. Calculating the distance (BD distance) is easy: I can compute all the distances and put them in a matrix D (600x600). But at some points in my code I only have the value of a row, not its index, so I cannot use D to look up the distance quickly and have to calculate it again. My question is: is there a way to assign a label or index to each row of A? For example, if I have A1 and A2, I could immediately see that I need to extract D[1,2] as their distance. I am not very familiar with Python. Could you please tell me how I can do this without calculating the distance each time? As you can see in the following code, the centroids change in the next step, so I have to calculate the BD distance again, which is time-consuming. If I could save the index of each centroid, I could extract the distance from my distance matrix very quickly.

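import numpy as np

# bd_rate() and bd_PSNR() are the asker's BD-rate / BD-PSNR functions (not shown);
# `rate` is the bitrate vector used for every row, assumed to be defined elsewhere.
def kmeans_BD(psnr_bitrate, K, centroid):
    m = psnr_bitrate.shape[0]  # number of samples
    n = psnr_bitrate.shape[1]  # number of bitrates
    
    # distance of every sample to every centroid
    BD = np.zeros((m, K))
    # weight of BD_rate
    wr = 0.5
    # weight of BD_Q
    wq = 0.5
    n_itr = 10
    # finding the distance between each sample and each centroid
    for itr in range(n_itr):
        for k in range(K):
            for i in range(m):
                BD_R = bd_rate(rate, centroid[k, :], rate, psnr_bitrate[i, :])
                if BD_R == -2:  # a return value of -2 becomes an infinite distance
                    BD_R = np.inf
                BD_Q = bd_PSNR(rate, centroid[k, :], rate, psnr_bitrate[i, :])
                if BD_Q == -2:
                    BD_Q = np.inf
                BD[i, k] = np.abs(wr*BD_R + wq*BD_Q)
        # (the assignment and centroid-update steps are not shown in the question)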
david
  • "but during my code" ... please add your code – YesThatIsMyName Jun 27 '22 at 10:07
  • I am trying to implement Kmeans with my distance metric. – david Jun 27 '22 at 10:21
  • Could you please explain more or post some sample code? – david Jun 27 '22 at 10:52
  • centroid is not a new array. It is selected from psnr_bitrate: each time, based on the distance, some of its rows are selected as the new centroids. So I do not know how to find the index of each centroid in the main psnr_bitrate array. – david Jun 27 '22 at 10:57
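The last comment says the centroids are always rows of psnr_bitrate. Under that assumption, one option is to keep the centroids as row indices from the start, so every sample-to-centroid distance becomes a lookup in the precomputed D. The sketch below does this with a hypothetical function `kmedoids_from_D`. Because the question does not show its centroid-update step, the sketch assumes a medoid-style update (the new centroid of a cluster is the member with the smallest summed distance to the other members), which makes it k-medoids rather than classic mean-based k-means. It also assumes D[i, c] holds the distance between sample i and candidate centroid c; if your D stores the pair the other way round, pass D.T.

```python
import numpy as np

def kmedoids_from_D(D, init_idx, n_itr=10):
    """Cluster the rows behind a precomputed m x m distance matrix D.

    Centroids are stored as row *indices* (medoids), so every distance is
    read from D instead of being recomputed.
    """
    D = np.asarray(D, dtype=float)
    medoids = np.array(init_idx, dtype=int)   # row indices of the K centroids
    for _ in range(n_itr):
        # D[:, medoids] is an m x K block: entry [i, k] is D[i, medoids[k]]
        labels = np.argmin(D[:, medoids], axis=1)  # nearest centroid per row
        new_medoids = medoids.copy()
        for k in range(len(medoids)):
            members = np.flatnonzero(labels == k)
            if members.size == 0:
                continue  # empty cluster: keep its old medoid
            # new medoid: the member with the smallest summed distance to the others
            within = D[np.ix_(members, members)].sum(axis=1)
            new_medoids[k] = members[np.argmin(within)]
        if np.array_equal(new_medoids, medoids):
            break  # no centroid moved: converged
        medoids = new_medoids
    labels = np.argmin(D[:, medoids], axis=1)  # labels for the final medoids
    return medoids, labels
```

With a 600x600 D, a call such as `medoids, labels = kmedoids_from_D(D, init_idx=[0, 1, 2])` never calls bd_rate() or bd_PSNR() inside the loop; the returned `medoids` are indices, so `psnr_bitrate[medoids]` gives the centroid rows themselves.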

1 Answer


This is an updated answer that incorporates the helpful remarks made in the comments about problems with the previously provided code.

The getIndex() function is the core of the solution to the question and should work with Python lists, numpy ndarrays, sympy Arrays and similar array types. Given the value of a row, it uses a type-specific method to find that row's index: list.index() for Python lists and numpy.where() for numpy arrays. If no specialized method is available for the data type, the index is found with a loop that compares the vector with each row, falling back to Python's all() for an element-by-element comparison when needed.
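If the same A is searched many times, the loop (or the numpy.where() scan) can be avoided altogether: build a dictionary once that maps each row, converted to a hashable tuple, to its row number, so every later lookup is a hash-table access. This is a sketch with hypothetical helper names (`build_row_index`, `lookup_distance`). It assumes the vectors being looked up are exact copies of rows of A (true when centroids are picked from psnr_bitrate), and if A contains duplicate rows, the dictionary keeps the last index.

```python
import numpy as np

def build_row_index(A):
    """Map every row of A (as a hashable tuple) to its row number, in one pass."""
    # .tolist() turns numpy scalars into Python numbers, so a list row and the
    # equivalent numpy row produce the same key
    return {tuple(np.asarray(row).tolist()): i for i, row in enumerate(A)}

def lookup_distance(row_index, D, vector_1, vector_2):
    """Read the distance between two rows of A from D via their row numbers."""
    i = row_index[tuple(np.asarray(vector_1).tolist())]  # KeyError if not a row of A
    j = row_index[tuple(np.asarray(vector_2).tolist())]
    return D[i][j]

A = [[1, 2, 3], [2, 4, 6], [3, 6, 9], [4, 8, 12], [5, 10, 15], [6, 12, 18]]
D = [[6*row + col for col in range(1, 7)] for row in range(6)]
row_index = build_row_index(A)
print(lookup_distance(row_index, D, A[3], A[5]))                       # 24
print(lookup_distance(row_index, D, np.array(A[3]), np.array(A[5])))  # 24
```

The dictionary costs one pass over A and some memory; after that, finding the index of a centroid no longer depends on the number of rows.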

To demonstrate the functionality, the code comes with a getDistance() function and some example array data. The assert statements check that the code works as expected:

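def getDistance(vector_1, vector_2, vector_matrix_A, distance_matrix_D):
    # find the row index of each vector in A, then read the distance from D
    try:
        i = getIndex(vector_matrix_A, vector_1)
        j = getIndex(vector_matrix_A, vector_2)
    except ValueError:  # e.g. list.index() when the vector is not a row of A
        i = j = None
    if i is None or j is None:
        print("getDistance(): vector not found in A, returning None")
        return None
    return distance_matrix_D[i][j]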

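def getIndex(vectorArray, vector, verbose=True):
    # Python lists: list.index() compares whole rows directly
    if isinstance(vectorArray, list) and isinstance(vector, list):
        if verbose: print('list.index()')
        try:
            return vectorArray.index(vector)
        except ValueError:
            return None  # vector is not a row of vectorArray
    # numpy arrays: vectorized row comparison, no Python-level loop
    try:
        import numpy
    except ImportError:
        numpy = None  # numpy not installed, use the generic loop below
    if (numpy is not None and isinstance(vectorArray, numpy.ndarray)
            and isinstance(vector, numpy.ndarray)):
        indx, = numpy.where(numpy.all(vectorArray == vector, axis=1))
        if verbose: print('numpy.where()')
        return indx[0] if indx.size else None  # indx[0]: an integer, not an array
    # any other type (e.g. sympy Array): compare row by row
    for indx, item in enumerate(vectorArray):
        try:
            if vector == item:
                if verbose: print('if vector == item')
                return indx
        except (ValueError, TypeError):
            # `==` failed or gave an element-wise result with no single truth value
            if all(vector[i] == item[i] for i in range(len(vector))):
                if verbose: print('if all()')
                return indx
    return None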

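# example data: 6 row vectors of length 3 (a small stand-in for the 600x6 array)
A = [[i*item for i in range(1, 4)] for item in range(1, 7)]
assert A == [[1, 2, 3], [2, 4, 6], [3, 6, 9], [4, 8, 12], [5, 10, 15], [6, 12, 18]]
# dummy 6x6 "distance" matrix holding 1..36, so every lookup is easy to check
D = []
for row in range(6):
    row_values = []
    for colval in range(1+6*row, 7+6*row):
        row_values.append(colval)
    D.append(row_values)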
assert D == [
              [ 1,  2,  3,  4,  5,  6], 
              [ 7,  8,  9, 10, 11, 12], 
              [13, 14, 15, 16, 17, 18], 
              [19, 20, 21, 22, 23, 24], 
              [25, 26, 27, 28, 29, 30], 
              [31, 32, 33, 34, 35, 36],
            ]
vector_3 = A[3]
vector_5 = A[5]
assert getDistance(   vector_3,    vector_5,    A, D) == 24

import numpy
np_A        = numpy.array(A)
np_vector_3 = numpy.array(vector_3) 
np_vector_5 = numpy.array(vector_5) 
assert getDistance(np_vector_3, np_vector_5, np_A, D) == 24

import sympy
sp_A        = sympy.Array(A)
sp_vector_3 = sympy.Array(vector_3) 
sp_vector_5 = sympy.Array(vector_5) 
assert getDistance(sp_vector_3, sp_vector_5, sp_A, D) == 24
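For the request in the comments to get indices without a Python for loop, when many rows must be looked up at once (for example all K centroids), numpy broadcasting can compare every query vector with every row of A in a single expression. This sketch uses a hypothetical helper `rows_to_indices`; it assumes A and the queries are numpy-compatible 2-D arrays with the same number of columns. Its memory use grows with len(V) x len(A) x n, which is modest for a 600x6 array.

```python
import numpy as np

def rows_to_indices(A, V):
    """For each row of V, return the index of the first identical row in A."""
    A = np.asarray(A)
    V = np.asarray(V)
    # (len(V), 1, n) == (1, len(A), n) broadcasts to (len(V), len(A), n);
    # .all(axis=2) marks the (query, row) pairs that match in every column
    matches = (V[:, None, :] == A[None, :, :]).all(axis=2)
    if not matches.any(axis=1).all():
        raise ValueError("some rows of V do not occur in A")
    return matches.argmax(axis=1)  # argmax picks the first True in each row

A = np.array([[1, 2, 3], [2, 4, 6], [3, 6, 9], [4, 8, 12], [5, 10, 15], [6, 12, 18]])
D = np.arange(1, 37).reshape(6, 6)   # same values as the dummy D above
centroids = A[[3, 5]]                # pretend these two rows are the centroids
idx = rows_to_indices(A, centroids)  # array([3, 5])
block = D[:, idx]                    # 6 x 2: every row's distance to each centroid
```

The `D[:, idx]` block has the same m x K shape as the BD array in the question's kmeans_BD(), so, assuming D[i, c] stores the distance between sample i and centroid c, it could stand in for the triple loop once D has been computed.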
Claudio
  • Does it work for arrays too? psnr_bitrate is an array, and the index function does not work for it. – david Jun 28 '22 at 07:09
  • Is it possible to get the index without using a for loop? It can be time-consuming. – david Jun 28 '22 at 07:10
  • The getIndex function produces this error: ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all() – david Jun 28 '22 at 07:19
  • For an array we have to change it to: `def getDistance(vector_1, vector_2, vector_matrix_A, distance_matrix_D): try: distance = distance_matrix_D[getIndex(vector_matrix_A, vector_1), getIndex(vector_matrix_A, vector_2)] return distance except: return None` and `def getIndex(vectorArray, vector): indx, = np.where(np.all(vectorArray == vector, axis=1)) return indx` – david Jun 28 '22 at 12:05
  • See my updated answer for a summary of our exchange. Notice that in the numpy `where(...)` solution the return statement should be `return indx[0]`, because `indx,` unpacks the tuple returned by `where()` into an index array, not into the single integer required for indexing array/list content. – Claudio Jun 28 '22 at 17:18
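The point about `indx[0]` in the last comment can be checked directly: with a single condition argument, `numpy.where()` returns a tuple holding one index array per dimension, so `indx,` unpacks the tuple but still leaves an array. A small sketch using the same example data as the answer:

```python
import numpy as np

A = np.array([[1, 2, 3], [2, 4, 6], [3, 6, 9], [4, 8, 12], [5, 10, 15], [6, 12, 18]])
D = np.arange(1, 37).reshape(6, 6)   # same values as the dummy D in the answer

result = np.where(np.all(A == A[3], axis=1))
print(type(result).__name__)   # tuple: one index array per dimension
indx, = result                 # unpacking gives array([3]), still an array
i = int(indx[0])               # the single integer index
print(i, D[i, 5])              # 3 24  (numpy array: one bracket, comma-separated)
print(D.tolist()[i][5])        # 24    (nested Python list: two brackets)
```

The two indexing styles also explain the difference between the answer's `D[i][j]`, which works for both nested lists and numpy arrays, and the comment's `D[i, j]`, which works only for numpy arrays.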