Questions tagged [pdist]

pdist computes the pairwise distances between pairs of objects (rows) in an m-by-n data matrix in MATLAB.

By default, pdist computes the Euclidean distance between pairs of objects in an m-by-n data matrix in MATLAB.

Syntax

D = pdist(X)
D = pdist(X,distance)

where X is an m-by-n data matrix.

  1. In D = pdist(X), the distance computed is the Euclidean distance.
  2. In D = pdist(X,distance), the metric is specified by distance, which can be any one of the following:
'euclidean'  
'seuclidean'  
'cityblock'  
'minkowski'  
'chebychev'  
'mahalanobis'    
'cosine'  
'correlation'  
'spearman'  
'hamming'     
'jaccard'  

or any custom distance function of the form d2 = distfun(XI,XJ).
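
To illustrate the shape of the result and of a custom distance function, here is a minimal pure-Python sketch. It is not MATLAB's or SciPy's implementation; the names pdist_sketch, euclidean and cityblock are hypothetical helpers made up for this example. It assumes the custom function receives one row XI and a list of rows XJ and returns one distance per row of XJ, and that the output lists the pairs in the order (1,2), (1,3), ..., (2,3), ..., giving m*(m-1)/2 values for m rows.

```python
import math


def euclidean(xi, XJ):
    # Distance from the single row xi to every row in XJ.
    return [math.dist(xi, xj) for xj in XJ]


def cityblock(xi, XJ):
    # A custom distance with the same (one row, many rows) shape.
    return [sum(abs(a - b) for a, b in zip(xi, xj)) for xj in XJ]


def pdist_sketch(X, distfun=euclidean):
    # Hypothetical helper: collect the distances between each row and
    # all later rows into one flat list of length m*(m-1)/2.
    D = []
    for i in range(len(X) - 1):
        D.extend(distfun(X[i], X[i + 1:]))
    return D


X = [[0, 0], [3, 4], [6, 8]]
D_euclid = pdist_sketch(X)           # default metric
D_city = pdist_sketch(X, cityblock)  # custom metric
print(D_euclid)  # [5.0, 10.0, 5.0]
print(D_city)    # [7, 14, 7]
```

With three rows the result has 3*2/2 = 3 entries: row 1 vs. row 2, row 1 vs. row 3, then row 2 vs. row 3.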

You should use this tag if your question is related to the use of pdist or any custom distance functions associated with it.

48 questions
10
votes
4 answers

String Distance Matrix in Python using pdist

How to calculate Jaro Winkler distance matrix of strings in Python? I have a large array of hand-entered strings (names and record numbers) and I'm trying to find duplicates in the list, including duplicates that may have slight variations in…
Mark W
7
votes
1 answer

scipy pdist() on a pandas DataFrame

I have a large dataframe (e.g. 15k objects), where each row is an object and the columns are the numeric object features. It is in the form: df = pd.DataFrame({ 'A' : [0, 0, 1], 'B' : [2, 3, 4], 'C' : [5, 0,…
Zhubarb
6
votes
4 answers

Sum of distances from a point to all other points

I have two lists available_points = [[2,3], [4,5], [1,2], [6,8], [5,9], [51,35]] and solution = [[3,5], [2,1]] I'm trying to pop a point in available_points and append it to solution for which the sum of euclidean distances from that point, to all…
jfran
5
votes
3 answers

Calculate two dimensional pairwise distance on a large numpy three dimensional array

I have a numpy array of 3 million points in the form of [pt_id, x, y, z]. The goal is to return all pairs of points that have a Euclidean distance between two numbers min_d and max_d. The Euclidean distance is between x and y and not on the z. However, I'd…
dassouki
5
votes
2 answers

python numpy pairwise edit-distance

So, I have a numpy array of strings, and I want to calculate the pairwise edit-distance between each pair of elements using this function: scipy.spatial.distance.pdist from…
Vahid Mirjalili
4
votes
3 answers

MATLAB pdist function

I am using the pdist command to find the distance between x and y coordinates stored in a matrix. X = [100 100; 0 100; 100 0; 500 400; 300 600;]; D = pdist(X,'euclidean') Which returns a 15 element vector.…
James
3
votes
1 answer

Interpretation of cosine similarity and jaccard similarity (similarity of histograms)

Introduction I would like to assess the similarity between two "bin counts" arrays (related to two histograms), by using the Matlab "pdist2" function: % Input bin_counts_a = [689 430 311 135 66 67 99 23 37 19 8 4 …
limone
3
votes
2 answers

How to find pairs of values greater than a certain cosine distance value?

I have an array: [[ 0.32730174 -0.1436172 -0.3355202 -0.2982458 ] [ 0.50490916 -0.33826587 0.4315952 0.4850834 ] [-0.18594801 -0.06028342 -0.24817085 -0.41029227] [-0.22551994 0.47151482 -0.39798814 -0.14978702] [-0.3315491 0.05832376…
M. ahmed
3
votes
1 answer

Minimum distance between 2 unequal sets of points

I want to be able to find the minimum distance between 2 sets of points in the xy-plane. Let's assume the first set of points, set A, has 9 points, and the second set of points, set B, has 3 points. I want to find the minimum total distance that…
Bobby Stiller
3
votes
3 answers

Speed-efficient classification for complex vectors in MATLAB

I am trying to optimize this piece of code and get rid of the nested loop. I am having difficulty applying a matrix to the pdist function. For example, 1+j // -1+j // -1+j // -1-j are the initial points and I am trying to detect…
3
votes
1 answer

Is there a faster/compact way of obtaining the indices from squareform? (Matlab)

everyone. I have a 3-dimensional data point matrix called "data", which has a dimension of N*3. Right now, I am trying to get two values: First, the indices "m" and "n" of a distance matrix "Dist", where Dist = squareform(pdist(data)); Such…
JimC
2
votes
1 answer

scipy.pdist() returns NaN values

I'm trying to cluster time series. The intra-cluster elements have same shapes but different scales. Therefore, I would like to use a correlation measure as metric for clustering. I'm trying correlation or pearson coefficient distance (any…
user2614596
2
votes
1 answer

how to compute pairwise distance among series of different length (na inside) efficiently?

Resuming this question: Compute the pairwise distance in scipy with missing values. Test case: I want to compute the pairwise distance of series with different lengths that are grouped together, and I have to do it in the most efficient possible way…
Asher11
1
vote
0 answers

Function pdist and knnsearch

I have implemented this KNN inner distance equation for the output of the knnsearch function. Example Main dataset is `{ 1 ;2; 5; 10; 20 }` Query dataset is `{ 3 }` …
user20
1
vote
1 answer

Create adjacency matrix by comparing all rows using a subset of string columns

I have a Pandas dataframe with three columns, id (a unique identifier) and then three string columns event_one, event_two and event_three, as follows: test_df.head() id event_one event_two event_three 0 N1 'aaa' 'abc' 'xyz' 1 …