
How can I get the coordinates of the big rectangles that lie on the diagonal?

For example, for the yellow squares: [0, 615], [615, 1438], [1438, 1526]

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.metrics.pairwise import cosine_similarity

df = pd.DataFrame(array)   # array is the image as a numpy array
df.shape                   # (1526, 360)
s = cosine_similarity(df)  # (1526, 1526) pairwise similarity of the image rows
plt.matshow(s)

[Image: cosine similarity matrix]

I tried to find peaks in the first row, but the result is noisy:

from scipy.signal import find_peaks

speak = 1 - s[0]  # distance profile of the first row

peaks, _ = find_peaks(speak, distance=160, height=0.1)
print(peaks, len(peaks))
print(np.diff(peaks))  # widths between consecutive peaks

plt.plot(speak)
plt.plot(peaks, speak[peaks], "x")
plt.show()

[Image: peaks found on the first row]

Update: I added another example and uploaded the full script to Colab: https://colab.research.google.com/drive/1hyDIDs-QjLjD2mVIX4nNOXOcvCZY4O2c?usp=sharing

[Images: noisy example, example 2]

rustam s

3 Answers


Use np.diag(df) to get a list of the diagonal elements. Then check where the values cross a threshold, assuming the colors in your screenshot stand for being below/above some value, probably zero.
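A minimal sketch of the threshold-crossing step on a 1-D array (the profile and threshold here are placeholders, not values from the question):

import numpy as np

profile = s[0]     # hypothetical 1-D slice of the similarity matrix
threshold = 0.5    # placeholder; pick it from your data

above = profile > threshold
# indices where the profile crosses the threshold in either direction
crossings = np.where(np.diff(above.astype(int)) != 0)[0]
print(crossings)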

Aramakus
  • I need the diagonal of `s = cosine_similarity(df)`, but then it turns out the values are always equal to 1 – rustam s Jul 25 '20 at 09:41
  • Then do a loop through `np.diag(s)`. Without your values I do not know what the colors in your plot mean. – Aramakus Jul 25 '20 at 09:45
  • @Aramakus yes, you are right, but on the distance matrix the diagonal values always correspond to the element compared with itself, so there is no useful information there. What matters is exactly the area of the large squares lying on the diagonal. I uploaded the dataframe as csv: dropmefiles.com/CxWey – rustam s Jul 25 '20 at 09:48
  • If matplotlib plots these areas the same color, the values within them are rather similar, so the diagonal should contain all the information. You can try to find points where your diagonal array intersects a rolling average of itself; in such places the values in `s` change rapidly (see the sketch after this comment thread). – Aramakus Jul 25 '20 at 09:58
  • @Aramakus I gave only one example; the colors do not really matter, and the squares can be different ... I'm looking for an algorithm, maybe something like classification or clustering – rustam s Jul 25 '20 at 10:06
  • @Aramakus I added another example to the post – rustam s Jul 25 '20 at 10:10
  • @rustams What output do you expect in the 2nd and 3rd examples? – Rahul Vishwakarma Jul 25 '20 at 10:17
  • @RahulVishwakarma in the 2nd, [0,340], [340,350]; in the 3rd, [0,420], [420,1000] maybe – rustam s Jul 25 '20 at 10:42
  • Check my answer, and change the filter value; you will get your output. @rustams – Rahul Vishwakarma Jul 25 '20 at 10:45
  • @rustams Add the values of s[600:630, 600:630] and s[1430:1460, 1430:1460] to your question to help you get the factor value (of my answer), if you are unable to get it. – Rahul Vishwakarma Jul 25 '20 at 10:48
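A rough sketch of the rolling-average idea from the comment above, applied to one row of s instead of the (constant) diagonal; the window size is an assumption, not a value from the thread:

import numpy as np
import pandas as pd

row = 1 - s[0]   # distance profile of one row
window = 50      # assumed smoothing window, tune to your data
rolling = pd.Series(row).rolling(window, center=True, min_periods=1).mean().to_numpy()

# points where the profile crosses its own rolling average
sign = np.sign(row - rolling)
crossings = np.where(np.diff(sign) != 0)[0]
print(crossings)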

All the diagonal elements of cosine_similarity are the same, so you should look for changes in nearby values.

You could try this:

factor = 1.01
look_nearby = 1

changes = []
for i in range(look_nearby, s.shape[0]-look_nearby):
    if s[i, i+look_nearby] > factor*s[i, i-look_nearby] or factor*s[i, i+look_nearby] < s[i, i-look_nearby]:
        changes.append(i)
        
print(changes)

Set the factor value according to your preference (as you do not want (1200, 1200) in the output of the 1st image) and according to the values of s.
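Since changes usually contains runs of neighbouring indices, here is a hedged follow-up that collapses each run into a single boundary (the gap tolerance of 20 is my own assumption):

gap = 20
boundaries = []
for i in changes:
    # keep an index only if it is far enough from the previous boundary
    if not boundaries or i - boundaries[-1] > gap:
        boundaries.append(i)
print(boundaries)  # roughly one index per rectangle edge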

Rahul Vishwakarma
  • ```[607, 608, 612, 613, 614, 615, 616, 617, 618, 1380, 1381, 1382, 1383, 1385, 1394, 1395, 1396, 1397, 1398, 1399, 1400, 1401, 1402, 1430, 1431, 1432, 1433, 1434, 1435, 1436, 1437, 1440, 1441, 1442, 1443, 1444, 1445, 1447, 1448, 1449, 1450, 1451, 1452, 1453, 1454, 1455, 1456, 1457, 1458, 1459, 1460, 1461, 1463, 1464, 1465, 1466, 1467, 1468, 1469, 1470, 1471, 1473, 1479, 1480, 1481, 1482, 1483, 1484, 1485, 1486, 1487, 1488.......1519, 1520, 1521, 1522, 1524]``` thanks a lot, but maybe I did not put the question correctly ... I need large intervals, as bright as possible, and not all of them – rustam s Jul 25 '20 at 10:46
  • Yes, to get large intervals you should change the factor value; try increasing it to 2 or 1.5 or other values, or decreasing it to 0.5 or others. I cannot tell the factor value without seeing the actual data of s – Rahul Vishwakarma Jul 25 '20 at 10:50
  • I uploaded the full example to Colab: https://colab.research.google.com/drive/1hyDIDs-QjLjD2mVIX4nNOXOcvCZY4O2c?usp=sharing – rustam s Jul 25 '20 at 11:25
  • This Colab notebook runs fine, what's the error? And what are you expecting as output for the image? – Rahul Vishwakarma Jul 25 '20 at 11:55
  • No error, it works almost well ... I don't like the fact that I now use the first row to find the peaks and not the diagonal histogram. You could average the histograms over the first, middle and last rows and get the average peaks (see the sketch after this comment thread), but that seems not optimal to me. – rustam s Jul 25 '20 at 12:00
  • [Colab link](https://colab.research.google.com/drive/1hyDIDs-QjLjD2mVIX4nNOXOcvCZY4O2c?usp=sharing) I have added how you could use the code given in the answer for various factor values. Since I don't understand what you want to do with that or what output you want (provide some proper range, e.g. this rectangle is big and this one is smaller), I can't help much. – Rahul Vishwakarma Jul 25 '20 at 12:15
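A rough sketch of the row-averaging idea from the last comment; the choice of three rows and the find_peaks parameters are taken from the question, the rest is an assumption:

import numpy as np
from scipy.signal import find_peaks

n = s.shape[0]
rows = [0, n // 2, n - 1]   # first, middle and last row, as suggested
avg_profile = np.mean([1 - s[r] for r in rows], axis=0)

peaks, _ = find_peaks(avg_profile, distance=160, height=0.1)
print(peaks)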

Solved with DBSCAN clustering (found in a similar question, "DBSCAN for clustering of geographic location data").

import numpy as np
from sklearn.cluster import DBSCAN

# cluster the rows of the similarity matrix; rows inside one rectangle fall into the same cluster
clustering = DBSCAN(eps=.5, min_samples=10).fit_predict(s)

# boundaries are the indices where the cluster label changes
peaks = np.where(clustering[:-1] != clustering[1:])[0]
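To get interval pairs like the ones asked for in the question ([0,615], [615,1438], [1438,1526]), a small hedged follow-up (the off-by-one handling is my own choice):

# turn the boundary indices into [start, end] pairs covering the whole diagonal
edges = [0] + [int(p) + 1 for p in peaks] + [s.shape[0]]
intervals = [[edges[i], edges[i + 1]] for i in range(len(edges) - 1)]
print(intervals)  # hopefully something like [[0, 615], [615, 1438], [1438, 1526]]
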
rustam s