
This is my data (`dput` output):

mydata=structure(list(ndvi_num75up = c(33L, 33L, 100L, 48L, 36L, 36L, 
37L, 36L, 27L, 35L, 52L, 82L, 41L, 40L, 45L, 45L, 31L, 31L, 33L, 
33L, 50L, 45L, 38L, 29L, 56L), ndvi_num75down = c(102L, 102L, 
108L, 117L, 107L, 106L, 107L, 106L, 94L, 93L, 111L, 113L, 108L, 
107L, 108L, 108L, 125L, 125L, 114L, 114L, 110L, 110L, 103L, 99L, 
104L), ndvi_num85up = c(57L, 57L, 56L, 72L, 45L, 44L, 51L, 44L, 
45L, 41L, 59L, 87L, 46L, 45L, 59L, 59L, 96L, 96L, 53L, 53L, 54L, 
102L, 45L, 51L, 61L), ndvi_num85down = c(95L, 95L, 92L, 114L, 
103L, 103L, 104L, 103L, 89L, 89L, 106L, 111L, 105L, 96L, 103L, 
103L, 114L, 114L, 112L, 112L, 95L, 104L, 93L, 94L, 99L), ndvi_n_maxvi = c(73L, 
73L, 104L, 90L, 74L, 73L, 76L, 73L, 63L, 65L, 83L, 98L, 75L, 
72L, 80L, 80L, 88L, 88L, 81L, 81L, 77L, 75L, 69L, 68L, 80L), 
    ndvi_num50up = c(19L, 19L, 20L, 17L, 0L, 17L, 0L, 17L, 0L, 
    18L, 24L, 29L, 19L, 25L, 25L, 25L, 0L, 0L, 0L, 0L, 25L, 24L, 
    16L, 18L, 37L), ndvi_num50down = c(118L, 118L, 119L, 133L, 
    113L, 112L, 115L, 112L, 110L, 109L, 131L, 120L, 114L, 112L, 
    117L, 117L, 0L, 0L, 122L, 122L, 116L, 116L, 110L, 109L, 116L
    ), ndvi_num35up = c(0L, 0L, 12L, 0L, 0L, 0L, 0L, 0L, 0L, 
    0L, 0L, 0L, 1L, 15L, 18L, 18L, 0L, 0L, 0L, 0L, 0L, 5L, 8L, 
    0L, 19L), ndvi_num35down = c(131L, 131L, 129L, 0L, 119L, 
    117L, 124L, 117L, 117L, 113L, 0L, 0L, 121L, 116L, 123L, 123L, 
    0L, 0L, 0L, 0L, 125L, 123L, 115L, 129L, 124L), ndvi_max = c(0.499, 
    0.499, 0.56, 0.437, 0.834, 0.845, 0.785, 0.845, 0.705, 0.819, 
    0.592, 0.671, 0.674, 0.853, 0.792, 0.792, 0.47, 0.47, 0.578, 
    0.578, 0.715, 0.758, 0.686, 0.638, 0.836)), row.names = c(NA, 
25L), class = "data.frame")

I have 10 variables for clustering:

'data.frame':   4926 obs. of  10 variables:
 $ ndvi_num75up  : int  33 33 100 48 36 36 37 36 27 35 ...
 $ ndvi_num75down: int  102 102 108 117 107 106 107 106 94 93 ...
 $ ndvi_num85up  : int  57 57 56 72 45 44 51 44 45 41 ...
 $ ndvi_num85down: int  95 95 92 114 103 103 104 103 89 89 ...
 $ ndvi_n_maxvi  : int  73 73 104 90 74 73 76 73 63 65 ...
 $ ndvi_num50up  : int  19 19 20 17 0 17 0 17 0 18 ...
 $ ndvi_num50down: int  118 118 119 133 113 112 115 112 110 109 ...
 $ ndvi_num35up  : int  0 0 12 0 0 0 0 0 0 0 ...
 $ ndvi_num35down: int  131 131 129 0 119 117 124 117 117 113 ...
 $ ndvi_max      : num  0.499 0.499 0.56 0.437 0.834 0.845 0.785 0.845 0.705 0.819 ...

But when I run DBSCAN:

library(dbscan)
dbscan_res <- dbscan(mydata, eps = 0.15, minPts = 5)
str(dbscan_res)

I get this result:

The clustering contains 0 cluster(s) and 4926 noise points.

   0 
4926 

How can I run DBSCAN clustering on all 10 variables and get the cluster each observation belongs to? And why didn't it find any clusters?
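A minimal base-R sketch of why every point came out as noise. It uses only the first 10 rows of the posted data (the rows shown by `str()` above) and counts neighbours by hand with `dist()` rather than calling the dbscan package. DBSCAN only starts a cluster at a core point, i.e. a point with at least `minPts` points (itself included) within `eps`. Because the count variables are integers, two rows that differ at all are at least 1 unit apart, so with `eps = 0.15` the only neighbours are exact duplicate rows.

```r
# First 10 rows of the posted data (the same rows shown by str())
mydata10 <- data.frame(
  ndvi_num75up   = c(33, 33, 100, 48, 36, 36, 37, 36, 27, 35),
  ndvi_num75down = c(102, 102, 108, 117, 107, 106, 107, 106, 94, 93),
  ndvi_num85up   = c(57, 57, 56, 72, 45, 44, 51, 44, 45, 41),
  ndvi_num85down = c(95, 95, 92, 114, 103, 103, 104, 103, 89, 89),
  ndvi_n_maxvi   = c(73, 73, 104, 90, 74, 73, 76, 73, 63, 65),
  ndvi_num50up   = c(19, 19, 20, 17, 0, 17, 0, 17, 0, 18),
  ndvi_num50down = c(118, 118, 119, 133, 113, 112, 115, 112, 110, 109),
  ndvi_num35up   = c(0, 0, 12, 0, 0, 0, 0, 0, 0, 0),
  ndvi_num35down = c(131, 131, 129, 0, 119, 117, 124, 117, 117, 113),
  ndvi_max       = c(0.499, 0.499, 0.56, 0.437, 0.834, 0.845, 0.785, 0.845, 0.705, 0.819)
)

eps    <- 0.15
minPts <- 5

# Euclidean distance matrix (the metric dbscan uses by default)
d <- as.matrix(dist(mydata10))

# Number of points within eps of each row, counting the row itself
n_within <- rowSums(d <= eps)
print(n_within)  # only the duplicate rows 1/2 and 6/8 reach 2

# Smallest distance between two non-identical rows: far larger than eps
print(min(d[d > 0]))

# No row has minPts points within eps, so there are no core points
print(any(n_within >= minPts))  # FALSE
```

With no core points DBSCAN cannot start any cluster, which matches the "0 cluster(s)" output: eps has to be on the scale of the typical distances between rows, not 0.15.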

The desired output looks like this:

ndvi_num75up ndvi_num75down ndvi_num85up ndvi_num85down ndvi_n_maxvi ndvi_num50up ndvi_num50down ndvi_num35up ndvi_num35down ndvi_max cluster
          33            102           57             95           73           19            118            0            131    0.499       3
          33            102           57             95           73           19            118            0            131    0.499       3
         100            108           56             92          104           20            119           12            129     0.56       2
          48            117           72            114           90           17            133            0              0    0.437       4
          36            107           45            103           74            0            113            0            119    0.834       3
          36            106           44            103           73           17            112            0            117    0.845       3
          37            107           51            104           76            0            115            0            124    0.785       3
          36            106           44            103           73           17            112            0            117    0.845       3
          27             94           45             89           63            0            110            0            117    0.705       1
          35             93           41             89           65           18            109            0            113    0.819       1

(This result is from k-means, but I need DBSCAN because it chooses the number of clusters by itself.) Thank you.
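A sketch of how to get the desired output (the original columns plus a `cluster` column), assuming the dbscan package and, for brevity, the same first 10 rows. `dbscan()` returns an object whose `cluster` element holds one integer label per row, with 0 meaning noise, so it can be bound straight onto the data. The `eps = 16` here is only the value D.J reports in the comments for the posted sample; treat it as a placeholder, not a tuned radius.

```r
library(dbscan)

# First 10 rows of the posted data
mydata10 <- data.frame(
  ndvi_num75up   = c(33, 33, 100, 48, 36, 36, 37, 36, 27, 35),
  ndvi_num75down = c(102, 102, 108, 117, 107, 106, 107, 106, 94, 93),
  ndvi_num85up   = c(57, 57, 56, 72, 45, 44, 51, 44, 45, 41),
  ndvi_num85down = c(95, 95, 92, 114, 103, 103, 104, 103, 89, 89),
  ndvi_n_maxvi   = c(73, 73, 104, 90, 74, 73, 76, 73, 63, 65),
  ndvi_num50up   = c(19, 19, 20, 17, 0, 17, 0, 17, 0, 18),
  ndvi_num50down = c(118, 118, 119, 133, 113, 112, 115, 112, 110, 109),
  ndvi_num35up   = c(0, 0, 12, 0, 0, 0, 0, 0, 0, 0),
  ndvi_num35down = c(131, 131, 129, 0, 119, 117, 124, 117, 117, 113),
  ndvi_max       = c(0.499, 0.499, 0.56, 0.437, 0.834, 0.845, 0.785, 0.845, 0.705, 0.819)
)

# Placeholder radius (value reported in the comments), not a tuned choice
res <- dbscan(mydata10, eps = 16, minPts = 5)

# One label per row, in row order; 0 = noise, 1, 2, ... = clusters
labeled <- cbind(mydata10, cluster = res$cluster)
print(labeled)

# Cluster sizes, including the noise "cluster" 0
print(table(labeled$cluster))
```

Unlike k-means, label 0 is not a cluster but noise, so those rows should be kept apart (or dropped) when comparing against the k-means labels.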

psysky
  • Can you specify what exactly you want to achieve? As far as I can see (replicating with your dput data), there are no clusters containing 5 points. The first 5-point cluster appears (again in your dput data) if you raise your radius (eps) to 16 instead of 0.15. As far as I understand your post, the k-means results agree with this, since there are no clusters with 5 or more points. – D.J Dec 11 '21 at 12:34
  • Have you read the manual page for `dbscan`? It recommends setting minPts to the dimensionality of the data plus 1, which would be 11 for your data, and examining the plot of `kNNdistplot(mydata, 10)`. – dcarlson Dec 12 '21 at 00:19
  • @D.J, yes, it works, thanks. How can I choose the optimal radius? – psysky Dec 12 '21 at 09:24
  • @dcarlson, I really didn't know. Thank you too. It's good. – psysky Dec 12 '21 at 09:25
  • You have to find out how large a radius makes sense for your data. You can decide that either by trial and error or by looking [here](https://stackoverflow.com/questions/12893492/choosing-eps-and-minpts-for-dbscan-r). – D.J Dec 12 '21 at 10:00
  • @D.J, thank you. Good topic. – psysky Dec 12 '21 at 12:18
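The two suggestions in the comments (minPts = number of variables + 1, and reading eps off `kNNdistplot()`) can be combined roughly as follows on the 25-row `dput` sample. This is a sketch under stated assumptions: the variables are standardised with `scale()` first (an assumption, motivated by `ndvi_max` lying between 0 and 1 while the counts go up to about 130, so unscaled distances are dominated by the counts), and the 90% quantile used for `eps` is a hypothetical stand-in for picking the knee of the plot by eye.

```r
library(dbscan)

# The 25-row sample posted in the question
mydata <- structure(list(ndvi_num75up = c(33L, 33L, 100L, 48L, 36L, 36L,
37L, 36L, 27L, 35L, 52L, 82L, 41L, 40L, 45L, 45L, 31L, 31L, 33L,
33L, 50L, 45L, 38L, 29L, 56L), ndvi_num75down = c(102L, 102L,
108L, 117L, 107L, 106L, 107L, 106L, 94L, 93L, 111L, 113L, 108L,
107L, 108L, 108L, 125L, 125L, 114L, 114L, 110L, 110L, 103L, 99L,
104L), ndvi_num85up = c(57L, 57L, 56L, 72L, 45L, 44L, 51L, 44L,
45L, 41L, 59L, 87L, 46L, 45L, 59L, 59L, 96L, 96L, 53L, 53L, 54L,
102L, 45L, 51L, 61L), ndvi_num85down = c(95L, 95L, 92L, 114L,
103L, 103L, 104L, 103L, 89L, 89L, 106L, 111L, 105L, 96L, 103L,
103L, 114L, 114L, 112L, 112L, 95L, 104L, 93L, 94L, 99L), ndvi_n_maxvi = c(73L,
73L, 104L, 90L, 74L, 73L, 76L, 73L, 63L, 65L, 83L, 98L, 75L,
72L, 80L, 80L, 88L, 88L, 81L, 81L, 77L, 75L, 69L, 68L, 80L),
    ndvi_num50up = c(19L, 19L, 20L, 17L, 0L, 17L, 0L, 17L, 0L,
    18L, 24L, 29L, 19L, 25L, 25L, 25L, 0L, 0L, 0L, 0L, 25L, 24L,
    16L, 18L, 37L), ndvi_num50down = c(118L, 118L, 119L, 133L,
    113L, 112L, 115L, 112L, 110L, 109L, 131L, 120L, 114L, 112L,
    117L, 117L, 0L, 0L, 122L, 122L, 116L, 116L, 110L, 109L, 116L
    ), ndvi_num35up = c(0L, 0L, 12L, 0L, 0L, 0L, 0L, 0L, 0L,
    0L, 0L, 0L, 1L, 15L, 18L, 18L, 0L, 0L, 0L, 0L, 0L, 5L, 8L,
    0L, 19L), ndvi_num35down = c(131L, 131L, 129L, 0L, 119L,
    117L, 124L, 117L, 117L, 113L, 0L, 0L, 121L, 116L, 123L, 123L,
    0L, 0L, 0L, 0L, 125L, 123L, 115L, 129L, 124L), ndvi_max = c(0.499,
    0.499, 0.56, 0.437, 0.834, 0.845, 0.785, 0.845, 0.705, 0.819,
    0.592, 0.671, 0.674, 0.853, 0.792, 0.792, 0.47, 0.47, 0.578,
    0.578, 0.715, 0.758, 0.686, 0.638, 0.836)), row.names = c(NA,
25L), class = "data.frame")

# Assumption: standardise every variable to mean 0, sd 1
x <- scale(mydata)

# Rule of thumb quoted in the comments: minPts = number of dimensions + 1
minPts <- ncol(x) + 1  # 11

# Distance from each point to its (minPts - 1)-th nearest neighbour
kd <- kNNdist(x, k = minPts - 1)

# Sorted k-NN distances; eps is normally read off the "knee" of this curve
kNNdistplot(x, k = minPts - 1)

# Hypothetical stand-in for choosing the knee by eye
eps <- unname(quantile(kd, 0.9))
abline(h = eps, lty = 2)

res <- dbscan(x, eps = eps, minPts = minPts)

# Original (unscaled) columns plus the DBSCAN label; 0 = noise
labeled <- cbind(mydata, cluster = res$cluster)
print(table(labeled$cluster))
```

On the full 4926-row data, eps should be re-read from that data's own plot rather than reused from this sample.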
