I have a data frame as shown below:

           X       Y      Z fit.cluster
245 256882.0 4110945 426.50          20
246 256882.7 4110945 426.42          57
247 256883.9 4110945 429.30         114
248 256884.6 4110945 428.93         114
249 256885.4 4110945 429.50          98
250 256886.1 4110945 429.67          33

The data frame has 4 columns: X, Y, Z, and the cluster assignment. X and Y are the coordinates and Z is the corresponding height. I clustered all of the data points into 176 clusters using k-means. Now I want to take the maximum Z value from each cluster. For example, for cluster 1, I need to identify the maximum Z value and also retrieve the corresponding X and Y values. How can I do that?

bibinwilson
  • Please don't post images of data, they are beyond useless for copying and pasting and answering your question. – thelatemail Mar 31 '16 at 04:36
  • sorry about that. what should I do? should I upload the dataset? – bibinwilson Mar 31 '16 at 04:41
  • You could include `head(data)`, i.e. a small sample of the data. :) – Therkel Mar 31 '16 at 04:45
  • Just copy and paste a few rows you have shown in your screenshot as text, or even better, just do `dput(head(datasetname))` and paste the result here – thelatemail Mar 31 '16 at 04:46
  • structure(list(X = c(256882.03, 256882.74, 256883.91, 256884.57, 256885.37, 256886.11), Y = c(4110944.98, 4110944.96, 4110944.88, 4110944.87, 4110944.83, 4110944.81), Z = c(426.5, 426.42, 429.3, 428.93, 429.5, 429.67), fit.cluster = c(20L, 57L, 114L, 114L, 98L, 33L)), .Names = c("X", "Y", "Z", "fit.cluster"), row.names = 245:250, class = "data.frame") – bibinwilson Mar 31 '16 at 04:48
  • Beware that X,Y,Z in your data have very different *scale*. k-means does not work well on such data. – Has QUIT--Anony-Mousse Apr 02 '16 at 14:44
  • @Anony-Mousse what should I do? I'm trying to cluster the trees. The given is a LiDAR data. I did classification to the whole lidar point cloud and took the required species points only. which algo should I use to get the clustering? – bibinwilson Apr 05 '16 at 05:00
  • It's not so much a question of choosing an algorithm, but of choosing the right **preprocessing**. – Has QUIT--Anony-Mousse Apr 05 '16 at 06:01
  • @Anony-Mousse Do you mean normalization? – bibinwilson Apr 05 '16 at 07:45
  • Not just that. *Much* more than that. These are not random numbers - you need to know what they are, and how to make them comparable. They maybe aren't X,Y,Z in a 3D space, but pitch, yaw, distance. Then you must not treat them as Euclidean coordinates. That's why your clusters probably are all over the place. – Has QUIT--Anony-Mousse Apr 05 '16 at 08:03

1 Answer

You can use dplyr:

library(dplyr)

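# Find the maximum Z per cluster, then join back to recover X and Y
data %>%
  group_by(fit.cluster) %>%
  summarise(Z = max(Z)) %>%
  inner_join(data, by = c("fit.cluster", "Z"))

or, without joining back:

# Keep only the rows whose Z equals their cluster's maximum
data %>%
  group_by(fit.cluster) %>%
  filter(Z == max(Z))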
vitor
  • I was going to answer with something far less simple, this is a great way to handle this problem. But given you know the max Z within each cluster, how do you recover the X and Y associated with the Z? – Brian Albert Monroe Mar 31 '16 at 05:15
  • `group_by(fit.cluster) %>% slice(which.max(Z))` maybe? I don't use dplyr often but I think that might work to prevent the need to join back again. – thelatemail Mar 31 '16 at 05:25
  • I edited the code offering a solution that avoids joining back data. – vitor Mar 31 '16 at 05:27
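
For reference, the `slice(which.max(Z))` idea from the comments and the `filter(Z == max(Z))` version from the answer can be compared on the six sample rows posted via `dput()` above. This is only a minimal sketch: the names `top_points` and `top_points_with_ties` are made up for illustration, and the difference between the two only shows up when several points in a cluster share the same maximum Z (the `which.max` version keeps the first such row, the `filter` version keeps all of them).

```r
library(dplyr)

# Sample rows copied from the dput() output posted in the comments
data <- data.frame(
  X = c(256882.03, 256882.74, 256883.91, 256884.57, 256885.37, 256886.11),
  Y = c(4110944.98, 4110944.96, 4110944.88, 4110944.87, 4110944.83, 4110944.81),
  Z = c(426.5, 426.42, 429.3, 428.93, 429.5, 429.67),
  fit.cluster = c(20L, 57L, 114L, 114L, 98L, 33L)
)

# Exactly one row per cluster: the first row where Z is largest
# (top_points is a hypothetical name used only in this sketch)
top_points <- data %>%
  group_by(fit.cluster) %>%
  slice(which.max(Z)) %>%
  ungroup()

# Every row that ties for the maximum Z within its cluster
# (top_points_with_ties is likewise a hypothetical name)
top_points_with_ties <- data %>%
  group_by(fit.cluster) %>%
  filter(Z == max(Z)) %>%
  ungroup()

print(top_points)
```

Both results keep the X and Y columns alongside the maximum Z, so no join back to the original data is needed.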