I have a table of data in which I've labeled each row with the cluster it falls into and calculated the average of each row's column values. I would like to select the median row for each cluster.
As an example, looking at just one cluster, I would like to use:
median(as.numeric(as.vector(subset(df,df$cluster == i )$avg)))
I can see that
> as.numeric(as.vector(subset(df,df$cluster == i )$avg))
[1] 48.11111111 47.77777778 49.44444444 49.33333333 47.55555556 46.55555556 47.44444444 47.11111111 45.66666667 45.44444444
And yet, the median is
> median(as.numeric(as.vector(subset(df,df$cluster == i )$avg)))
[1] 47.5
I would like to find the median record by matching the returned median against the values in the avg column, but that isn't possible here because 47.5 does not appear in the column.
I've found some documentation and questions about rounding with the mean function, but unfortunately that doesn't seem to apply here.
I could also limit the number of decimal places in the data, but some records are so close together that duplicates would be common if I rounded to one decimal.
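For reference, here is a minimal sketch of what seems to be going on and one possible way around it. It assumes the ten values printed above are the whole cluster; the data frame below is rebuilt from that output for illustration, and `med_obs` and `median_row` are names made up for this example. With an even number of values, `median()` averages the two middle ones, so the result need not be one of the observations. `quantile()` with `type = 1` returns an observed value instead (the lower of the two middle values when the count is even), which can then be matched back to a row without any rounding.

```r
# Values copied from the console output above; the cluster label 1 is
# an assumption made only to rebuild a small data frame for this sketch.
avg <- c(48.11111111, 47.77777778, 49.44444444, 49.33333333, 47.55555556,
         46.55555556, 47.44444444, 47.11111111, 45.66666667, 45.44444444)
df <- data.frame(cluster = 1, avg = avg)

# Rows belonging to the cluster of interest.
sub <- df[df$cluster == 1, ]

# With 10 values, median() is the mean of the 5th and 6th sorted values,
# which is why it is not an element of the column.
sorted <- sort(sub$avg)
middle_mean <- mean(sorted[c(5, 6)])

# A type-1 quantile is always one of the observed values, so an exact
# comparison against the column is safe.
med_obs <- quantile(sub$avg, probs = 0.5, type = 1, names = FALSE)
median_row <- sub[sub$avg == med_obs, ]
print(median_row)
```

If several rows in a cluster share exactly the same avg, `median_row` would contain all of them, so taking the first with `median_row[1, ]` may be needed. Whether the lower middle value is an acceptable "median record" for an even-sized cluster is a judgment call.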