How would I go about finding the middle option, or "average", in the example below? I do not want to create new values by taking the mean of all columns, and taking the median also does not work in this case. I need to be able to identify the blue line (col_5) as the "middle" one. Any tips? Thanks!

col_1 <- c(0,32,34,36,37,41,43,44,47,48,50)
col_2 <- c(0,3,4,5,6,7,9,14,16,18,20)
col_3 <- c(0,22,23,25,28,31,32,35,38,39,41)
col_4 <- c(0,1,2,3,5,6,8,9,11,13,15)
col_5 <- c(0,2,5,9,11,15,25,33,36,37,38)


df1 <- data.frame(col_1, col_2, col_3, col_4, col_5)

plot(df1$col_1, type ="l")
lines(df1$col_2)
lines(df1$col_3)
lines(df1$col_4)
lines(df1$col_5, col='blue')


neòinean
  • I realize my answer goes against what you "do not want", but if that isn't the right approach, you may need to explain the complexities of your problem in more depth, and why the median isn't actually applicable here. My assumption is that your real data has far more skewness? – Carl Boneri Apr 17 '21 at 21:03
  • @CarlBoneri Thanks for your reply! My data contains xy coordinates. Each column is essentially a trajectory and I wish to find the middle route (but cannot deviate/go off track)... I will go ahead and try if your solution works for this! – neòinean Apr 17 '21 at 21:47
  • The problem is that you will get different medians for the longitude and latitude, resulting in two options – neòinean Apr 17 '21 at 22:13
  • Haha that's absolutely the kind of info you have to include in your question. Question is do those coordinates represent points on the globe...? The word "trajectory" is throwing me off... But we might be talking about totally different math needed here – Carl Boneri Apr 18 '21 at 01:28
  • Wait finding the middle route, wouldn't that imply most direct from point A -> B? So we don't want "middle route", we want straightest line no? Ie: least amount of variance – Carl Boneri Apr 18 '21 at 01:29
  • @CarlBoneri Sorry for the confusion! I managed to figure it out thanks to your answer and this [answer](https://stackoverflow.com/a/32620360/15270969). Took the mean and then found the nearest neighbour. – neòinean Apr 18 '21 at 18:27
  • That's awesome! Glad to hear you got it worked out and always love it when I'm able to put a puzzle together with a few different answers, myself. Good stuff! – Carl Boneri Apr 18 '21 at 22:05
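
The "took the mean and then found the nearest neighbour" idea from the comments could look roughly like the sketch below. This is only one reading of that comment, not the asker's actual code: it assumes the mean is taken point by point across the columns, and that "nearest" means the smallest Euclidean distance to that mean curve. The names `mean_curve`, `dists` and `nearest` are made up for illustration. Real xy trajectories would need the distance computed over both coordinates.

```r
# Example data from the question
df1 <- data.frame(
  col_1 = c(0, 32, 34, 36, 37, 41, 43, 44, 47, 48, 50),
  col_2 = c(0, 3, 4, 5, 6, 7, 9, 14, 16, 18, 20),
  col_3 = c(0, 22, 23, 25, 28, 31, 32, 35, 38, 39, 41),
  col_4 = c(0, 1, 2, 3, 5, 6, 8, 9, 11, 13, 15),
  col_5 = c(0, 2, 5, 9, 11, 15, 25, 33, 36, 37, 38)
)

# Point-by-point mean across all columns (an artificial "average" curve)
mean_curve <- rowMeans(df1)

# Euclidean distance from each existing column to the mean curve
# (assumption: this is the notion of "nearest" that is wanted)
dists <- sapply(df1, function(col) sqrt(sum((col - mean_curve)^2)))

# The existing column closest to the mean curve
nearest <- names(which.min(dists))
nearest
```

Because the result is always one of the existing columns, no new values are created, which seems to be the constraint in the question.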

1 Answer

You may need to tweak how I arrive at the "middle" result, but from your question I read the problem as:

For all columns in a table, find the average, then determine which of those is the 'middle' or median

To accomplish this, I suggest iterating over the columns and calculating each average the good ole fashioned way, essentially sum(x) / length(x):

# Mean of each column, computed as sum / number of values
avgs <- sapply(df1, function(i){
    sum(i) / length(i)
})

> avgs
      col_1       col_2       col_3       col_4       col_5 
37.45454545  9.27272727 28.54545455  6.63636364 19.18181818 

# Just giving you a visual here
> sort(avgs)
      col_4       col_2       col_5       col_3       col_1 
 6.63636364  9.27272727 19.18181818 28.54545455 37.45454545 

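So now we just want to know which value is our middle, or median (the exact match below works because there is an odd number of columns, so the median is one of the averages):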

> avgs[which(avgs == median(avgs))]
     col_5 
19.1818182 

# OR if you just need the name:

> names(which(avgs == median(avgs)))
[1] "col_5"
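
As a variation on the lookup above (a minimal sketch, not part of the original approach): with an even number of columns, median(avgs) falls between two averages, so `avgs == median(avgs)` matches nothing. Picking the column whose average is closest to the median sidesteps that. The name `middle` is just illustrative, and `colMeans()` is used here as a shortcut for the per-column averages.

```r
# Example data from the question
df1 <- data.frame(
  col_1 = c(0, 32, 34, 36, 37, 41, 43, 44, 47, 48, 50),
  col_2 = c(0, 3, 4, 5, 6, 7, 9, 14, 16, 18, 20),
  col_3 = c(0, 22, 23, 25, 28, 31, 32, 35, 38, 39, 41),
  col_4 = c(0, 1, 2, 3, 5, 6, 8, 9, 11, 13, 15),
  col_5 = c(0, 2, 5, 9, 11, 15, 25, 33, 36, 37, 38)
)

# Per-column averages
avgs <- colMeans(df1)

# Column whose average is closest to the median of the averages.
# which.min() returns the first minimum, so a tie resolves to the
# earlier column.
middle <- names(avgs)[which.min(abs(avgs - median(avgs)))]
middle
```

For the example data this still picks col_5, the same column as the exact match.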
Carl Boneri