Find column value of second, third (etc) closest value in multiple other columns

Question

I have a large distance matrix (about 3GB), looking as follows:

type         street1    street2     street3
coffee       2          1           19
restaurant   3          12          4
restaurant   4          3           2
bar          5          9           7
tram         6          16          1

From:

street1<-c(2,3,4,5,6)
street2<-c(1,12,3,9,16)
street3<-c(19,4,2,7,1)
type<-c("coffee","restaurant","restaurant","bar","tram")
df<-data.frame(type,street1,street2,street3)

Actual data is a few thousand columns by a few thousand rows. I want to find the first, second, third etc. closest 'types' for each column ('street'). Ideally, output would look something like this:

street    closest.1    closest.2    closest.3   distclosest.1 distclosest.2  etc.
street1   coffee       restaurant   restaurant  2              3
street2   coffee       restaurant   bar         1              3
street3   tram         restaurant   restaurant  1              2

Hence also preserving the distances of the closest types. Further, when there is an equal distance between two types, one of them can be chosen.

I have managed to select the single closest type with code along these lines (after moving the 'type' column into the row names of a numeric matrix):

m <- as.matrix(df[, -1]); rownames(m) <- df$type
rownames(m)[apply(m, 2, which.min)]

Yet I don't know how to extend this to second, third closest etc.
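One possible way to extend the which.min idea is to rank each column with order() and keep the first k positions. The sketch below is only an illustration on the example data from this question: the names m, k, idx, cols, closest, dists and result are made up for the example, k = 3 is an assumed choice, and it has not been tried on a matrix of the real size. Ties are broken by row order, which fits the requirement that either type may be chosen.

```r
# Example data from the question
street1 <- c(2, 3, 4, 5, 6)
street2 <- c(1, 12, 3, 9, 16)
street3 <- c(19, 4, 2, 7, 1)
type <- c("coffee", "restaurant", "restaurant", "bar", "tram")
df <- data.frame(type, street1, street2, street3)

# Numeric distances only; the types are looked up separately
m <- as.matrix(df[, -1])
k <- 3  # number of closest types to keep (assumed value)

# For each street: row indices of the k smallest distances, nearest first
idx <- apply(m, 2, function(x) order(x)[seq_len(k)])

# Column index belonging to each entry of idx (column-major order)
cols <- rep(seq_len(ncol(m)), each = k)

# Look up the type and the distance for every selected (row, column) pair
closest <- matrix(as.character(df$type)[idx], nrow = k)
dists <- matrix(m[cbind(as.vector(idx), cols)], nrow = k)

# One row per street, as in the desired output
result <- data.frame(street = colnames(m), t(closest), t(dists),
                     stringsAsFactors = FALSE)
names(result) <- c("street",
                   paste0("closest.", seq_len(k)),
                   paste0("distclosest.", seq_len(k)))
print(result)
```

This sorts every column in full, which is more work than strictly needed when k is small, but it avoids modifying the data and involves no explicit loop over rows or columns.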

Naturally, I have investigated related articles. For example, I have tried to use all answers provided here:

Fastest way to find *the index* of the second (third...) highest/lowest value in vector or column

or

Fastest way to find second (third...) highest/lowest value in vector or column

But they either gave me errors, or I couldn't adapt them to produce my preferred output (due to my limited R knowledge), or, given the size of the file, they took too long to run.

I also tried a different route: replacing the minimum value in each column with something like 1000000, so that I could apply which.min again (a rather cumbersome approach, I guess). For this I tried the code given as an answer in:

Replace maximum value of each column

But it gave me a bunch of errors, and my other attempts also replaced values in other columns.
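For reference, the "replace the minimum and run which.min again" idea can be written so that only one cell per column is overwritten, by indexing the matrix with (row, column) pairs. This is a minimal sketch on the example data, assuming there are no NA values; the names work, cols, closest and dists are hypothetical, k = 3 is an assumed choice, and Inf is used in place of an arbitrary large number such as 1000000.

```r
# Example distances and types from the question
m <- cbind(street1 = c(2, 3, 4, 5, 6),
           street2 = c(1, 12, 3, 9, 16),
           street3 = c(19, 4, 2, 7, 1))
type <- c("coffee", "restaurant", "restaurant", "bar", "tram")

k <- 3                      # number of closest types to keep (assumed value)
work <- m                   # working copy, so the original distances survive
cols <- seq_len(ncol(work))

closest <- matrix(NA_character_, nrow = ncol(m), ncol = k)
dists <- matrix(NA_real_, nrow = ncol(m), ncol = k)

for (j in seq_len(k)) {
  # Row of the current minimum in every column
  rows <- apply(work, 2, which.min)
  closest[, j] <- type[rows]
  dists[, j] <- work[cbind(rows, cols)]
  # Mask exactly one cell per column; other columns are not touched
  work[cbind(rows, cols)] <- Inf
}

rownames(closest) <- rownames(dists) <- colnames(m)
print(closest)
print(dists)
```

The loop runs k times rather than once per row or column, so for a small k it stays close to the vectorised which.min call already in use. A comparison such as work[work == min(work)] would instead hit every cell holding that value, which may be one reason why values in other columns got replaced.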

Any thoughts on how to tackle this issue? Thanks so much in advance!

Perhaps you could "delete" the found min from your "working data.frame", to get the second nearest and iterate over that? — Christoph, Jul 14 '16 at 11:05
That would be a nice possibility, yet would you have a suggestion for an approach that would work for a large file? (hence, no loops etc) — zoekdestep, Jul 14 '16 at 12:36
Perhaps somebody has a solution when you supply a reproducible example. — Christoph, Jul 14 '16 at 12:45
Please tell me what you are missing from above - in what way is it not a reproducible example? — zoekdestep, Jul 14 '16 at 12:48
