I have a large distance matrix (about 3 GB) that looks as follows:
type        street1  street2  street3
coffee      2        1        19
restaurant  3        12       4
restaurant  4        3        2
bar         5        9        7
tram        6        16       1
It can be reproduced with:
street1 <- c(2, 3, 4, 5, 6)
street2 <- c(1, 12, 3, 9, 16)
street3 <- c(19, 4, 2, 7, 1)
type <- c("coffee", "restaurant", "restaurant", "bar", "tram")
df <- data.frame(type, street1, street2, street3)
The actual data has a few thousand columns and a few thousand rows. I want to find the first, second, third, etc. closest 'types' for each column ('street'). Ideally, the output would look something like this:
street   closest.1  closest.2   closest.3   distclosest.1  distclosest.2  etc.
street1  coffee     restaurant  restaurant  2              3
street2  coffee     restaurant  bar         1              3
street3  tram       restaurant  restaurant  1              2
The output should therefore also preserve the distances to the closest types. When two types are at an equal distance, either one of them can be chosen.
I have managed to select the first closest type using code that includes the following (after setting the first 'type' column as row names):
[apply(df,2,which.min)]
Yet I don't know how to extend this to the second closest, third closest, and so on.
Naturally, I have looked at related questions. For example, I have tried all the answers provided here:
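For illustration, here is a minimal sketch of what such an extension might look like on the toy data, using `order()` per column instead of `which.min`. It assumes the distances are held in a numeric matrix `m` and the types in a separate vector (so that duplicate types such as "restaurant" are not a problem); the names `m`, `k`, `idx`, `closest`, `dists` and `result` are hypothetical, and this has not been timed on data of the real size.

```r
# Toy data from above; 'type' is kept as a separate vector
type <- c("coffee", "restaurant", "restaurant", "bar", "tram")
m <- cbind(street1 = c(2, 3, 4, 5, 6),
           street2 = c(1, 12, 3, 9, 16),
           street3 = c(19, 4, 2, 7, 1))

k <- 3  # hypothetical: how many closest types to keep (assumes k >= 2)

# For each column, the row indices of the k smallest distances.
# order() resolves ties by position, so the first tied row is chosen.
idx <- apply(m, 2, function(x) order(x)[seq_len(k)])  # k x ncol(m) matrix

# Look up the types and the distances for those indices
closest <- t(matrix(type[idx], nrow = k))
dists <- t(matrix(m[cbind(as.vector(idx), rep(seq_len(ncol(m)), each = k))],
                  nrow = k))
colnames(closest) <- paste0("closest.", seq_len(k))
colnames(dists) <- paste0("distclosest.", seq_len(k))

result <- data.frame(street = colnames(m), closest, dists,
                     stringsAsFactors = FALSE)
print(result)
```

On the toy data this gives coffee, restaurant, restaurant for street1 and tram, restaurant, restaurant for street3, with the matching distances in the `distclosest.*` columns.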
Fastest way to find *the index* of the second (third...) highest/lowest value in vector or column
or
Fastest way to find second (third...) highest/lowest value in vector or column
But they either gave me errors, or I couldn't tweak them into my preferred output (due to my limited R knowledge), or (as indicated) they took too long to run because of the size of the file.
I also tried to accomplish the same thing another way: replacing the minimum value in each column with something like 1000000, so that I could use which.min again (which is, I guess, a rather cumbersome approach). For this I tried the code provided as an answer in:
Replace maximum value of each column
But it gave me a bunch of errors. Other variations I tried also replaced values in other columns.
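To make the idea concrete, this is a minimal sketch of the replace-the-minimum approach on the toy data. It assumes a numeric matrix `m` with the types in a separate vector, and it uses `Inf` rather than 1000000 as the replacement value; the names `first`, `masked` and `second` are hypothetical. Indexing with a two-column (row, column) matrix touches exactly one cell per column, which is the part I could not get right.

```r
type <- c("coffee", "restaurant", "restaurant", "bar", "tram")
m <- cbind(street1 = c(2, 3, 4, 5, 6),
           street2 = c(1, 12, 3, 9, 16),
           street3 = c(19, 4, 2, 7, 1))

# Row index of the minimum in each column
first <- apply(m, 2, which.min)

# Replace only that one cell per column, using (row, column) pairs
masked <- m
masked[cbind(first, seq_len(ncol(m)))] <- Inf

# which.min on the masked copy now finds the second closest
second <- apply(masked, 2, which.min)

type[second]                          # second closest type per street
m[cbind(second, seq_len(ncol(m)))]    # and its distance
```

Repeating the masking step would give the third closest and so on, at the cost of one extra pass over the matrix per rank.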
Any thoughts on how to tackle this issue? Thanks so much in advance!