You didn't describe where your vectors come from, nor what use you will put mean
and median
to. Here are some observations about the general case. Limited ranges, error tolerance, and discrete values may admit of a more efficient approach.
The mean
distance between M points sounds quadratic, O(M^2). But M / N is 10, fairly small, and N is huge, so the data probably resembles a hairy sphere in 1e3-space. Computing centroid of M points, and then computing M distances to centroid, might turn out to be useful in your problem domain, hard to tell.
The minimum
distance among M points is more interesting. Choose a small number of pairs at random, say 100, compute their distance, and take half the minimum as an estimate of the global minimum distance. (Validate by comparing to the next few smallest distances, if desired.) Now use spatial UB-tree to model each point as a positive integer. This involves finding N minima for M x N values, adding constants so min becomes zero, scaling so estimated global min distance corresponds to at least 1.0, and then truncating to integer.
With these transformed vectors in hand, we're ready to turn them into a UB-tree representation that we can sort, and then do nearest neighbor spatial queries on the sorted values. For each point compute an integer. Shift the low-order bit of each dimension's value into the result, then iterate. Continue iterating over all dimensions until non-zero bits have all been consumed and appear in the result, and proceed to the next point. Numerically sort the integer result values, yielding a data structure similar to a PostGIS index.
Now you have a discretized representation that supports reasonably efficient queries for nearest neighbors (though admittedly N=1e3 is inconveniently large). After finding two or more coarse-grained nearby neighbors, you can query the original vector representation to obtain high-resolution distances between them, for finer discrimination. If your data distribution turns out to have a large fraction of points that discretize to being off by single bit from nearest neighbor, e.g. location of oxygen atoms where each has a buddy, then increase the global min distance estimate so the low order bits offer adequate discrimination.
A similar discretization approach would be appropriately scaling e.g. 2-dimensional inputs and marking an initially empty grid, then scanning immediate neighborhoods. This relies on global min being within a "small" neighborhood, due to appropriate scaling. In your case you would be marking an N-dimensional grid.