Time complexity of torch.topk

Question

When m and n of n.topk(m) exceed 20 million and 200,000 respectively, the sorting becomes very slow(over 3 hours). I want to know the time complexity of torch.topk and improvement measures of the sorting.

topv, topi = outline.topk(beam_size)  # beam_size = 200,000, outline: 1 × 20,000,000

are you expecting this `topk` to happen on GPU or CPU? – Shai Jan 05 '21 at 06:23 — Shai, Jan 05 '21 at 06:23
sorry, it happens on CPU and I don't have a GPU for now. – zhang xl Jan 05 '21 at 06:36 — zhang xl, Jan 05 '21 at 06:36

score 0 · Answer 1 · answered Jan 05 '21 at 07:13

I don't know how pytorch implements topk for CPU tensors. However, since you are working on CPU, you can use existing partial sorting implementations for numpy arrays.

For example, using the bottleneck.argpartition:

import bottleneck
with torch.no_grad():
  topi = bottleneck.argpartition(outline.numpy(), kth=beam_size)
topv = outline[topi]  # allow gradients to propagate through indexing

Note that the efficient implementation of bottleneck.argpartition does not sort the array and therefore you are guaranteed that topv are indeed larger than all other elements of the array, but the values in topv are not sorted and it might be the case that topv[i] < topv[i+1] for some is.

Time complexity of torch.topk

1 Answers1