
I am building a GRU-based architecture. Previously, I was just padding the batches of sequences and passing them to the GRU. Obviously, that introduced some small error in the results, because it's not quite the correct thing to do: the GRU doesn't know to stop when it reaches the padding elements.
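Roughly, the padded setup looked like this (a minimal sketch; the sizes and names are placeholders, not my real model):

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pad_sequence

gru = nn.GRU(input_size=16, hidden_size=32, batch_first=True)

# Three sequences of different lengths (5, 3, 2 timesteps).
seqs = [torch.randn(n, 16) for n in (5, 3, 2)]
padded = pad_sequence(seqs, batch_first=True)  # shape: (3, 5, 16)

# The GRU also runs over the zero-padding, so the final hidden state
# of the shorter sequences is contaminated by the padded timesteps.
out, h_n = gru(padded)
```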

Thus I switched the naive batches of 2D padded sequences out for pack_padded_sequence, so that I'm not passing extraneous padding items through the GRU. Training time then increased by at least 3x. I am calling pack_padded_sequence on the GPU, so I need to check whether the packing itself is simply inefficient to do there.
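The packed version looks roughly like this (again with placeholder sizes). One thing I should verify: recent PyTorch versions require the `lengths` argument to be a CPU int64 tensor, so if my lengths tensor starts out on the GPU, every pack call forces a device-to-host copy and sync, which might account for part of the slowdown:

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import (
    pad_sequence, pack_padded_sequence, pad_packed_sequence
)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
gru = nn.GRU(input_size=16, hidden_size=32, batch_first=True).to(device)

seqs = [torch.randn(n, 16) for n in (5, 3, 2)]
padded = pad_sequence(seqs, batch_first=True).to(device)
lengths = torch.tensor([5, 3, 2])  # kept on CPU, as pack_padded_sequence requires

# enforce_sorted=False adds a sort but avoids pre-sorting the batch by length.
packed = pack_padded_sequence(padded, lengths, batch_first=True,
                              enforce_sorted=False)
packed_out, h_n = gru(packed)

# Unpack back to a padded tensor if downstream layers need one.
out, out_lengths = pad_packed_sequence(packed_out, batch_first=True)
```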

Any suggestions would be appreciated!

hologram
    I also couldn't use `pack_padded_sequence` because it was way too slow, even though the docs say it speeds things up (maybe we are using it wrong, but I don't see how). Therefore I just padded the sequences without packing them, and it still worked perfectly. – Theodor Peifer May 01 '22 at 08:08
    @TheodorPeifer that's unfortunate. I remember it was slow in 2017, but that was because Soumith and team hadn't yet converted it to use C/C++ calls under the hood. I believe they have since made that change, yet somehow it's still slow. – hologram May 02 '22 at 17:27

0 Answers