4

As I understand, you need to call tensor.contiguous() explicitly whenever some function or module needs a contiguous tensor. Otherwise you get exceptions like:

RuntimeError: invalid argument 1: input is not contiguous at .../src/torch/lib/TH/generic/THTensor.c:231

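For example, a minimal way to reproduce this kind of error (the exact message differs between PyTorch versions, but the cause is the same):

import torch

x = torch.rand(3, 4).t()       # transpose() returns a non-contiguous view
print(x.is_contiguous())       # False

try:
    x.view(-1)                 # view() cannot flatten this layout without a copy
except RuntimeError as e:
    print(e)

y = x.contiguous().view(-1)    # explicit copy to contiguous memory, then view() works
print(y.is_contiguous())       # True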

What functions or modules require contiguous input? Is this documented?

Or phrased differently, what are situations where you need to call contiguous?

For example, does Conv1d require contiguous input? The documentation does not mention it. And when the documentation does not mention it, does that always imply that the op does not require contiguous input?

(I remember that in Theano, any op that required a contiguous input would simply convert a non-contiguous input automatically.)

ItsMe
Albert

3 Answers

2

After some additional digging under the hood through the source code, it seems that view is the only function that explicitly raises an exception when it is passed a non-contiguous input.

One would expect any operation using Tensor Views to have the potential of failing on non-contiguous input. In practice, it seems that most or all of these functions are either:

(a.) implemented with support for non-contiguous blocks (see example below), i.e. the tensor iterators can handle multiple pointers to the various chunks of the data in memory, perhaps at the expense of performance, or else

(b.) wrapped in a call to .contiguous() (one such example is torch.Tensor.diagflat()). reshape is essentially the contiguous()-wrapped form of view.

By extension, it seems, the main benefit of view over reshape is the explicit exception when a tensor is unexpectedly non-contiguous, versus the code silently handling this discrepancy at the cost of performance.
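A minimal sketch of that difference (my own example; the error text varies by version): reshape behaves like view when the input is contiguous and silently falls back to a copy when it is not, while view raises in the latter case.

import torch

a = torch.rand(3, 4)
b = a.reshape(-1)                    # contiguous input: behaves like view, no copy
print(b.data_ptr() == a.data_ptr())  # True: same underlying storage

c = a.t()                            # non-contiguous view of a
d = c.reshape(-1)                    # cannot be expressed as a view: data is copied
print(d.data_ptr() == c.data_ptr())  # False: new storage
# c.view(-1) would instead raise a RuntimeError here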

This conclusion is based on:

  1. Testing of all Tensor View ops with non-contiguous inputs.
  2. Source code analysis of other non-Tensor-View functions of interest (e.g. Conv1d, which includes calls to contiguous as necessary in all non-trivial input cases; a quick check is sketched below, after this list).
  3. Inference from pytorch's design philosophy as a simple, at times slow, easy-to-use framework.
  4. Cross-posting on Pytorch Discuss.
  5. Extensive review of errors reported on the web that involve non-contiguity, all of which revolve around problematic calls to view.

I did not comprehensively test all pytorch functions, as there are thousands.
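As an illustration of point 2, a quick sanity check (my own sketch; here the non-contiguity comes from a transposed input):

import torch
import torch.nn as nn

conv = nn.Conv1d(in_channels=8, out_channels=4, kernel_size=3)
x = torch.rand(2, 16, 8).transpose(1, 2)  # shape (2, 8, 16), non-contiguous
print(x.is_contiguous())                  # False
y = conv(x)                               # runs without an explicit .contiguous() call
print(y.shape)                            # torch.Size([2, 4, 14])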

EXAMPLE OF (a.):

import torch
import time

# allocation
start = time.time()
test = torch.rand([10000, 1000, 100])
torch.cuda.synchronize()
end = time.time()
print("Allocation took {} sec. Data is at address {}. Contiguous: {}".format(
    end - start, test.storage().data_ptr(), test.is_contiguous()))

# view of a contiguous tensor
start = time.time()
test.view(-1)
torch.cuda.synchronize()
end = time.time()
print("view() took {} sec. Data is at address {}. Contiguous: {}".format(
    end - start, test.storage().data_ptr(), test.is_contiguous()))

# diagonal() on a contiguous tensor
start = time.time()
test.diagonal()
torch.cuda.synchronize()
end = time.time()
print("diagonal() took {} sec. Data is at address {}. Contiguous: {}".format(
    end - start, test.storage().data_ptr(), test.is_contiguous()))

# diagonal and a few tensor view ops on a non-contiguous tensor
test = test[::2, ::2, ::2]   # indexing is a Tensor View op resulting in a non-contiguous output
print(test.is_contiguous())  # False
start = time.time()
test = test.unsqueeze(-1).expand([test.shape[0], test.shape[1], test.shape[2], 100]).diagonal()
torch.cuda.synchronize()
end = time.time()
print("non-contiguous tensor ops() took {} sec. Data is at address {}. Contiguous: {}".format(
    end - start, test.storage().data_ptr(), test.is_contiguous()))

# reshape, which requires a tensor copy operation to new memory
start = time.time()
test = test.reshape(-1) + 1.0
torch.cuda.synchronize()
end = time.time()
print("reshape() took {} sec. Data is at address {}. Contiguous: {}".format(
    end - start, test.storage().data_ptr(), test.is_contiguous()))

The output is:

Allocation took 4.269254922866821 sec. Data is at address 139863636672576. Contiguous: True
view() took 0.0002810955047607422 sec. Data is at address 139863636672576. Contiguous: True
diagonal() took 6.532669067382812e-05 sec. Data is at address 139863636672576. Contiguous: True
False
non-contiguous tensor ops() took 0.00011277198791503906 sec. Data is at address 139863636672576. Contiguous: False
reshape() took 0.13828253746032715 sec. Data is at address 94781254337664. Contiguous: True

In block 4, a few Tensor View operations are performed on a non-contiguous input tensor. They run without error, keep the data at the same memory address, and run considerably faster than an operation that requires a copy to new memory addresses (such as reshape in block 5). Thus, it seems these operations are implemented in a way that handles non-contiguous inputs without requiring a data copy.

DerekG
  • I'm asking about which ops do not support non-contiguous inputs in my question. So you gave `view` and `reshape` as examples. What else? This is my question here. I'm not asking about ops not supporting multi-dimension inputs or so. This is all well documented. – Albert Nov 05 '21 at 14:51
  • The answer is anything that uses a Tensor View – DerekG Nov 05 '21 at 17:52
  • Also @Albert, `reshape` explicitly DOES NOT require a contiguous input, it is a lazy copy operation that performs the same function as `view` if possible, otherwise copies data to a new tensor in memory – DerekG Nov 05 '21 at 18:01
  • I don't understand this answer. I'm asking about a list of operations which require contiguous input. Earlier you wrote that `view` requires contiguous input. So that is not true then? Now you changed the answer. But it doesn't really answer the question. What operations, functions or modules require contiguous input? Can you give some examples maybe? – Albert Nov 05 '21 at 21:26
  • Pytorch explicitly provides a list of operations that use Tensor Views, and these operations require contiguous input. The full list of functions is given in the link at the beginning of the answer. The remaining part of the answer attempts to explain the need for contiguous tensors/tensor views for those not familiar with these aspects of Pytorch – DerekG Nov 07 '21 at 18:34
  • But I don't ask about a list of ops that use tensor views. I ask about a list of ops which require contiguous inputs. I don't see how that is the same. And in the link, the list contains for example `diagonal`, `transpose` and other, but none of these require contiguous inputs, so it's not what I'm asking. I ask about ops which require contiguous input (which would throw an error when you pass it a non-contiguous input). – Albert Nov 08 '21 at 08:35
  • Ok, I did some more digging and testing, and modified my answer to reflect this. – DerekG Nov 08 '21 at 19:23
  • Ah thanks. So you only found `view`? Did you check only through your given list, or really through all PyTorch functions? I saw e.g. other code where people used `contiguous` before `Conv1d` (I think for the kernel, I don't remember), so I wondered if that also needs contiguous input. And maybe others. Maybe functions which wrap CuDNN functions internally? If you did not check all possible functions, you should reflect that in your answer, that there are potentially other functions which require contiguous input as well. – Albert Nov 09 '21 at 08:06
  • To answer my rephrased question (what are situations where you need to call `contiguous`), it means basically never? (When you also consider to just use `reshape` instead of `view`.) – Albert Nov 09 '21 at 08:07
  • That is the gist of my answer, though obviously `reshape` and `view` do have slightly different characteristics and use cases. Also edited my answer to address the previous comment. – DerekG Nov 09 '21 at 14:01
  • Actually, to narrow that, there are still clear use cases for `contiguous()`, e.g. a loop that performs Tensor View ops on the same large Tensor. One call to `contiguous()` outside of the loop saves many calls to `contiguous()` inside, each of which copies data to a new contiguous set of memory addresses. But technically no, you don't NEED to call contiguous, you simply will lose performance – DerekG Nov 09 '21 at 14:11
0

From the pytorch documentation: contiguous() → Tensor. Returns a contiguous tensor containing the same data as self tensor. If self tensor is contiguous, this function returns the self tensor.
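For example (my own illustration of that behaviour):

import torch

a = torch.rand(2, 3)
print(a.contiguous() is a)        # True: already contiguous, self is returned

b = a.t()                         # non-contiguous view
c = b.contiguous()                # copies the data into new, contiguous memory
print(c is b, c.is_contiguous())  # False True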

Unknown
  • This was not the question. – Albert Nov 04 '21 at 14:24
  • As it’s currently written, your answer is unclear. Please [edit] to add additional details that will help others understand how this addresses the question asked. You can find more information on how to write good answers [in the help center](/help/how-to-answer). – Community Nov 04 '21 at 14:40
0

I don't think there is a complete list for it. It depends on how you implement a tensor-processing function.

If you look at the tutorial on writing C++ and CUDA extensions, you would see that a typical pytorch CUDA op looks like:

  • C++ interface(s) with torch::Tensor arguments. This class provides APIs to access/manipulate tensor data.
  • CUDA kernel(s) with float* arguments. These pointers point directly to the memory storing the tensor data.

Evidently, working on the tensor data through raw pointers can be a lot more efficient than going through the tensor class APIs. But it is easiest to work with pointers when the memory layout is contiguous (or at least regular).

I believe that in principle it is possible to manipulate the data through pointers even when it is not contiguous, given enough information about the memory layout. But you would have to take all kinds of layouts into consideration, and the code becomes a lot more tedious.
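To illustrate the idea (a hand-rolled Python sketch, not actual kernel code): the strides are enough to locate any element of a non-contiguous tensor in the underlying memory, but every access then needs extra index arithmetic that a contiguous layout avoids.

import torch

base = torch.arange(12)                  # contiguous data, as it sits in memory
t = base.reshape(3, 4).t()               # a non-contiguous view of the same memory
print(t.is_contiguous(), t.stride())     # False (1, 4)
print(t.data_ptr() == base.data_ptr())   # True: no data was copied

# a kernel holding a raw pointer to this memory must use the strides to find t[i, j]:
i, j = 2, 1
storage_index = t.storage_offset() + i * t.stride(0) + j * t.stride(1)
print(base[storage_index].item(), t[i, j].item())  # 6 6: the same element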

Facebook might have certain tricks to make some built-in ops work on noncontiguous data (I don't really know enough about this), but most custom extension modules require the inputs to be contiguous.

ihdv
  • Yes sure, for all custom user implementations, you could never know (although if you pass the right strides to the kernels and don't access the index directly, it would still be fine, but the tutorial does not do that). But I wonder about the builtin functions, e.g. `Conv1d` and many others. So is this just by trial and error? – Albert Nov 05 '21 at 08:34