I'm trying to run my Python program. It seems that it should run smoothly; however, I encounter an error that I haven't seen before. It says:

free(): invalid pointer
Aborted (core dumped)

However, I'm not sure how to fix the error, since it doesn't give me much information about the problem itself.

At first I thought it was a problem with the sizes of the tensors in my network, but they are completely fine. I've googled the problem a little and found that it seems to be a memory-allocation problem, but I don't know how to fix it.

My code is divided into two files, and I use two libraries: one for the Sinkhorn loss function and one for randomly sampling a mesh.

import argparse
import point_cloud_utils as pcu
import time

import numpy as np
import torch
import torch.nn as nn
from fml.nn import SinkhornLoss

import common
def main():
    # x is a tensor of shape [n, 3] containing the positions of the vertices that
    x = torch.from_numpy(common.loadpointcloud("sphere.txt"))
    # t is a tensor of shape [n, 3] containing a set of nicely distributed samples in the unit cube
    v, f = common.unit_cube()
    t = torch.from_numpy(pcu.lloyd(v,f,x.shape[0]).astype(np.float32)) # sample randomly a point cloud (cube for now?)

    # The model is a simple fully connected network mapping a 3D parameter point to 3D
    phi = common.MLP(in_dim=3, out_dim=3)

    # Eps is 1/lambda and max_iters is the maximum number of Sinkhorn iterations to do
    emd_loss_fun = SinkhornLoss(eps=1e-3, max_iters=20,
                                stop_thresh=1e-3, return_transport_matrix=True)

    mse_loss_fun = torch.nn.MSELoss()

    # Adam optimizer at first
    optimizer = torch.optim.Adam(phi.parameters(), lr= 10e-3)

    fit_start_time = time.time()

    for epoch in range(100):
        optimizer.zero_grad()

        # Do the forward pass of the neural net, evaluating the function at the parametric points
        y = phi(t)

        # Compute the Sinkhorn divergence between the reconstruction*(using the francis library) and the target
        # NOTE: The Sinkhorn function expects a batch of b point sets (i.e. tensors of shape [b, n, 3])
        # since we only have 1, we unsqueeze so x and y have dimension [1, n, 3]
        with torch.no_grad():
            _, P = emd_loss_fun(phi(t).unsqueeze(0), x.unsqueeze(0))

        # Project the transport matrix onto the space of permutation matrices and compute the L-2 loss
        # between the permuted points
        loss = mse_loss_fun(y[P.squeeze().max(0)[1], :], x)
        # loss = mse_loss_fun(P.squeeze() @ y,  x)  # Use the transport matrix directly

        # Take an optimizer step
        loss.backward()
        optimizer.step()

        print("Epoch %d, loss = %f" % (epoch, loss.item()))

    fit_end_time = time.time()

    print("Total time = %f" % (fit_end_time - fit_start_time))
    # Plot the ground truth, reconstructed points, and a mesh representing the fitted function, phi
    common.visualitation(x,t,phi)



if __name__ == "__main__":
    main()

The error message is: free(): invalid pointer Aborted (core dumped)

Again, that doesn't help me much. I'd appreciate it a lot if anyone has an idea of what is happening or knows more about this error.

  • Which line produces the error? – ndrwnaguib May 28 '19 at 16:33
  • Sounds like a bug in pytorch. I'd recommend creating a new issue for this on github where it will get the attention of the devs. – Greg May 28 '19 at 16:54
  • Can you provide a better repro example? (i.e., something we can actually run) and tell us what version of pytorch you are using and the full error message you are getting. Better yet, post all of that info in an issue on https://github.com/pytorch/pytorch – Brennan Vincent May 28 '19 at 17:05

2 Answers

Edit: The cause is actually known. The recommended solution is to build both packages from source.


There is a known issue with importing both open3d and PyTorch in the same process. The cause was unknown when this answer was first written. https://github.com/pytorch/pytorch/issues/19739

A few possible workarounds exist:

(1) Some people have found that changing the order in which you import the two packages can resolve the issue, though in my personal testing both ways crash.

(2) Other people have found compiling both packages from source to help.

(3) Still others have found that moving open3d and PyTorch to be called from separate scripts resolves the issue.
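
Workaround (3) could look roughly like the sketch below. It is only an illustration, not code from the question: the helper name `run_isolated` is hypothetical, and the child snippet uses only the standard library as a stand-in for the script that would import open3d. The idea is that the child interpreter is a separate process, so the native libraries it loads never share an address space with PyTorch in the parent; results are passed back as JSON on stdout.

```python
import json
import subprocess
import sys


def run_isolated(code: str) -> dict:
    """Run `code` in a fresh Python interpreter and parse its stdout as JSON.

    Hypothetical helper: the child has its own address space, so native
    libraries loaded there cannot clash with those loaded in this process.
    """
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the child crashes
    )
    return json.loads(result.stdout)


# Stand-in for the script that would import open3d and do the mesh work.
CHILD_CODE = """
import json
points = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
print(json.dumps({"points": points}))
"""

if __name__ == "__main__":
    data = run_isolated(CHILD_CODE)
    print(len(data["points"]))
```

The parent can then turn the returned lists into tensors without ever importing the second library itself. For large point clouds, writing a file from the child and loading it in the parent may be more practical than JSON on stdout.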

Brennan Vincent

Note for future readers: This bug was filed as issue #21018.

This is not a problem in your Python code. It is a bug in PyTorch (probably) or in Python itself (unlikely, but possible).

free(3) is a C function that releases dynamically allocated memory when it is no longer needed. You cannot (easily) call it from Python, because memory management is a low-level implementation detail normally handled by the Python interpreter. However, you are also using PyTorch, which is written in C++ and C, and does have the ability to directly allocate and free memory.

In this case, some C code has tried to release a block of memory, but the block of memory it tried to release was not dynamically allocated in the first place, which is an error. You should report this behavior to the PyTorch developers. Include as much detail as possible, including the shortest code you can find that reproduces the problem, and the complete output of that program.
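
To find out which Python line was executing when the native code aborted, the standard library's `faulthandler` module may help: it dumps the Python traceback when the process receives a fatal signal such as SIGABRT. The sketch below is a stand-in, not the program from the question: `CRASHING_CODE` and `load_native_library` are made-up names, and `os.abort()` is used to raise the same signal that the C library raises after printing `free(): invalid pointer`.

```python
import subprocess
import sys

# Stand-in for a script that crashes in native code. os.abort() raises
# SIGABRT, the signal the C library raises after "free(): invalid pointer".
CRASHING_CODE = """
import os

def load_native_library():
    os.abort()

load_native_library()
"""


def run_with_faulthandler(code: str) -> subprocess.CompletedProcess:
    """Run `code` in a child interpreter with faulthandler enabled."""
    return subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", code],
        capture_output=True,
        text=True,
    )


if __name__ == "__main__":
    proc = run_with_faulthandler(CRASHING_CODE)
    # stderr holds the traceback that was active when the abort happened.
    print(proc.stderr)
```

For a real script, the equivalent is running `python -X faulthandler your_script.py` and including the resulting traceback in the bug report.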

Kevin
  • Yeah, thanks, I just filed it, and it seems the cause is importing open3d and pytorch in the same script. I created a new script that only handles the open3d calls and it works fine. However, this is only a workaround; I'll see if the devs have anything to say. – Adrián Briceño Aguilar May 29 '19 at 12:05