Note: This question ran a little longer than I had planned. So, in summary:
- This a 'please help me debug this code' question.
- The first part of my question is why checking
if (ptr)
before deallocation doesn't stop a segfault - The second part is how do I actually stop the segfault
I first present the stack trace from the segfault. Then I show the various methods and type definitions as I traced them back. Hopefully, this question and my thought process will be easy to follow (and answer). Now, for the actual question .....
I am currently running unittests
on deeplab-public-version2
, which is a deep learning system built on top of Caffe
. When I run one of the unittests
, I get the following segfault:
$ ./.build_release/test/test_layer_factory.testbin
Cuda number of devices: 1
Current device id: 0
Current device name: GeForce GTX 1080 with Max-Q Design
[==========] Running 4 tests from 4 test cases.
[----------] Global test environment set-up.
[----------] 1 test from LayerFactoryTest/0, where TypeParam = caffe::CPUDevice<float>
[ RUN ] LayerFactoryTest/0.TestCreateLayer
*** Aborted at 1580356126 (unix time) try "date -d @1580356126" if you are using GNU date ***
PC: @ 0x7f74c84d298d cfree
*** SIGSEGV (@0xe28) received by PID 17549 (TID 0x7f74cacda680) from PID 3624; stack trace: ***
@ 0x7f74c883e890 (unknown)
@ 0x7f74c84d298d cfree
@ 0x7f74c8e00ff1 deallocate()
@ 0x7f74c8f3367c caffe::DenseCRFLayer<>::DeAllocateAllData()
@ 0x7f74c8f36738 caffe::DenseCRFLayer<>::~DenseCRFLayer()
@ 0x7f74c8f36c09 caffe::DenseCRFLayer<>::~DenseCRFLayer()
@ 0x55f92259abc7 caffe::LayerFactoryTest_TestCreateLayer_Test<>::TestBody()
@ 0x55f9225b892a testing::internal::HandleExceptionsInMethodIfSupported<>()
@ 0x55f9225b10ca testing::Test::Run()
@ 0x55f9225b11ac testing::TestInfo::Run()
@ 0x55f9225b12e5 testing::TestCase::Run()
@ 0x55f9225b17a0 testing::internal::UnitTestImpl::RunAllTests()
@ 0x55f9225b18e7 testing::UnitTest::Run()
@ 0x55f9225967c1 main
@ 0x7f74c845cb97 __libc_start_main
@ 0x55f922596bba _start
Segmentation fault (core dumped)
I also get this fault when running the complete suite of unittests.
Looking at the stack trace, I believe the issue comes when deallocating a DenseCRFLayer
object. The deallocate()
method is as follows:
void deallocate(float*& ptr) {
if (ptr)
#ifdef SSE_DENSE_CRF
_mm_free( ptr );
#else
delete[] ptr;
#endif
ptr = NULL;
}
So, I would have thought that the if (ptr)
statement would ensure that the code would only attempt to deallocate an existing pointer. How could this not be the case?
If it's of any help, the DeAllocateAllData
method looks like this:
template <typename Dtype>
void DenseCRFLayer<Dtype>::DeAllocateAllData() {
deallocate(unary_);
deallocate(current_);
deallocate(next_);
deallocate(tmp_);
}
where, these 4 variable are created as follows:
template <typename Dtype>
void DenseCRFLayer<Dtype>::AllocateAllData() {
unary_ = allocate(unary_element_);
current_ = allocate(unary_element_);
next_ = allocate(unary_element_);
tmp_ = allocate(unary_element_);
}
which are allocated as follows:
float* allocate(size_t N) {
float * r = NULL;
if (N>0) {
#ifdef SSE_DENSE_CRF
r = (float*)_mm_malloc( N*sizeof(float)+16, 16 );
#else
r = new float[N];
#endif
}
memset( r, 0, sizeof(float)*N);
return r;
}
Would it make sense to add an extra check in this method to see if each of the pointers are eligible for deallocation?
The only other puzzling thing here, which might be a contributing factor is that the object destructor seems to be called twice (!?). So, if it's of any relevance here are the destructor method
template <typename Dtype>
DenseCRFLayer<Dtype>::~DenseCRFLayer() {
ClearPairwiseFunctions();
DeAllocateAllData();
}
template <typename Dtype>
void DenseCRFLayer<Dtype>::ClearPairwiseFunctions() {
for (size_t i = 0; i < pairwise_.size(); ++i) {
delete pairwise_[i];
}
pairwise_.clear();
}
where pairwise has the following type:
std::vector<PairwisePotential*> pairwise_;
and PairwisePotential is the following virtual class:
class PairwisePotential {
public:
virtual ~PairwisePotential();
virtual void apply(float * out_values, const float * in_values, float * tmp, int value_size) const = 0;
};
Finally, the unit test method is this:
namespace caffe {
template <typename TypeParam>
class LayerFactoryTest : public MultiDeviceTest<TypeParam> {};
TYPED_TEST_CASE(LayerFactoryTest, TestDtypesAndDevices);
TYPED_TEST(LayerFactoryTest, TestCreateLayer) {
typedef typename TypeParam::Dtype Dtype;
typename LayerRegistry<Dtype>::CreatorRegistry& registry =
LayerRegistry<Dtype>::Registry();
shared_ptr<Layer<Dtype> > layer;
for (typename LayerRegistry<Dtype>::CreatorRegistry::iterator iter =
registry.begin(); iter != registry.end(); ++iter) {
// Special case: PythonLayer is checked by pytest
if (iter->first == "Python") { continue; }
LayerParameter layer_param;
// Data layers expect a DB
if (iter->first == "Data") {
#ifdef USE_LEVELDB
string tmp;
MakeTempDir(&tmp);
boost::scoped_ptr<db::DB> db(db::GetDB(DataParameter_DB_LEVELDB));
db->Open(tmp, db::NEW);
db->Close();
layer_param.mutable_data_param()->set_source(tmp);
#else
continue;
#endif // USE_LEVELDB
}
layer_param.set_type(iter->first);
layer = LayerRegistry<Dtype>::CreateLayer(layer_param);
EXPECT_EQ(iter->first, layer->type());
}
}
} // namespace caffe
Note: I don't quite understand how this object is constructed: When I look in the hpp
file, I see this:
/**
* @brief The DenseCRF layer performs mean-field inference under a
* fully-connected CRF model with Gaussian potentials.
*
*/
template <typename Dtype>
class DenseCRFLayer : public Layer<Dtype> {
public:
explicit DenseCRFLayer(const LayerParameter& param)
: Layer<Dtype>(param) {}
virtual ~DenseCRFLayer();
So, I do not think a default constructor is being used. Yet, when I do the following grep
in the parent directory, I see this output, which does not show evidence of a non-default constructor:
$ grep -r 'DenseCRFLayer(' .
./include/caffe/layers/densecrf_layer.hpp: explicit DenseCRFLayer(const LayerParameter& param)
./include/caffe/layers/densecrf_layer.hpp: virtual ~DenseCRFLayer();
./src/caffe/layers/densecrf_layer.cpp:DenseCRFLayer<Dtype>::~DenseCRFLayer() {
$
I see where the constructor and destructor are declared, but I only see the destructor defined....