The point of the question
I have a minimal program that segfaults during SYCL runtime compilation. For completeness, precise reproduction details are below. However, the point of this question is to understand how to debug this kind of failure. It took me a long time to produce the minimal example, and I suspect that if I could get debug builds of the DPC++ runtimes I could have cut it down much faster. When the runtime compiler fails, it should throw an exception. I want to know what steps I should take to discover why it segfaults instead, and whether this is a compiler bug or a bug in my code.
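A segfault inside the driver kills the whole process, so no `try`/`catch` around `queue.submit` can observe it. What you can do is run each variant of the program in a child process and let the parent check how the child died. That also makes the minimization loop scriptable, which is the kind of "interestingness test" that reduction tools such as C-Reduce expect. The sketch below assumes a POSIX system. `dies_with_sigsegv` is a hypothetical helper, not part of SYCL or DPC++, and the candidate function should create all of its SYCL objects itself, because calling `fork` after a GPU runtime has been initialized is not safe.

```cpp
#include <csignal>
#include <cstdlib>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: runs `candidate` in a forked child process and reports
// whether the child was killed by SIGSEGV. A crash in the driver cannot be
// caught with try/catch, but the parent survives and can inspect it this way.
bool dies_with_sigsegv(void (*candidate)())
{
    pid_t pid = fork();
    if (pid < 0)
    {
        return false; // fork failed, so nothing was reproduced
    }
    if (pid == 0)
    {
        // Child: e.g. build a queue and submit one variant of the kernel.
        candidate();
        // Reached only if the candidate did not crash. _Exit skips atexit
        // handlers and stdio flushing inherited from the parent.
        std::_Exit(0);
    }
    int status = 0;
    if (waitpid(pid, &status, 0) != pid)
    {
        return false;
    }
    // An uncaught exception ends in SIGABRT, so it is not counted here.
    return WIFSIGNALED(status) && WTERMSIG(status) == SIGSEGV;
}
```

For example, `dies_with_sigsegv(run_variant)`, where `run_variant` is a hypothetical function that submits one version of the kernel, returns `true` only for the variants that still reproduce the crash.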
Reproduction details
Starting with the code:
#include <CL/sycl/queue.hpp>
#include <CL/sycl/device.hpp>
#include <CL/sycl/context.hpp>
#include <CL/sycl.hpp>
#include <iostream>

namespace
{
auto is_sign_same(sycl::short3 idx1, sycl::short3 idx2)
{
    return (idx1 < 0) == (idx2 < 0);
}
} // namespace

int main()
{
    sycl::device device = sycl::device{sycl::gpu_selector{}};
    std::cout
        << "\n\nRunning occupancy grid profile. The profile will have the following "
           "properties:\n\n Device:\t"
        << device.get_info<sycl::info::device::name>() << "\n\n";
    sycl::context context{device};
    sycl::property_list properties{sycl::property::queue::enable_profiling()};
    sycl::queue queue{device, properties};
    auto event = queue.submit(
        [](sycl::handler& cgh)
        {
            // 1. This must be captured or it does not crash. If I put this in
            //    the kernel, then it does not fail.
            sycl::id<3> robot_index{0, 0, 0};
            sycl::stream out(1024, 256, cgh);
            cgh.parallel_for(
                sycl::range<3>{4, 4, 4},
                [out, robot_index](sycl::id<3> id)
                {
                    sycl::short3 new_signed_idx{short(0)};
                    // 2. I cannot remove the subtraction between the two
                    //    sycl::short3 values here, or it will not fail.
                    sycl::short3 old_signed_idx =
                        sycl::short3{
                            (short)id.get(0), (short)id.get(1), (short)id.get(2)} -
                        sycl::short3{
                            (short)robot_index.get(0),
                            (short)robot_index.get(1),
                            (short)robot_index.get(2)};
                    // 3. I cannot replace this function call with the operation
                    //    that the function performs inline here, or it does not fail.
                    auto s_same = is_sign_same(new_signed_idx, old_signed_idx);
                    out << s_same;
                }
            );
        }
    );
    return 0;
}
When compiled using:
/opt/intel/oneapi/compiler/2022.1.0/linux/bin/dpcpp -fclang-abi-compat=7 -fsycl --gcc-toolchain=/usr -sycl-std=2020 -fp-model=precise -Wall -Werror -fsycl -O2 -g -DNDEBUG -std=gnu++17 sgfaulting_file.cpp
the program fails at runtime with a segfault. The crash appears to happen while the kernel is being built. Running the binary under GDB gives the following stack trace when it dies:
(gdb) where
#0 0x00007f49e3683b8c in ?? () from /usr/lib/x86_64-linux-gnu/libigc.so.1
#1 0x00007f49e36b440c in ?? () from /usr/lib/x86_64-linux-gnu/libigc.so.1
#2 0x00007f49e36b0dda in ?? () from /usr/lib/x86_64-linux-gnu/libigc.so.1
#3 0x00007f49e36b430f in ?? () from /usr/lib/x86_64-linux-gnu/libigc.so.1
#4 0x00007f49e36bac6a in ?? () from /usr/lib/x86_64-linux-gnu/libigc.so.1
#5 0x00007f49e36b0bed in ?? () from /usr/lib/x86_64-linux-gnu/libigc.so.1
#6 0x00007f49e36b430f in ?? () from /usr/lib/x86_64-linux-gnu/libigc.so.1
#7 0x00007f49e36bac6a in ?? () from /usr/lib/x86_64-linux-gnu/libigc.so.1
#8 0x00007f49e36bf027 in ?? () from /usr/lib/x86_64-linux-gnu/libigc.so.1
#9 0x00007f49e36bf908 in ?? () from /usr/lib/x86_64-linux-gnu/libigc.so.1
#10 0x00007f49e35ab7bc in ?? () from /usr/lib/x86_64-linux-gnu/libigc.so.1
#11 0x00007f49e35abfba in ?? () from /usr/lib/x86_64-linux-gnu/libigc.so.1
#12 0x00007f49e35ae90d in ?? () from /usr/lib/x86_64-linux-gnu/libigc.so.1
#13 0x00007f49e36ec3d4 in ?? () from /usr/lib/x86_64-linux-gnu/libigc.so.1
#14 0x00007f49e35b21fb in ?? () from /usr/lib/x86_64-linux-gnu/libigc.so.1
#15 0x00007f49e36ced9a in ?? () from /usr/lib/x86_64-linux-gnu/libigc.so.1
#16 0x00007f49f487f1bb in ?? () from /usr/lib/x86_64-linux-gnu/intel-opencl/libigdrcl.so
#17 0x00007f49f43ef178 in ?? () from /usr/lib/x86_64-linux-gnu/intel-opencl/libigdrcl.so
#18 0x00007f49f4397b33 in ?? () from /usr/lib/x86_64-linux-gnu/intel-opencl/libigdrcl.so
#19 0x00007f49f9327aa4 in cl::sycl::detail::ProgramManager::build(std::unique_ptr<_pi_program, _pi_result (*)(_pi_program*)>, std::shared_ptr<cl::sycl::detail::context_impl>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, _pi_device* const&, std::map<std::pair<cl::sycl::detail::DeviceLibExt, _pi_device*>, _pi_program*, std::less<std::pair<cl::sycl::detail::DeviceLibExt, _pi_device*> >, std::allocator<std::pair<std::pair<cl::sycl::detail::DeviceLibExt, _pi_device*> const, _pi_program*> > >&, unsigned int) () from /opt/intel/oneapi/compiler/2022.1.0/linux/lib/libsycl.so.5
#20 0x00007f49f9321336 in cl::sycl::detail::ProgramManager::getBuiltPIProgram(long, std::shared_ptr<cl::sycl::detail::context_impl> const&, std::shared_ptr<cl::sycl::detail::device_impl> const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, cl::sycl::detail::program_impl const*, bool) () from /opt/intel/oneapi/compiler/2022.1.0/linux/lib/libsycl.so.5
#21 0x00007f49f932243c in cl::sycl::detail::ProgramManager::getOrCreateKernel(long, std::shared_ptr<cl::sycl::detail::context_impl> const&, std::shared_ptr<cl::sycl::detail::device_impl> const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, cl::sycl::detail::program_impl const*) () from /opt/intel/oneapi/compiler/2022.1.0/linux/lib/libsycl.so.5
#22 0x00007f49f93630f1 in cl::sycl::detail::enqueueImpKernel(std::shared_ptr<cl::sycl::detail::queue_impl> const&, cl::sycl::detail::NDRDescT&, std::vector<cl::sycl::detail::ArgDesc, std::allocator<cl::sycl::detail::ArgDesc> >&, std::shared_ptr<cl::sycl::detail::kernel_bundle_impl> const&, std::shared_ptr<cl::sycl::detail::kernel_impl> const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, long const&, std::vector<_pi_event*, std::allocator<_pi_event*> >&, _pi_event**, std::function<void* (cl::sycl::detail::AccessorImplHost*)> const&) ()
from /opt/intel/oneapi/compiler/2022.1.0/linux/lib/libsycl.so.5
#23 0x00007f49f9369f3b in cl::sycl::detail::ExecCGCommand::enqueueImp() ()
from /opt/intel/oneapi/compiler/2022.1.0/linux/lib/libsycl.so.5
#24 0x00007f49f93566c5 in cl::sycl::detail::Command::enqueue(cl::sycl::detail::EnqueueResultT&, cl::sycl::detail::BlockingT, std::vector<cl::sycl::detail::Command*, std::allocator<cl::sycl::detail::Command*> >&) () from /opt/intel/oneapi/compiler/2022.1.0/linux/lib/libsycl.so.5
#25 0x00007f49f9373b7b in cl::sycl::detail::Scheduler::addCG(std::unique_ptr<cl::sycl::detail::CG, std::default_delete<cl::sycl::detail::CG> >, std::shared_ptr<cl::sycl::detail::queue_impl>) ()
from /opt/intel/oneapi/compiler/2022.1.0/linux/lib/libsycl.so.5
#26 0x00007f49f93aef30 in cl::sycl::handler::finalize() ()
from /opt/intel/oneapi/compiler/2022.1.0/linux/lib/libsycl.so.5
#27 0x00007f49f93dc3ea in cl::sycl::detail::queue_impl::finalizeHandler(cl::sycl::handler&, cl::sycl::detail::CG::CGTYPE const&, cl::sycl::event&) ()
from /opt/intel/oneapi/compiler/2022.1.0/linux/lib/libsycl.so.5
#28 0x00007f49f93dc13b in cl::sycl::detail::queue_impl::submit_impl(std::function<void (cl::sycl::handler&)> const&, std::shared_ptr<cl::sycl::detail::queue_impl> const&, std::shared_ptr<cl::sycl::detail::queue_impl> const&, std::shared_ptr<cl::sycl::detail::queue_impl> const&, cl::sycl::detail::code_location const&, std::function<void (bool, bool, cl::sycl::event&)> const*) ()
from /opt/intel/oneapi/compiler/2022.1.0/linux/lib/libsycl.so.5
#29 0x00007f49f93db744 in cl::sycl::detail::queue_impl::submit(std::function<void (cl::sycl::handler&)> const&, std::shared_ptr<cl::sycl::detail::queue_impl> const&, cl::sycl::detail::code_location const&, std::function<void (bool, bool, cl::sycl::event&)> const*) ()
from /opt/intel/oneapi/compiler/2022.1.0/linux/lib/libsycl.so.5
#30 0x00007f49f93db715 in cl::sycl::queue::submit_impl(std::function<void (cl::sycl::handler&)>, cl::sycl::detail::code_location const&) () from /opt/intel/oneapi/compiler/2022.1.0/linux/lib/libsycl.so.5
#31 0x00000000004026d8 in cl::sycl::queue::submit<main::{lambda(cl::sycl::handler&)#1}>(main::{lambda(cl::sycl::handler&)#1}, cl::sycl::detail::code_location const&) (this=0x7ffc5da1b200, CodeLoc=..., CGF=...)
at /opt/intel/oneapi/compiler/2022.1.0/linux/bin-llvm/../include/sycl/CL/sycl/queue.hpp:275
#32 main () at occupancy_grid_point_cloud_creation.cpp:31
The important part is frame #19:
cl::sycl::detail::ProgramManager::build
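Frame #19 is where the SYCL runtime hands the kernel's device image to the OpenCL driver (`libigdrcl.so`), which then calls into the graphics compiler (`libigc.so`), where the crash happens. Without debug symbols, one way to see what crosses that boundary is the DPC++ runtime's debugging environment variables. The sketch below sets them from inside the program. `enable_sycl_build_debugging` is a hypothetical helper, and the variable names are as I recall them from the DPC++ environment-variable documentation for releases around 2022.1, so check them against your version. If passing `-cl-opt-disable` makes the crash go away, that points at an optimization pass in IGC rather than at your code, and a dumped device image is something you could attach to a bug report.

```cpp
#include <cstdlib>

// Hypothetical helper (not part of DPC++): sets runtime debugging variables.
// Call it as the first statement of main(), before any sycl:: object exists.
// Returns false if any setenv() call fails.
bool enable_sycl_build_debugging(bool disable_backend_optimizations)
{
    bool ok = true;
    // Trace Plugin Interface calls, including the program build that crashes.
    ok = ok && ::setenv("SYCL_PI_TRACE", "-1", /*overwrite=*/1) == 0;
    // Dump the device images the runtime hands to the driver.
    ok = ok && ::setenv("SYCL_DUMP_IMAGES", "1", 1) == 0;
    if (disable_backend_optimizations)
    {
        // Override the options used for the driver build step. If the
        // segfault disappears, an IGC optimization pass is the likelier culprit.
        ok = ok && ::setenv("SYCL_PROGRAM_COMPILE_OPTIONS", "-cl-opt-disable", 1) == 0;
    }
    return ok;
}
```

I believe the runtime reads these lazily, so setting them at the top of `main` should work; exporting the same variables in the shell (for example `SYCL_PI_TRACE=-1 ./a.out`) is equivalent and avoids any doubt about when they are read.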
The runtime compilation happens on this device (as listed by sycl-ls):
[opencl:gpu:2] Intel(R) OpenCL HD Graphics, Intel(R) UHD Graphics [0x9bc4] 3.0 [22.28.23726.1]
If we run the same program with the host or CPU selector, the kernel builds and runs successfully. Small changes to the program also make the segfault go away; these are described in the numbered comments in the code.
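One detail that is easy to miss when comparing runs: because `is_sign_same` returns `auto`, `s_same` is not a `bool` but a `sycl::vec` of three integers. Under SYCL 2020 vector semantics (my reading of the specification), each relational or equality operator yields a signed integer lane that is -1 when true and 0 when false. The host-only model below uses `std::array` in place of `sycl::short3` so it runs without a SYCL implementation; `is_sign_same_reference` is a hypothetical name. It gives the value each work-item should print on the host and CPU runs, which is useful if a later variant stops crashing but prints something different on the GPU.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Stand-in for sycl::short3 so this compiles without a SYCL implementation.
using lanes3 = std::array<std::int16_t, 3>;

// Host-only model of is_sign_same(): per lane, -1 (all bits set) when both
// values have the same sign, 0 otherwise, mirroring SYCL vec comparisons.
lanes3 is_sign_same_reference(const lanes3& a, const lanes3& b)
{
    lanes3 result{};
    for (std::size_t i = 0; i < result.size(); ++i)
    {
        const bool a_negative = a[i] < 0;
        const bool b_negative = b[i] < 0;
        result[i] = (a_negative == b_negative) ? std::int16_t{-1} : std::int16_t{0};
    }
    return result;
}
```

In the kernel above, `robot_index` is `{0, 0, 0}`, so `old_signed_idx` equals `id` and every lane is non-negative; under these semantics every work-item should therefore print a vector whose three lanes are all -1.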