
The point of the question

I have a minimal program that segfaults during SYCL runtime compilation. For completeness, precise reproduction details are given below. However, the point of this question is to understand how to debug this kind of failure. It took me a long time to produce the minimal example. I suspect that if I could get the runtimes for dpcpp, I could have cut it down much faster. When the runtime compiler fails, it should throw an exception. I want to know what steps I should take to discover why it is segfaulting instead, and whether this is a compiler bug or a bug in my code.

Reproduction details below

Starting with the code:


#include <CL/sycl/queue.hpp>
#include <CL/sycl/device.hpp>
#include <CL/sycl/context.hpp>
#include <CL/sycl.hpp>

#include <iostream>

namespace
{
auto is_sign_same(sycl::short3 idx1, sycl::short3 idx2)
{
    return (idx1 < 0) == (idx2 < 0);
}

}  // namespace

int main()
{
    sycl::device device = sycl::device{sycl::gpu_selector{}};

    std::cout
        << "\n\nRunning occupancy grid profile. The profile will have the following "
           "properties:\n\n  Device:\t"
        << device.get_info<sycl::info::device::name>() << "\n\n";

    sycl::context       context{device};
    sycl::property_list properties{sycl::property::queue::enable_profiling()};
    sycl::queue         queue{device, properties};

    auto event = queue.submit(
        [](sycl::handler& cgh)
        {
            // 1. This must be captured by the kernel lambda. If I declare it inside
            // the kernel instead, the crash does not occur.
            sycl::id<3>  robot_index{0, 0, 0};
            sycl::stream out(1024, 256, cgh);

            cgh.parallel_for(
                sycl::range<3>{4, 4, 4},
                [out, robot_index](sycl::id<3> id)
                {
                    sycl::short3 new_signed_idx{short(0)};

                    // 2. I cannot remove the subtraction between the two sycl::short3
                    // values here. Without it, the crash does not occur.
                    sycl::short3 old_signed_idx =
                        sycl::short3{
                            (short)id.get(0), (short)id.get(1), (short)id.get(2)} -
                        sycl::short3{
                            (short)robot_index.get(0),
                            (short)robot_index.get(1),
                            (short)robot_index.get(2)};

                    // 3. I cannot replace this function call with the operation it
                    // performs, written inline here. If I do, the crash does not occur.
                    auto s_same = is_sign_same(new_signed_idx, old_signed_idx);
                    out << s_same;
                }
            );
        }
    );

    return 0;
}

When compiled using:

/opt/intel/oneapi/compiler/2022.1.0/linux/bin/dpcpp   -fclang-abi-compat=7 -fsycl --gcc-toolchain=/usr -sycl-std=2020 -fp-model=precise -Wall -Werror -fsycl -O2 -g -DNDEBUG  -std=gnu++17 sgfaulting_file.cpp

it fails at runtime with a segfault. The crash has something to do with building the kernel. If we run the output in GDB, we get the following stack trace when it dies:

(gdb) where
#0  0x00007f49e3683b8c in ?? () from /usr/lib/x86_64-linux-gnu/libigc.so.1
#1  0x00007f49e36b440c in ?? () from /usr/lib/x86_64-linux-gnu/libigc.so.1
#2  0x00007f49e36b0dda in ?? () from /usr/lib/x86_64-linux-gnu/libigc.so.1
#3  0x00007f49e36b430f in ?? () from /usr/lib/x86_64-linux-gnu/libigc.so.1
#4  0x00007f49e36bac6a in ?? () from /usr/lib/x86_64-linux-gnu/libigc.so.1
#5  0x00007f49e36b0bed in ?? () from /usr/lib/x86_64-linux-gnu/libigc.so.1
#6  0x00007f49e36b430f in ?? () from /usr/lib/x86_64-linux-gnu/libigc.so.1
#7  0x00007f49e36bac6a in ?? () from /usr/lib/x86_64-linux-gnu/libigc.so.1
#8  0x00007f49e36bf027 in ?? () from /usr/lib/x86_64-linux-gnu/libigc.so.1
#9  0x00007f49e36bf908 in ?? () from /usr/lib/x86_64-linux-gnu/libigc.so.1
#10 0x00007f49e35ab7bc in ?? () from /usr/lib/x86_64-linux-gnu/libigc.so.1
#11 0x00007f49e35abfba in ?? () from /usr/lib/x86_64-linux-gnu/libigc.so.1
#12 0x00007f49e35ae90d in ?? () from /usr/lib/x86_64-linux-gnu/libigc.so.1
#13 0x00007f49e36ec3d4 in ?? () from /usr/lib/x86_64-linux-gnu/libigc.so.1
#14 0x00007f49e35b21fb in ?? () from /usr/lib/x86_64-linux-gnu/libigc.so.1
#15 0x00007f49e36ced9a in ?? () from /usr/lib/x86_64-linux-gnu/libigc.so.1
#16 0x00007f49f487f1bb in ?? () from /usr/lib/x86_64-linux-gnu/intel-opencl/libigdrcl.so
#17 0x00007f49f43ef178 in ?? () from /usr/lib/x86_64-linux-gnu/intel-opencl/libigdrcl.so
#18 0x00007f49f4397b33 in ?? () from /usr/lib/x86_64-linux-gnu/intel-opencl/libigdrcl.so
#19 0x00007f49f9327aa4 in cl::sycl::detail::ProgramManager::build(std::unique_ptr<_pi_program, _pi_result (*)(_pi_program*)>, std::shared_ptr<cl::sycl::detail::context_impl>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, _pi_device* const&, std::map<std::pair<cl::sycl::detail::DeviceLibExt, _pi_device*>, _pi_program*, std::less<std::pair<cl::sycl::detail::DeviceLibExt, _pi_device*> >, std::allocator<std::pair<std::pair<cl::sycl::detail::DeviceLibExt, _pi_device*> const, _pi_program*> > >&, unsigned int) () from /opt/intel/oneapi/compiler/2022.1.0/linux/lib/libsycl.so.5
#20 0x00007f49f9321336 in cl::sycl::detail::ProgramManager::getBuiltPIProgram(long, std::shared_ptr<cl::sycl::detail::context_impl> const&, std::shared_ptr<cl::sycl::detail::device_impl> const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, cl::sycl::detail::program_impl const*, bool) () from /opt/intel/oneapi/compiler/2022.1.0/linux/lib/libsycl.so.5
#21 0x00007f49f932243c in cl::sycl::detail::ProgramManager::getOrCreateKernel(long, std::shared_ptr<cl::sycl::detail::context_impl> const&, std::shared_ptr<cl::sycl::detail::device_impl> const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, cl::sycl::detail::program_impl const*) () from /opt/intel/oneapi/compiler/2022.1.0/linux/lib/libsycl.so.5
#22 0x00007f49f93630f1 in cl::sycl::detail::enqueueImpKernel(std::shared_ptr<cl::sycl::detail::queue_impl> const&, cl::sycl::detail::NDRDescT&, std::vector<cl::sycl::detail::ArgDesc, std::allocator<cl::sycl::detail::ArgDesc> >&, std::shared_ptr<cl::sycl::detail::kernel_bundle_impl> const&, std::shared_ptr<cl::sycl::detail::kernel_impl> const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, long const&, std::vector<_pi_event*, std::allocator<_pi_event*> >&, _pi_event**, std::function<void* (cl::sycl::detail::AccessorImplHost*)> const&) ()
   from /opt/intel/oneapi/compiler/2022.1.0/linux/lib/libsycl.so.5
#23 0x00007f49f9369f3b in cl::sycl::detail::ExecCGCommand::enqueueImp() ()
   from /opt/intel/oneapi/compiler/2022.1.0/linux/lib/libsycl.so.5
#24 0x00007f49f93566c5 in cl::sycl::detail::Command::enqueue(cl::sycl::detail::EnqueueResultT&, cl::sycl::detail::BlockingT, std::vector<cl::sycl::detail::Command*, std::allocator<cl::sycl::detail::Command*> >&) () from /opt/intel/oneapi/compiler/2022.1.0/linux/lib/libsycl.so.5
#25 0x00007f49f9373b7b in cl::sycl::detail::Scheduler::addCG(std::unique_ptr<cl::sycl::detail::CG, std::default_delete<cl::sycl::detail::CG> >, std::shared_ptr<cl::sycl::detail::queue_impl>) ()
   from /opt/intel/oneapi/compiler/2022.1.0/linux/lib/libsycl.so.5
#26 0x00007f49f93aef30 in cl::sycl::handler::finalize() ()
   from /opt/intel/oneapi/compiler/2022.1.0/linux/lib/libsycl.so.5
#27 0x00007f49f93dc3ea in cl::sycl::detail::queue_impl::finalizeHandler(cl::sycl::handler&, cl::sycl::detail::CG::CGTYPE const&, cl::sycl::event&) ()
   from /opt/intel/oneapi/compiler/2022.1.0/linux/lib/libsycl.so.5
#28 0x00007f49f93dc13b in cl::sycl::detail::queue_impl::submit_impl(std::function<void (cl::sycl::handler&)> const&, std::shared_ptr<cl::sycl::detail::queue_impl> const&, std::shared_ptr<cl::sycl::detail::queue_impl> const&, std::shared_ptr<cl::sycl::detail::queue_impl> const&, cl::sycl::detail::code_location const&, std::function<void (bool, bool, cl::sycl::event&)> const*) ()
   from /opt/intel/oneapi/compiler/2022.1.0/linux/lib/libsycl.so.5
#29 0x00007f49f93db744 in cl::sycl::detail::queue_impl::submit(std::function<void (cl::sycl::handler&)> const&, std::shared_ptr<cl::sycl::detail::queue_impl> const&, cl::sycl::detail::code_location const&, std::function<void (bool, bool, cl::sycl::event&)> const*) ()
   from /opt/intel/oneapi/compiler/2022.1.0/linux/lib/libsycl.so.5
#30 0x00007f49f93db715 in cl::sycl::queue::submit_impl(std::function<void (cl::sycl::handler&)>, cl::sycl::detail::code_location const&) () from /opt/intel/oneapi/compiler/2022.1.0/linux/lib/libsycl.so.5
#31 0x00000000004026d8 in cl::sycl::queue::submit<main::{lambda(cl::sycl::handler&)#1}>(main::{lambda(cl::sycl::handler&)#1}, cl::sycl::detail::code_location const&) (this=0x7ffc5da1b200, CodeLoc=..., CGF=...)
    at /opt/intel/oneapi/compiler/2022.1.0/linux/bin-llvm/../include/sycl/CL/sycl/queue.hpp:275
#32 main () at occupancy_grid_point_cloud_creation.cpp:31

The important part is frame #19:

cl::sycl::detail::ProgramManager::build
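A trace this long is easier to read when it is collapsed to the points where it crosses from one shared library into another. The following is a minimal sketch of that idea, not part of any SYCL or GDB tooling: `Frame`, `Run`, `parse_backtrace` and `collapse_by_library` are hypothetical names, and the parser assumes the plain `where` output format shown above (a `#N` prefix per frame, with ` from <path>` either on the same line or on an indented continuation line).

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// One backtrace frame: its number and the shared object gdb reported for it.
// The library is empty when gdb printed no " from <path>" (frames in the
// executable itself).
struct Frame {
    int index;
    std::string library;
};

// A run of consecutive frames that all belong to the same library.
struct Run {
    std::string library;
    int first;
    int last;
};

// Parse the text printed by gdb's `where` / `bt` command (hypothetical helper).
std::vector<Frame> parse_backtrace(const std::string& text) {
    const std::string marker = " from ";
    std::vector<Frame> frames;
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        // Drop trailing whitespace, including a '\r' from Windows line endings.
        while (!line.empty() &&
               std::isspace(static_cast<unsigned char>(line.back()))) {
            line.pop_back();
        }
        // A line of the form "#<digits> ..." opens a new frame.
        if (line.size() > 1 && line[0] == '#' &&
            std::isdigit(static_cast<unsigned char>(line[1]))) {
            frames.push_back(Frame{std::stoi(line.substr(1)), ""});
        }
        if (frames.empty()) {
            continue;  // text before the first frame, e.g. the "(gdb) where" prompt
        }
        // " from <path>" may sit on the frame line or on a continuation line.
        const std::size_t pos = line.rfind(marker);
        if (pos != std::string::npos) {
            frames.back().library = line.substr(pos + marker.size());
        }
    }
    return frames;
}

// Merge neighbouring frames that share a library into a single run.
std::vector<Run> collapse_by_library(const std::vector<Frame>& frames) {
    std::vector<Run> runs;
    for (const Frame& f : frames) {
        if (!runs.empty() && runs.back().library == f.library) {
            runs.back().last = f.index;
        } else {
            runs.push_back(Run{f.library, f.index, f.index});
        }
    }
    return runs;
}
```

Applied to the trace above, this should give four runs: frames 0 to 15 in `libigc.so.1`, 16 to 18 in `libigdrcl.so`, 19 to 30 in `libsycl.so.5`, and 31 to 32 in the executable. That makes it visible at a glance that the crash is inside the graphics compiler, entered through the OpenCL driver from `ProgramManager::build`. The input text can be captured without the pager prompt by running gdb with `set pagination off`.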

The runtime compilation is occurring on the device (from sycl-ls):

[opencl:gpu:2] Intel(R) OpenCL HD Graphics, Intel(R) UHD Graphics [0x9bc4] 3.0 [22.28.23726.1]

If we run the same program with a host or CPU selector, the kernel builds and runs successfully. It also seems that if we change minor details of the program, it no longer segfaults. These small changes are described in the comments in the program.

Fantastic Mr Fox
  • Do you use standard *gdb* or *gdb-oneapi*? Unless you already did it, try to run your code under *gdb-oneapi* tool and also try other debugging hints described at [Get Started with Intel® Distribution for GDB](https://www.intel.com/content/www/us/en/develop/documentation/get-started-with-debugging-dpcpp-linux/top.html), especially gdbserver-gt can attach to GPU. Something more may be also visible if you turn off compiler optimizations (-O0 instead of -O2). – Łukasz Ślusarczyk Nov 18 '22 at 13:37
  • We are working on the same issue [here](https://community.intel.com/t5/Intel-C-Compiler/DPC-segfaults-at-runtime-compiling-valid-sycl-kernel/m-p/1429740#M40474) Would update once resolved! :) – AlekhyaV - Intel Dec 07 '22 at 09:11
  • Yep, I opened that post. This question is not about the solution to that issue; it is about how to debug compiler issues for dpcpp in general. – Fantastic Mr Fox Dec 07 '22 at 23:54
