I have two twin servers with the same hardware (InfiniBand and Nvidia Tesla) and the same software (CentOS 6.6, same kernel and drivers).

On host1 everything works as usual, while on host2 I can no longer start this service; I get this error:

[root@vega2 nvidia_peer_memory-1.0-0]# service nv_peer_mem start
starting... FATAL: Error inserting nv_peer_mem (/lib/modules/2.6.32-504.el6.x86_64/extra/nv_peer_mem.ko): Invalid module format
Failed to load nv_peer_mem

and dmesg says:

nv_p2p_dummy: exports duplicate symbol nvidia_p2p_free_page_table (owned by nvidia)
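
Since dmesg reports the symbol as already "owned by nvidia", one thing that might be worth comparing between host1 and host2 is which loaded modules currently own it. The following is only a rough sketch: the helper name symbol_owners is made up for illustration, and it assumes /proc/kallsyms uses the usual "address type name [module]" layout and is readable (as root).

```shell
#!/bin/sh
# symbol_owners SYMBOL [FILE]
# Hypothetical helper: print the loaded modules that own SYMBOL,
# one per line, by reading a kallsyms-style listing.
# FILE defaults to /proc/kallsyms (assumed layout: addr type name [module]).
symbol_owners() {
    sym=$1
    src=${2:-/proc/kallsyms}
    # Keep only lines whose third field is exactly the symbol and that
    # carry a fourth "[module]" field, then strip the brackets.
    awk -v s="$sym" '$3 == s && NF >= 4 { print $4 }' "$src" \
        | tr -d '[]' \
        | LC_ALL=C sort -u
}

# Example (run on both hosts and compare the output):
#   symbol_owners nvidia_p2p_free_page_table
```

If host2 lists an owner that host1 does not (or the other way round), that difference would be a natural place to look next.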

Note that host2 had been working fine for 2 months, until I rebooted it after the summer holidays. :-( What could cause this error? The main software components (kernel, Nvidia drivers, Mellanox drivers) did not change, and the hardware is OK. I also tried repeating the installation procedure, but I get stuck at the module loading step:

[root@vega2 nvidia_peer_memory-1.0-0]# rpm -ivh /root/rpmbuild/RPMS/x86_64/nvidia_peer_memory-1.0-0.x86_64.rpm
Preparing...             ########################################### [100%]
1:nvidia_peer_memory     ########################################### [100%]
FATAL: Error inserting nv_peer_mem (/lib/modules/2.6.32-504.el6.x86_64/extra/nv_peer_mem.ko): Invalid module format

I found this post about two kernel modules exporting the same symbols, but why does this second module interfere with nv_peer_mem on host2 and not on host1? Here is the output of the nm commands, which is exactly the same on both hosts.

[root@vega2 nvidia_peer_memory-1.0-0]# nm /lib/modules/2.6.32-504.el6.x86_64/kernel/drivers/video/nvidia.ko | grep nvidia_p2p_free_page_table
0000000088765bb5 A __crc_nvidia_p2p_free_page_table
0000000000000028 r __kcrctab_nvidia_p2p_free_page_table
000000000000007e r __kstrtab_nvidia_p2p_free_page_table
0000000000000050 r __ksymtab_nvidia_p2p_free_page_table
00000000004bcb10 T nvidia_p2p_free_page_table

[root@vega2 nvidia_peer_memory-1.0-0]# nm /lib/modules/2.6.32-504.el6.x86_64/extra/nv_peer_mem.ko | grep nvidia_p2p_free_page_table
            U nvidia_p2p_free_page_table
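
The two nm checks above only cover nvidia.ko and nv_peer_mem.ko. To see whether any other .ko file on disk (for example the nv_p2p_dummy module named in dmesg) also exports the symbol, the same check could be run across the whole module tree. This is a minimal sketch: list_exporters is a made-up helper name, and it assumes exported symbols show up in nm output as __ksymtab_<symbol>, as they do in the nvidia.ko listing above.

```shell
#!/bin/sh
# list_exporters SYMBOL [DIR]
# Hypothetical helper: print every .ko file under DIR whose nm output
# contains an export table entry (__ksymtab_SYMBOL) for SYMBOL.
# DIR defaults to the module tree of the running kernel.
list_exporters() {
    sym=$1
    dir=${2:-/lib/modules/$(uname -r)}
    find "$dir" -name '*.ko' | while IFS= read -r ko; do
        # Modules that merely use the symbol list it as "U" (undefined)
        # and have no __ksymtab_ entry, so they are skipped.
        if nm "$ko" 2>/dev/null | grep -q " __ksymtab_${sym}\$"; then
            printf '%s\n' "$ko"
        fi
    done
}

# Example (run on both hosts and compare the output):
#   list_exporters nvidia_p2p_free_page_table
```

More than one path in the output on host2 would match the "exports duplicate symbol" message; comparing the list with host1 would show whether the on-disk module sets really are identical.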

Thanks in advance for any help. Ste.

Stefano.C