
I want to log the performance of my system under very intensive network loads (sending over 100 Gbps). At some point after running my network at continuous >100 Gbps loads (using my own C++ implementation via Winsock2), my transfer rates start dropping, more UDP packets are lost, and eventually my Mellanox drivers are no longer detectable in Task Manager. I suspect that temperature is the culprit, so I would like to record the Mellanox card's temperature while my application runs. I can get the card's temperature from the command line using the Mellanox firmware tools, but to my knowledge that is not something I can call from C++ to add the value directly to my log. Does anyone know of a Mellanox API that exposes this information to C++? I would rather not make system() calls and pipe data through intermediate files if possible.

Additionally, I would like to check whether temperature is also an issue for my CPUs and log that data as well. As with the Mellanox card, an API that lets me query this from C++ would be appreciated. Note that I am using Windows 11 and cannot use Linux for this project (I'm sure a trivial solution already exists on Linux).

For additional information, here are the two setups:

PC 1: AMD, 128 GB DDR4 memory, Mellanox ConnectX-5 adapter, CPU: Threadripper 3960X, Brand: self-built

PC 2: Intel, 16 GB DDR4 memory, Mellanox ConnectX-5 adapter, CPU: 11th-gen Core i7, Brand: Alienware Aurora R12

Currently I use the system() call to invoke the Mellanox firmware tools and get the temperature. However, this prints the resulting temperature to the console, which is not something I can easily capture: most threads I've seen talk about creating child pipes and capturing the stdout of the child process. I would like to avoid this if possible.

Botje
Valdez
  • "it is not something that I can easily capture" is not true. If you can call it via `system()`, you can [call it via `pipe()`](https://stackoverflow.com/questions/478898/how-do-i-execute-a-command-and-get-the-output-of-the-command-within-c-using-po). `system()` doesn't redirect the output at all, but `pipe()` feeds it wherever you want it. – tadman Aug 31 '23 at 19:27
  • I would like to avoid that based on the many negative comments I've seen regarding the usage of system(). – Valdez Aug 31 '23 at 19:31
  • Then use the second option tadman listed. You'll see that it doesn't use `system`. You will likely have to make a few small Windows-targeting changes, `_popen` in place of `popen`, for example, but otherwise it works as advertised. – user4581301 Aug 31 '23 at 19:35
  • Ah. You want to avoid the pipes as well. OK, NOW you have a problem with that solution. – user4581301 Aug 31 '23 at 19:37
  • If you're calling this infrequently, as in less than once a minute, the overhead should be incalculable on most systems, especially those with enough CPU to *literally melt a 100Gbit card*. – tadman Aug 31 '23 at 19:38
  • I misread the response, yes, pipe() could be used. And I think that helps solve the problem, thanks. – Valdez Aug 31 '23 at 19:38
  • Another thing you could do is stick some heat-sinks on your card if the factory ones are insufficient. There are all kinds of heat-sink add-ons for things like NVMe cards, memory, etc. and you can probably find some that are a bit beefier, and/or may even include a fan. Of course if you're collecting data on temperatures, you can measure any impact any solutions have. – tadman Aug 31 '23 at 19:39
  • Yes, I am definitely aiming to get better equipment (specifically fans), but I have to show that the thermals are a problem before my managers approve the budget. Unfortunately we have a bit of a rigorous standard of proof necessary to acquire new equipment. – Valdez Aug 31 '23 at 19:42
  • Annoying, but it's better than the company spending frivolously and you needing a new job in a few months. Don't forget that you might be able to meet the burden of proof with a simple thermometer or third-party data logger that the company already has. A thermistor and a hobbyist CPU board with a 12-bit ADC can be a wonderful thing. – user4581301 Aug 31 '23 at 21:07
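
The `popen`/`_popen` approach suggested in the comments could look roughly like the sketch below. It runs a command, collects whatever the command writes to stdout into a `std::string`, and pulls the first integer out of that text so it can be written to a log. The helper names `run_and_capture` and `first_integer` are made up for illustration, and the assumption that the firmware tool prints the temperature as the first number in its output is untested; adjust the parsing to whatever the tool actually prints.

```cpp
#include <cstdio>
#include <stdexcept>
#include <string>

// popen/pclose are POSIX; the Microsoft CRT spells them _popen/_pclose.
#ifdef _WIN32
#define POPEN _popen
#define PCLOSE _pclose
#else
#define POPEN popen
#define PCLOSE pclose
#endif

// Hypothetical helper: run `command` through the shell and return
// everything it wrote to stdout.
std::string run_and_capture(const std::string& command) {
    FILE* pipe = POPEN(command.c_str(), "r");
    if (pipe == nullptr) {
        throw std::runtime_error("could not start: " + command);
    }
    std::string output;
    char buffer[256];
    // fgets returns nullptr once the child closes its stdout.
    while (std::fgets(buffer, sizeof buffer, pipe) != nullptr) {
        output += buffer;
    }
    PCLOSE(pipe);
    return output;
}

// Hypothetical helper: return the first non-negative integer found in
// `text`, or -1 if the text contains no digits.
// ASSUMPTION: the tool reports the temperature as the first number printed.
int first_integer(const std::string& text) {
    const std::string::size_type pos = text.find_first_of("0123456789");
    if (pos == std::string::npos) {
        return -1;
    }
    return std::stoi(text.substr(pos));
}
```

A logging thread could then call `first_integer(run_and_capture(cmd))` every few seconds, where `cmd` is the same firmware-tools command line currently passed to `system()`. Spawning a process at that rate should cost very little next to the network workload.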

0 Answers