
I have a multi-file C++ program (3 .cpp files and 2 .h files) that uses the new, new[], delete, and delete[] operators to allocate large chunks of memory and performs some floating-point operations (including division by sqrt()). The program compiles and runs well with regular g++ and produces the required output. I now want to compile it to RISC-V instructions.

  1. I first used the GNU RISC-V compiler toolchain to compile the program with the following command:

    riscv32-unknown-elf-g++ -w -march=rv32imafc -mabi=ilp32f -DPREALLOCATE=1 -mcmodel=medany -fno-common -static -Iinclude/ f1.cpp f2.cpp f3.cpp -o executable

    Here, executable is the compiled binary file.

  2. Since no errors are thrown at compile time, I proceeded to run the whisper simulator as follows:

    whisper --isa imafc --target executable --logfile mylog --profileinst prfl

    where mylog and prfl are output file paths.

  3. The generated log file exceeds 6 GB in size and the simulation doesn't terminate, so I assume the program is stuck in an infinite loop.

  4. I tried adding the _start() function and the tohost variable, which didn't work either (same problem as in step 3). [based on suggestion here]

  5. I generated an object dump of executable, took the (hex) start address of main(), its end address, and the address of tohost, and passed them to --startpc, --endpc, and --tohost respectively. In that case, the program was aborted after 64 illegal operations, leaving a log file of ~300 MB.

I think I am making some mistake. Kindly help me fix this issue or share the correct steps to generate the RISC-V assembly instructions for my C++ program.

anurag
  • [How to create a Minimal, Reproducible Example](https://stackoverflow.com/help/minimal-reproducible-example) – Kostas Dec 01 '20 at 06:09
  • Not sure about minimal (I haven't worked with RISC-V tools before); the problem could be very specific to my code. Assuming you have the GNU RISC-V compiler toolchain and the Whisper simulator installed, this is my code: https://github.com/chipsalliance/SweRV-ISS/files/5620613/cppcode.zip – anurag Dec 01 '20 at 06:34
  • One trick with the MCVE is to repeatedly remove parts of original code until you find the minimal rest which reproduces the error. (Maybe, you even find your issue by yourself this way.) Concerning _runs well with regular g++_: This doesn't mean anything. If you have e.g. out-of-bound accesses you might be lucky (or in this case unlucky) that the out-of-bound access is to unused memory and hence unnoticed. However, it's still U.B. and might have a different effect on another platform. (Bell ringing?) ;-) You also could use tools like e.g. valgrind which might uncover U.B. in your code. – Scheff's Cat Dec 01 '20 at 07:26
  • You might want to enable optimization (`-O3`) so your program runs fewer total RISC-V instructions, and will simulate faster. Especially if you're logging every instruction to a file. How many instructions does it run when it works successfully (on an x86-64 system, I assume)? A typical 4GHz x86 CPU can run anywhere from 0.1 to 16 billion instructions in 1 second, depending on the workload. I don't know what Whisper logs, but if it's one line per instruction, 6GB of log is only 600M instructions if we assume an average length of 10 bytes. – Peter Cordes Dec 01 '20 at 09:19
  • @Scheff I will be more careful in verifying my code's output in future. Indeed the o-o-b-a error somehow did not show up in the x86 runs; I should have used valgrind as a precaution. Thanks! – anurag Dec 02 '20 at 20:22
  • @PeterCordes I did not use the `-O3` flag because I needed the raw instruction set generated without any optimization. The final successful logfile (of assembly instructions) was of size 725 MB. – anurag Dec 02 '20 at 20:22
  • Ok, IDK why it would be helpful to have a boatload of redundant store/reloads ([Why does clang produce inefficient asm with -O0 (for this simple floating point sum)?](https://stackoverflow.com/q/53366394)), but I guess you have your reasons. Perhaps `-Og` (minimal optimization) would be useful. – Peter Cordes Dec 02 '20 at 20:26
  • @PeterCordes, well the reason is, I need the worst case scenario for instructions generated for my functionality. If I let the compiler do the optimization, I will miss out on some optimizations I wish to perform at the lower level! – anurag Dec 02 '20 at 20:33
  • Unoptimized code is more like "anti-optimized". Comparing against that is hardly a fair comparison; it's trivially easy to beat. (And doing more work in one C statement, with fewer named temporary variables, makes `-O0` faster, but not any realistic optimization level.) It also has different bottlenecks, usually store-forwarding latency. If you understand all this and still want to profile that garbage code, then go ahead. – Peter Cordes Dec 02 '20 at 21:02

1 Answer


Thanks to Scheff and Peter Cordes for the accurate pointers. It indeed turned out that my code had an out-of-bounds access error, as detected by Valgrind. I have to admit I also bothered the developer of the Whisper simulator, who was kind enough to help.

The steps for resolution are as follows:

  1. Fix the bugs at symm_normalize.h:112 (D[j] should be D[i]) and symm_normalize.h:161 (the for loop must run up to rows, not cols). Using the _start() function and the tohost variable, as explained on the Whisper GitHub repo page, is mandatory.
  2. Replace sqrt() with sqrtf() at symm_normalize.h:131. I had targeted a single-precision floating-point architecture both at compile time (-march=rv32imafc) and during simulation (--isa imafc), but had erroneously used sqrt().
  3. Install GNU RISC-V compiler tool chain.
  4. Install Whisper simulator.
  5. Compile the code for RISC-V (using the GNU toolchain; command same as in question).
  6. Use the following command to perform the simulation: whisper --newlib --isa imafc --target executable --logfile mylog --profileinst prfl --verbose
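
To illustrate the kind of fix described in steps 1 and 2, here is a minimal sketch. It is not the code from symm_normalize.h: the function name normalize_rows, the row-major layout, and the row-sum scaling are assumptions made only for this example. It shows a per-row array indexed by the row index i, loops bounded by rows for the outer index, the single-precision sqrtf(), and bounds-checked access via std::vector::at(), which throws on an out-of-bounds index instead of silently reading unrelated memory.

```cpp
#include <math.h>    // sqrtf: single-precision square root
#include <cstddef>
#include <vector>

// Hypothetical helper (not the original symm_normalize.h code):
// divides every entry of row i by sqrtf(d[i]), where d[i] is the sum of row i.
// The matrix is stored row-major in a flat vector of rows * cols floats.
std::vector<float> normalize_rows(const std::vector<float>& a,
                                  std::size_t rows, std::size_t cols) {
    std::vector<float> d(rows, 0.0f);            // one entry per row
    for (std::size_t i = 0; i < rows; ++i) {     // outer bound is rows, not cols
        for (std::size_t j = 0; j < cols; ++j) {
            d.at(i) += a.at(i * cols + j);       // indexed by row i, not column j
        }
    }

    std::vector<float> out(a);                   // start from a copy of the input
    for (std::size_t i = 0; i < rows; ++i) {
        if (d.at(i) <= 0.0f) {
            continue;                            // leave rows with a non-positive sum unchanged
        }
        const float scale = sqrtf(d.at(i));      // float in, float out
        for (std::size_t j = 0; j < cols; ++j) {
            out.at(i * cols + j) = a.at(i * cols + j) / scale;
        }
    }
    return out;
}
```

With at(), a mistake such as indexing d by j when cols exceeds rows raises std::out_of_range on any platform, rather than going unnoticed on x86 and misbehaving under the simulator.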

This resolution was provided by the developer of Whisper (massive respect!) via this GitHub repo issue.

anurag