
I want to parallelise my googletest cases in C++. I have read the Google Test documentation on sharding but have been unable to implement it in my C++ environment. As I'm new to coding, could someone please explain the documentation at the link below with a code example? https://github.com/google/googletest/blob/master/googletest/docs/advanced.md

Does Google Test sharding only work across different machines, or can it also be used on the same machine with multiple threads?

rold2007
GHoul
  • Looks like it just breaks the tests up into N chunks and then runs a different chunk on each machine, depending on the index specified in the environment. I don't see any reason you couldn't run multiple shards on the same machine with different environment variables, assuming your code can deal with that as well. It doesn't have anything to do with threads; it runs one process per shard. – xaxxon May 29 '17 at 12:13
  • Agreed. It doesn't make much sense to shard on the same machine. – James Poag May 29 '17 at 12:16
  • @JamesPoag I didn't say that. If you have a lot of tests, it can save time to run them in parallel, which google test doesn't do otherwise, as far as I know. – xaxxon May 29 '17 at 12:21
  • Use https://stackoverflow.com/questions/17929414/how-to-use-setenv-to-export-a-variable-in-c to test sharding, but I'm not certain you can get the same program instance to run with different environment variables. Maybe you could take the shard number as a program argument, set the environment from it, and just execute your program N times. – James Poag May 29 '17 at 13:02

3 Answers


Sharding isn't done in code; it's done through the environment. Each machine sets two environment variables: GTEST_TOTAL_SHARDS, the total number of shards (machines) you are running, and GTEST_SHARD_INDEX, which is unique to each machine and must be in the range 0 to GTEST_TOTAL_SHARDS - 1. When the tests run, Google Test executes only the subset of tests that belongs to that shard.
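To make "only the subset of tests that belongs to that shard" concrete: based on my reading of the Google Test source (an implementation detail, not a documented guarantee), tests are dealt out round-robin, so the k-th runnable test, counting from 0 after --gtest_filter and disabled tests are excluded, runs on shard k % GTEST_TOTAL_SHARDS. The sketch below is a simplified model of that rule; TestsForShard is a hypothetical helper, not part of the Google Test API.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: simplified model of Google Test's shard assignment.
// `tests` lists the runnable tests in the order they would run; test number k
// (counting from 0) goes to shard k % total_shards.
std::vector<std::string> TestsForShard(const std::vector<std::string>& tests,
                                       int total_shards, int shard_index) {
  assert(total_shards > 0 && shard_index >= 0 && shard_index < total_shards);
  std::vector<std::string> selected;
  for (std::size_t k = 0; k < tests.size(); ++k) {
    if (static_cast<int>(k % static_cast<std::size_t>(total_shards)) ==
        shard_index) {
      selected.push_back(tests[k]);
    }
  }
  return selected;
}
```

With five tests and GTEST_TOTAL_SHARDS=2, shard 0 gets tests 0, 2 and 4 and shard 1 gets tests 1 and 3, so every test runs exactly once across all shards.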

If you want to simulate this on a single machine, you need to set these environment variables separately for each test process (this can also be done in code).
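Setting the variables in code could look like the sketch below. It assumes a POSIX system: setenv is not available with MSVC, where _putenv_s plays a similar role. ExportShardEnv is a hypothetical helper; the idea is to call it from your own main() before RUN_ALL_TESTS(), for example with the shard index taken from the command line as suggested in the comments. Because the environment belongs to the whole process, this still gives only one shard per process, so the executable has to be started once per index.

```cpp
#include <cassert>
#include <cstdlib>  // setenv/getenv; setenv is POSIX, not standard C++
#include <string>

// Hypothetical helper: exports the two sharding variables for the current
// process. Google Test checks them when RUN_ALL_TESTS() runs, so call this
// before that. Returns false for an invalid shard index or if setenv fails.
bool ExportShardEnv(int total_shards, int shard_index) {
  if (total_shards < 1 || shard_index < 0 || shard_index >= total_shards) {
    return false;
  }
  const std::string total = std::to_string(total_shards);
  const std::string index = std::to_string(shard_index);
  // Third argument 1 = overwrite any value inherited from the parent.
  return setenv("GTEST_TOTAL_SHARDS", total.c_str(), 1) == 0 &&
         setenv("GTEST_SHARD_INDEX", index.c_str(), 1) == 0;
}
```

A custom main() would then parse the index from argv, call ExportShardEnv(10, index), and continue with testing::InitGoogleTest(&argc, argv) and return RUN_ALL_TESTS();.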

I would probably try something like this (on Windows) in a .bat file:

set GTEST_TOTAL_SHARDS=10
FOR /L %%I in (1,1,10) DO cmd.exe /c "set GTEST_SHARD_INDEX=%%I && start mytest.exe"

And hope that each new cmd instance gets its own environment.

James Poag
  • My implementation, which sets GTEST_SHARD_INDEX in individual threads, only seems to work for one shard, not the others. It also shows the global tear-down for the test. What could the issue be? – GHoul May 30 '17 at 06:48
  • The environment belongs to the process, so all of its threads share the same variables. When you set one, kick off a thread and then change it, there's a race between the threads to read the variable. I doubt you would get more than one or two shards using threads. You need to launch the .exe as separate processes, each with a different environment: https://superuser.com/a/424002 – James Poag May 30 '17 at 10:35
  • I've edited this answer to possibly 'push' the environment with a new instance of cmd.exe. – James Poag May 30 '17 at 10:50
  • The environment variables are separate for each cmd instance, so I have to change the batch script accordingly. Let's hope it works. – GHoul May 31 '17 at 05:55

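Running the following in a command window worked for me. It is very similar to James Poag's answer, but note the changes: the range goes from "1,1,10" to "0,1,9" because GTEST_SHARD_INDEX counts from 0, "%%" becomes "%" because the loop is typed at the prompt rather than in a .bat file, and "set" becomes "set /A", which stores just the number (plain "set" would keep the space before "&&" as part of the value):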

set GTEST_TOTAL_SHARDS=10
FOR /L %I in (0,1,9) DO cmd.exe /c "set /A GTEST_SHARD_INDEX=%I && start mytests.exe"

After further experimentation, I found that it is also possible to do this in C++, but it is not straightforward and I did not find a portable way of doing it. I can't post the code, as it was written at work.

Essentially, from main, create n new processes (where n is the number of cores available), capture the results from each shard, merge them and output them to the screen. To get each process running a different shard, the controller passes the total number of shards and the instance number to each child process.

This is done by retrieving and copying the current environment, and setting in the copy the two environment variables (GTEST_TOTAL_SHARDS and GTEST_SHARD_INDEX) as required. GTEST_TOTAL_SHARDS is always the same, but GTEST_SHARD_INDEX will be the instance number of the child.
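The answer's Windows code was not posted. As a rough POSIX equivalent (fork and execv swapped in for CreateProcessA; this is not the author's code), the controller could look like the sketch below. RunShards is a hypothetical name; it only launches the shards and counts how many exited with status 0, leaving output capture aside. Setting the variables in the child after fork changes only that child's copy of the environment, which is what the threads approach in the comments could not achieve.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>
#include <vector>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical controller (POSIX). command[0] is the test executable's path,
// the rest are its arguments. Starts total_shards children in parallel, each
// with GTEST_TOTAL_SHARDS and its own GTEST_SHARD_INDEX, then waits for all.
// Returns how many shards exited with status 0. Assumes the calling process
// is single-threaded when it forks.
int RunShards(const std::vector<std::string>& command, int total_shards) {
  assert(!command.empty() && total_shards > 0);
  // Build argv before forking.
  std::vector<char*> argv;
  for (const std::string& arg : command) {
    argv.push_back(const_cast<char*>(arg.c_str()));
  }
  argv.push_back(nullptr);
  const std::string total = std::to_string(total_shards);

  std::vector<pid_t> children;
  for (int i = 0; i < total_shards; ++i) {
    const std::string index = std::to_string(i);
    const pid_t pid = fork();
    if (pid == 0) {
      // Child: these calls modify only the child's environment.
      setenv("GTEST_TOTAL_SHARDS", total.c_str(), 1);
      setenv("GTEST_SHARD_INDEX", index.c_str(), 1);
      execv(argv[0], argv.data());
      _exit(127);  // reached only if execv failed
    }
    if (pid > 0) {
      children.push_back(pid);  // a failed fork just counts as a failed shard
    }
  }

  int succeeded = 0;
  for (const pid_t pid : children) {
    int status = 0;
    if (waitpid(pid, &status, 0) == pid && WIFEXITED(status) &&
        WEXITSTATUS(status) == 0) {
      ++succeeded;
    }
  }
  return succeeded;
}
```

For example, RunShards({"./mytests"}, 12) would start twelve shards; std::thread::hardware_concurrency() is one way to pick the count. Merging the results would additionally require capturing each child's output, for example by redirecting it to a pipe or file per shard.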

Merging the results is tedious but straightforward string manipulation. I successfully managed to get a correct total at the end, adding up the results of all the separate shards.
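As an illustration of that string manipulation, the sketch below sums the final summary lines from each shard's captured output. It assumes Google Test's default plain-text output with colour disabled (as when output is redirected to a file or pipe), where a shard ends with lines such as "[  PASSED  ] 3 tests." and, if anything failed, "[  FAILED  ] 1 test, listed below:". That format is not a stable interface, so treat the patterns as assumptions; ShardTotals and MergeShardOutputs are hypothetical names.

```cpp
#include <cassert>
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical totals across all shards.
struct ShardTotals {
  int passed = 0;
  int failed = 0;
};

// Adds up the final summary lines of each shard's captured stdout.
// Assumes the default, uncoloured Google Test text output.
ShardTotals MergeShardOutputs(const std::vector<std::string>& outputs) {
  static const std::regex kPassed(R"(^\[  PASSED  \] (\d+) tests?\.)");
  // Per-test "[  FAILED  ] Suite.Name" lines have no count, so they don't match.
  static const std::regex kFailed(R"(^\[  FAILED  \] (\d+) tests?, listed below:)");
  ShardTotals totals;
  for (const std::string& output : outputs) {
    std::istringstream lines(output);
    std::string line;
    std::smatch m;
    while (std::getline(lines, line)) {
      if (std::regex_search(line, m, kPassed)) {
        totals.passed += std::stoi(m[1].str());
      } else if (std::regex_search(line, m, kFailed)) {
        totals.failed += std::stoi(m[1].str());
      }
    }
  }
  return totals;
}
```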

I was using Windows, so used CreateProcessA to create the new processes, passing in the custom environment.

It turned out that creating new processes takes a significant amount of time, but my program was taking about 3 minutes to run, so there were good benefits to be had from running in parallel: the time came down to about 30 seconds on my 12-core PC.

Note that if this all seems overkill, there is a Python program that does what I have described here (I think; I haven't used it). This might be more straightforward.

marc_s