I modified LLVM (roc-1.6.x) slightly so that it generates code that runs on the AMDGPU-PRO driver. The code runs, but it is more than 10% slower than the output of AMDGPU's online compiler for the same OpenCL code. Are there any flags I can set to tune LLVM's output? Examples would be great.
-
Could you please show your modification? – Michael Lukin Sep 07 '18 at 13:27
-
There is an open-source project you can follow: https://github.com/zawawawa/GCNminC – user1200759 Sep 07 '18 at 16:29
-
I changed some local ID and local size values to constants, and now the LLVM-generated code is as fast as AMDGPU's. – user1200759 Oct 26 '18 at 17:32
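For later readers, one plausible reading of that last comment is that runtime queries such as `get_local_size(0)` were replaced with compile-time constants, which gives the compiler more room to fold the index arithmetic. The sketch below is only an illustration of that idea, not the poster's actual change: the kernel name `scale`, the value `LOCAL_SIZE = 64`, and the `host_*` variables are all hypothetical. `reqd_work_group_size` is a standard OpenCL C kernel attribute; the `#else` branch contains host-side stand-ins so the same file also compiles as plain C.

```c
/* Assumed work-group size (hypothetical value); must match the local
 * size passed to clEnqueueNDRangeKernel when run as an OpenCL kernel. */
#define LOCAL_SIZE 64

#ifdef __OPENCL_VERSION__
/* Real OpenCL C build: tell the compiler the work-group size is fixed. */
#define KERNEL __kernel __attribute__((reqd_work_group_size(LOCAL_SIZE, 1, 1)))
#define GLOBAL __global
#else
/* Host-side stand-ins (not part of OpenCL) so the sketch compiles as C. */
#include <stddef.h>
#define KERNEL
#define GLOBAL
static size_t host_group_id;  /* hypothetical: emulated work-group index */
static size_t host_local_id;  /* hypothetical: emulated work-item index  */
static size_t get_group_id(unsigned dim) { (void)dim; return host_group_id; }
static size_t get_local_id(unsigned dim) { (void)dim; return host_local_id; }
#endif

/* Each work-item scales one element. The index is built from the constant
 * LOCAL_SIZE rather than from a runtime get_local_size(0) call. */
KERNEL void scale(GLOBAL float *buf, float factor)
{
    size_t idx = get_group_id(0) * LOCAL_SIZE + get_local_id(0);
    buf[idx] *= factor;
}
```

Whether this closes the gap will depend on the kernel, and a kernel declared this way can only be launched with exactly that local size.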