The CUDA profiler has two metrics called dram_read_transactions and gld_transactions. The CUDA profiler user guide describes gld_transactions as the number of global memory load transactions and dram_read_transactions as the number of device memory read transactions. I cannot tell these two descriptions apart: reading data is the same thing as loading it, and global memory resides in DRAM.

The profiling results for the two metrics are different, though. I tested one kernel with two different thread settings. gld_transactions is the same in both cases, 33554432, and it does not change between runs. dram_read_transactions differs between the two settings: roughly 4486636 and 4197096. I say "roughly" because these values change slightly from one execution to another. dram_read_transactions is also much smaller than gld_transactions.

My questions are:
1. What is the real difference between gld_transactions and dram_read_transactions?
2. Why is dram_read_transactions much smaller than gld_transactions?
3. Across different thread settings, why is gld_transactions stable while dram_read_transactions is not?
I think that once question (1) is answered, questions (2) and (3) will be easy to explain. Can anyone explain this? Thanks in advance.
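
To put a number on the gap, here is a small Python sketch that uses only the counts quoted above. The labels "setting A" and "setting B" are placeholders for the two thread settings, and the dram_read_transactions values are approximate because they vary slightly between runs.

```python
# Counts copied from the profiler results quoted in the question.
gld_transactions = 33554432  # identical for both thread settings

# "setting A" / "setting B" are placeholder labels for the two thread settings.
# These values are approximate: they change slightly from run to run.
dram_read_transactions = {
    "setting A": 4486636,
    "setting B": 4197096,
}

# The gld_transactions count is exactly a power of two.
assert gld_transactions == 2 ** 25

# Ratio of global load transactions to DRAM read transactions per setting.
ratios = {}
for label, dram_reads in dram_read_transactions.items():
    ratios[label] = gld_transactions / dram_reads
    print(f"{label}: gld_transactions / dram_read_transactions = {ratios[label]:.2f}")
```

In both cases gld_transactions is between 7 and 8 times larger than dram_read_transactions, which is the difference I would like to understand.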