Highest Voted 'amd-gcn' Questions

4

votes

1 answer

Performance drop in matrix multiplication for certain sizes on AMD Polaris

I have OpenCL code that multiplies two matrices (GEMM) with M=4096, N=4096 and K=16 (i.e. matrices of 4096 x 16 floats). I run it on a Polaris 560 GPU with 16 CUs. Code: https://github.com/artyom-beilis/oclblas/blob/master/gemm/gemm.cl I noticed very strange…

asked Jun 27 '21 at 08:17

Artyom

31,019
21
127
215

3

votes

0 answers

Is It Possible to Access the Entire 64KB of the Global Data Share (GDS) on AMD's GCN2+ GPUs?

I am trying to use the Global Data Share (GDS) on AMD RX 480 for my application either on Linux or Windows. Although the GCN3 Specification Manual states that you can access the GDS without restrictions by setting an appropriate value to the m0…

assembly gpgpu amd-gcn

asked Mar 15 '17 at 01:14

meriken2ch

409
5
15

2

votes

1 answer

SIMD-16 and SIMD-32 advantage/disadvantage?

So recently, AMD launched their new GPU architecture called RDNA in their new Navi GPU lineup. After reading some architecture deep-dive articles and videos, my understanding is this (feel free to correct me if I am wrong): Small workloads that need…

parallel-processing gpu cpu-architecture simd amd-gcn

asked Aug 17 '19 at 08:08

CSDD

339
2
14

2

votes

0 answers

How to read and write to Global Data Share in AMD GCN?

I'm trying to use GDS on an AMD GPU, but I cannot make it work. My GPU is an AMD RX580. I used this OpenCL kernel: __kernel __attribute__((reqd_work_group_size(64, 1, 1))) void localVarExample(__global int *res) { int i = get_global_id(0); __local…

assembly opencl amd-gpu amd-gcn

asked Jul 08 '19 at 20:58

Michael Lukin

829
3
9
19

1

vote

0 answers

In OpenCL, can one take an array containing GCN Assembly and execute it (JIT)?

I'm relatively new to OpenCL and was wondering about this. I'd heard that it was possible to JIT on some AMD GPUs via OpenCL. Now, if this were to work syntactically as it does in C++, I would just write something like: uint…

opencl jit amd-gcn

asked Jan 27 '20 at 20:17

MNagy

423
7
20

1

vote

1 answer

V_SUB_F64 in AMD's GCN and VEGA instruction set

Why is there no "V_SUB_F64" instruction in AMD's GCN and VEGA instruction sets? How do they implement double-precision subtraction?

opencl gpu amd amd-gcn

asked Jun 08 '18 at 14:26

air_sky_123

11
1

1

vote

2 answers

OpenCL (AMD GCN) global memory access pattern for vectorized data: strided vs. contiguous

I'm going to improve OCL kernel performance and want to clarify how memory transactions work and what memory access pattern is really better (and why). The kernel is fed with vectors of 8 integers which are defined as array: int v[8], that means,…

optimization opencl gpgpu amd-gcn

asked Oct 06 '17 at 15:43

qpdb

41
4

1

vote

0 answers

Do optimized kernels running on AMD GCN OpenCL only work with ~1024 bytes at a time?

I'm beginning to architect my first serious OpenCL program, and I want to make sure I understand how my AMD R9 290X (GCN 2.0 architecture) is set up. So I'll just say what I understand, and hopefully someone out there can tell me where I'm right or…

optimization opencl gpgpu amd-gcn

asked Sep 26 '17 at 22:06

Dragontamer5788

1,957
1
12
20

0

votes

1 answer

How to resolve _pickle.UnpicklingError

I was trying to download and run the 2s-AGCN code from this GitHub link: I have generated the data successfully, but when trying to train the model by running main.py, I am facing this error: [ Mon Feb 20 21:32:20 2023 ] Training epoch: 1 0%| …

python pickle torch amd-gcn

asked Feb 21 '23 at 03:09

Sumaiya

1

0

votes

0 answers

Instruction execution in GPGPU

I am learning GPU hardware (AMD GCN architecture). I am a little confused about instruction execution. Take this example: for(i=0;i<64;i++) c[i] = a[i] + b[i]. For the above code, assume the warp/wavefront has 64 threads. Now a…

gpu gpgpu amd-gcn

asked Feb 11 '23 at 03:19

MGS

9
1

0

votes

1 answer

What is the best practice for memory access in this N-body problem solved on AMD Radeon RX580?

I compute the trajectories of N particles which move in their own gravitational field. I wrote the following OpenCL kernel: #define G 100.0f #define EPS 1.0f float2 f (float2 r_me, __constant float *m, __global float2 *r, size_t s, size_t n) { …

opencl amd-gcn

asked Sep 09 '19 at 14:00

shamaz.mazum

89
7

0

votes

1 answer

Are uint2 operations faster than ulong in OpenCL on AMD GCN cards?

Which of these "+" calculations is faster? 1) uint2 a, b, c; c = a + b; 2) ulong a, b, c; c = a + b;

opencl amd-gpu amd-gcn

asked Aug 21 '18 at 20:50

user1200759

91
7

0

votes

0 answers

How to run two work groups per one compute unit on AMD GCN cards

Usually one compute unit can only run one work group. But AMD's docs say there can be more than one wavefront running on the same compute unit. How can I do that? Is there an OpenCL function for that, or do I need to use assembly instructions? I want to…

opencl gpu amd amd-gcn

asked Aug 18 '18 at 19:13

user1200759

91
7

0

votes

0 answers

GCM not receiving on ColorOS based devices

I have an Android app which sends and receives FCM messages (Google Firebase Messages). It works perfectly: I can receive the messages when my app is not in the foreground or not in use, as I have a service running on it. Unfortunately it's not…

android google-cloud-messaging firebase-cloud-messaging amd-gcn

asked Nov 04 '17 at 13:04

Prakash

357
2
14

0

votes

0 answers

Avoid L1 cache pollution on GCN device

I have a kernel that writes results to a global buffer; these results are never read back into the kernel (they are processed by another kernel at a later time). So, I don't want this data sitting in the L1 cache if I can help it. Is there a way of…

caching opencl gpgpu amd-gcn

asked Sep 20 '17 at 00:51

Jacko

12,665
18
75
126
