Yes, it makes a difference.
(256,1) creates a (1D) block of 256 threads in the x-dimension, all of which have a y-index of 0.
(128,2) creates a (2D) block of 128x2 threads, i.e. 128 in the x-dimension and 2 in the y-dimension. These threads will have an x-index ranging from 0 to 127 and a y-index ranging from 0 to 1.
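To see the two block shapes side by side, here is a minimal sketch that launches a single block of each shape and reports the largest x- and y-index any thread saw. The names record_indices and max_indices are made up for this illustration, and error checking is reduced to a single assert to keep it short.

```cuda
#include <cstdio>
#include <cassert>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread stores its own (x, y) index in the
// slot threadIdx.y * blockDim.x + threadIdx.x of the output arrays.
__global__ void record_indices(int *xs, int *ys)
{
    int slot = threadIdx.y * blockDim.x + threadIdx.x;
    xs[slot] = threadIdx.x;
    ys[slot] = threadIdx.y;
}

// Hypothetical helper: launch ONE block of the given shape and return
// the largest x-index and y-index reported by its threads.
static void max_indices(dim3 block, int *max_x, int *max_y)
{
    const int n = block.x * block.y;
    int *xs = nullptr, *ys = nullptr;
    cudaMallocManaged(&xs, n * sizeof(int));
    cudaMallocManaged(&ys, n * sizeof(int));

    record_indices<<<1, block>>>(xs, ys);
    cudaError_t err = cudaDeviceSynchronize();
    assert(err == cudaSuccess);

    *max_x = 0;
    *max_y = 0;
    for (int i = 0; i < n; ++i) {
        if (xs[i] > *max_x) *max_x = xs[i];
        if (ys[i] > *max_y) *max_y = ys[i];
    }
    cudaFree(xs);
    cudaFree(ys);
}

int main()
{
    int mx, my;

    max_indices(dim3(256, 1), &mx, &my);
    printf("(256,1): max x-index = %d, max y-index = %d\n", mx, my);

    max_indices(dim3(128, 2), &mx, &my);
    printf("(128,2): max x-index = %d, max y-index = %d\n", mx, my);
    return 0;
}
```

Both launches create 256 threads; only the way those threads are numbered differs.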
The structure of your kernel code must account for the thread indexing/numbering that the launch configuration produces.
For example, if your kernel code starts with something like:
int idx = threadIdx.x+blockDim.x*blockIdx.x;
and doesn't create any other index variables, it's probably assuming a 1D threadblock and 1D grid.
If, on the other hand, your kernel code starts with something like:
int idx = threadIdx.x+blockDim.x*blockIdx.x;
int idy = threadIdx.y+blockDim.y*blockIdx.y;
it's probably expecting a 2D grid and 2D threadblocks.
Generally speaking, the two approaches are not interchangeable: you cannot launch a kernel that expects a 1D grid and 1D threadblocks with a 2D configuration and expect everything to work normally, and vice versa.
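As a rough illustration of what goes wrong, the sketch below launches a kernel that uses only the x-index on 256 elements, once with a (256,1) block and once with a (128,2) block. The names mark_1d and count_written are hypothetical, and the single-block launch is just an assumption to keep the example small. With (128,2), the x-index only reaches 127, so the two rows of threads both write elements 0 to 127 and elements 128 to 255 are never touched.

```cuda
#include <cstdio>
#include <cassert>
#include <cuda_runtime.h>

// Hypothetical kernel written for a 1D block and 1D grid:
// it looks only at the x-index and ignores threadIdx.y.
__global__ void mark_1d(int *out, int n)
{
    int idx = threadIdx.x + blockDim.x * blockIdx.x;
    if (idx < n)
        out[idx] = 1;
}

// Hypothetical helper: zero n elements, launch ONE block of the given
// shape, and count how many elements the kernel actually marked.
static int count_written(dim3 block, int n)
{
    int *out = nullptr;
    cudaMallocManaged(&out, n * sizeof(int));
    cudaMemset(out, 0, n * sizeof(int));

    mark_1d<<<1, block>>>(out, n);
    cudaError_t err = cudaDeviceSynchronize();
    assert(err == cudaSuccess);

    int written = 0;
    for (int i = 0; i < n; ++i)
        written += out[i];
    cudaFree(out);
    return written;
}

int main()
{
    const int n = 256;
    // 1D block: x-index covers 0..255, so every element is marked.
    printf("(256,1): %d of %d elements written\n",
           count_written(dim3(256, 1), n), n);
    // 2D block: x-index covers only 0..127; both y-rows hit the same elements.
    printf("(128,2): %d of %d elements written\n",
           count_written(dim3(128, 2), n), n);
    return 0;
}
```

The thread count is the same in both launches, but the 2D launch leaves half of the array unprocessed because the kernel never reads threadIdx.y.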