I'm not aware that Numpy has any explicit calls to align memory at this time. The only way I can think of doing this, short of Cython as suggested by @Saulio Castro, would be through judicious allocation of memory, with "padding", using the numpy allocation or PyOpenCL APIs.
You would need to create a buffer "padded" to align on multiples of 64K bytes. You would also need to "pad" the individual data structure elements you were allocating in the array so they too, in turn, were aligned to 4k byte boundaries. This would of course depend on what your elements look like, whether they were built in numpy data types, or structures created using the numpy dtype. The API for dtype has an "align" keyword but I would be wary of that, based on the discussion at this link.
An old school trick to align structure is to start with the largest elements, work your way down, then "pad" with enough uint8's so one or N structs fill out the alignment boundary.
Hope that's not too vague...