140

Given a NumPy array of int32, how do I convert it to float32 in place? So basically, I would like to do

a = a.astype(numpy.float32)

without copying the array. It is big.

The reason for doing this is that I have two algorithms for the computation of a. One of them returns an array of int32, the other returns an array of float32 (and this is inherent to the two different algorithms). All further computations assume that a is an array of float32.

Currently I do the conversion in a C function called via ctypes. Is there a way to do this in Python?

Sven Marnach
  • Using `ctypes` is as much "in Python" as using `numpy`. :) – Karl Knechtel Dec 08 '10 at 16:40
  • 4
    @Karl: No, because I have to code and compile the C function myself. – Sven Marnach Dec 08 '10 at 16:42
  • Oh, I see. I think you're probably SOL on this one. – Karl Knechtel Dec 08 '10 at 16:45
  • Naive question: How can you tell a=a.astype(numpy.float32) is making a copy? Python slows to a crawl and your disk starts thrashing? – Andrew Jan 22 '11 at 20:04
  • 4
    @Andrew: There are many ways to tell if it returns a copy. One of them is to read the [documentation](http://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.astype.html). – Sven Marnach Jan 22 '11 at 20:24
  • What do you think about using a = np.cast['f'](a) with `a` your `int32` array? – Stefano Messina Feb 13 '12 at 11:59
  • This does not perform an in-place conversion. It does the same as the code in the original question. (Don't bother to try to correct it -- it won't get any better than in the accepted answer.) – Sven Marnach Feb 13 '12 at 12:11
  • It's just the way the functions in this dictionary work. If I asked "why does `print` print its arguments?" or "why does `a + b` denote the sum of `a` and `b`?", how would you answer those questions? – Sven Marnach Feb 13 '12 at 14:08
  • 1
    In-place simply means "using the same memory as the original array". Have a look at the accepted answer -- the last part shows that the new values indeed have overwritten the same memory. – Sven Marnach Feb 13 '12 at 15:02

7 Answers

167

Update: `astype` with `copy=False` avoids the copy only when it can. Converting int32 to float32 still allocates a new array, so this is not the correct answer to this question. unutbu's answer is the right one.


a = a.astype(numpy.float32, copy=False)

NumPy's `astype` has a `copy` flag. Why shouldn't we use it?
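
A quick way to check what `copy=False` actually does is to test whether the result shares memory with the input. This is a minimal sketch; the variable names are only illustrative, and it assumes a NumPy version in which `astype` accepts the `copy` argument.

```python
import numpy as np

a = np.arange(10, dtype=np.int32)

# Same dtype: with copy=False the input array itself can be returned.
same = a.astype(np.int32, copy=False)

# Different dtype: a new array is allocated even though copy=False.
converted = a.astype(np.float32, copy=False)

print(same is a)                       # True: no copy was needed
print(np.shares_memory(a, converted))  # False: the data was copied
```

So the flag only skips the copy when no conversion is needed, which is not the case for int32 to float32.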

Vikas
  • 15
    Once this parameter is supported in a NumPy release, we could of course use it, but currently it's only available in the development branch. And at the time I asked this question, it didn't exist at all. – Sven Marnach May 17 '12 at 19:08
  • 3
    @SvenMarnach It is now supported, at least in my version (1.7.1). – PhilMacKay Aug 27 '13 at 20:09
  • It seems to work perfectly in python3.3 with the latest numpy version. – CHM Oct 10 '13 at 21:41
  • 1
    I find this to be around 700x slower than a = a.view((float, len(a.dtype.names))) – J.J May 08 '15 at 10:44
  • 16
    The copy flag only says that if the change can be done without a copy, it will be done without a copy. However, if the type is different, it will still always copy. – coderforlife Oct 11 '15 at 02:59
  • 1
    `import numpy as np; x = np.ones(int(1.9e9), dtype=np.int64); x.astype(np.float64, copy=False)` gives out of memory error on a machine with 16 Gb memory. It might still create intermediates. – hamster on wheels Jul 19 '17 at 21:47
118

You can make a view with a different dtype, and then copy in-place into the view:

import numpy as np
x = np.arange(10, dtype='int32')
y = x.view('float32')
y[:] = x

print(y)

yields

array([ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9.], dtype=float32)

To show the conversion was in-place, note that copying from x to y altered x:

print(x)

prints

array([         0, 1065353216, 1073741824, 1077936128, 1082130432,
       1084227584, 1086324736, 1088421888, 1090519040, 1091567616])
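
The same idea can be wrapped in a small helper. The following is a minimal sketch, not part of NumPy: the name `convert_inplace` and the explicit item-size check are assumptions added for illustration. The check reflects the restriction that the view trick only works when both dtypes have the same item size. The sketch does not verify whether NumPy allocates temporaries internally during the assignment.

```python
import numpy as np

def convert_inplace(arr, dtype):
    """Convert arr to dtype, reusing arr's memory (hypothetical helper)."""
    new_dtype = np.dtype(dtype)
    # A same-shaped view only exists if each element occupies the same bytes.
    if arr.dtype.itemsize != new_dtype.itemsize:
        raise ValueError("source and target dtypes must have the same item size")
    view = arr.view(new_dtype)  # reinterprets the buffer, no data is moved
    view[...] = arr             # casts the values and writes them into the same buffer
    return view

x = np.arange(10, dtype=np.int32)
address = x.ctypes.data         # address of the underlying buffer
y = convert_inplace(x, np.float32)

print(y.dtype)                    # float32
print(y.ctypes.data == address)   # True: same buffer as before
```

After the call, `x` still refers to the same memory but shows the float bit patterns interpreted as integers, so only `y` should be used from then on.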
unutbu
  • 33
    Note for those (like me) that want conversion between dtype of different byte-size (e.g. 32 to 16 bits): This method fails because y.size <> x.size. Logical once you think about it :-( – Juh_ Jun 12 '12 at 09:17
  • Was this solution working for some older version of Numpy? When I do `np.arange(10, dtype=np.int32).view(np.float32)` on Numpy 1.8.2, I get `array([ 0.00000000e+00, 1.40129846e-45, ... [snip] ... 1.26116862e-44], dtype=float32)`. – Bas Swinckels Jun 25 '15 at 08:25
  • 3
    @BasSwinckels: That's expected. The conversion occurs when you assign `y[:] = x`. – unutbu Jun 25 '15 at 10:24
  • to clarify the point made about the itemsize (number of bits) referred to by the original answer and @Juh_ e.g.: `a = np.arange(10, dtype='float32'); b = a[::-1]; c = np.vstack((a,b)); d = c.view('float64')` This code takes 10 + 10 float32 and results in 10, rather than 20 float64 – dcanelhas Aug 16 '17 at 05:04
  • 2
    This in-place change may save on memory use, but it is slower than a simple `x.astype(float)` conversion. I wouldn't recommend it unless your script is bordering on MemoryError. – hpaulj Feb 20 '19 at 05:04
14

You can change the array type without converting like this:

a.dtype = numpy.float32

but first you have to change all the integers to something that will be interpreted as the corresponding float. A very slow way to do this would be to use Python's struct module like this:

import struct
def toi(i):
    # Return the int32 whose bit pattern equals that of float32(i).
    return struct.unpack('i', struct.pack('f', float(i)))[0]

...applied to each member of your array.

But perhaps a faster way would be to utilize numpy's ctypeslib tools (which I am unfamiliar with)

- edit -

Since ctypeslib doesn't seem to work, I would do the conversion with the usual numpy.astype method, but in blocks small enough to stay within your memory limits:

a[0:10000] = a[0:10000].astype('float32').view('int32')

...then change the dtype when done.

Here is a function that accomplishes the task for any compatible dtypes (only works for dtypes with same-sized items) and handles arbitrarily-shaped arrays with user-control over block size:

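import numpy

def astype_inplace(a, dtype, blocksize=10000):
    oldtype = a.dtype
    newtype = numpy.dtype(dtype)
    # Reinterpreting in place only works when both dtypes have the same item size.
    assert oldtype.itemsize == newtype.itemsize
    for idx in range(0, a.size, blocksize):
        # Convert one block, then store its bit pattern back under the old dtype.
        a.flat[idx:idx + blocksize] = \
            a.flat[idx:idx + blocksize].astype(newtype).view(oldtype)
    # Finally reinterpret the whole buffer as the new dtype.
    a.dtype = newtype

# Request int32 explicitly; the default integer type is often 64-bit.
a = numpy.random.randint(100, size=100, dtype='int32').reshape((10, 10))
print(a)
astype_inplace(a, 'float32')
print(a)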
Paul
  • 2
    Thanks for your answer. Honestly, I don't think this is very useful for **big** arrays -- it is way too slow. Reinterpreting the data of the array as a different type is easy -- for example by calling `a.view(numpy.float32)`. The hard part is actually converting the data. `numpy.ctypeslib` only helps with reinterpreting the data, not with actually converting it. – Sven Marnach Dec 08 '10 at 17:39
  • ok. I wasn't sure what your memory/processor limitations were. See my edit. – Paul Dec 08 '10 at 18:16
  • Thanks for the update. Doing it blockwise is a good idea -- probably the best you can get with the current NumPy interface. But in this case, I will probably stick to my current ctypes solution. – Sven Marnach Dec 08 '10 at 20:21
0

Time spent reading data

t1=time.time() ; V=np.load ('udata.npy');t2=time.time()-t1 ; print( t2 )

95.7923333644867

V.dtype

dtype('>f8')

V.shape

(3072, 1024, 4096)

Creating new array

t1=time.time() ; V64=np.array( V, dtype=np.double); t2=time.time()-t1 ; print( t2 )

1291.669689655304

Simple in-place numpy conversion

t1=time.time() ; V64=np.array( V, dtype=np.double); t2=time.time()-t1 ; print( t2 )

205.64322113990784

Using astype

t1=time.time() ; V = V.astype(np.double) ; t2=time.time()-t1 ; print( t2 )

400.6731758117676

Using view

t1=time.time() ; x=V.view(np.double);V[:,:,:]=x ;t2=time.time()-t1 ; print( t2 )

556.5982494354248

Note that I cleared the variables before each run. Simply letting Python handle the conversion is therefore the most efficient.

Arphy
  • There were several reasons why I wanted to avoid the copy. The main reason was that it uses twice the amount of memory, which often simply wouldn't fit. Another reason was that I needed the pointer to the array to remain stable, since there was some C code still holding the same pointer. Raw speed was one concern, but not the main concern. I realize that other people visiting this question have other requirements, so for them having these benchmarks may be useful. – Sven Marnach Jun 09 '21 at 12:00
  • 1
    Mistake: "Creating new array" and "Simple in-place numpy conversion" look the same. Please correct. @Arphy – Gokul NC Mar 24 '22 at 05:03
-1
import numpy as np
arr_float = np.arange(10, dtype=np.float32)
arr_int = arr_float.view(np.float32)

use view() and parameter 'dtype' to change the array in place.
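
As a quick check of what `view` does on its own, the sketch below reinterprets a float32 array as int32. The variable names are only illustrative. It shows that the buffer is shared but the values are not converted: the integers are the raw bit patterns of the floats.

```python
import numpy as np

arr_float = np.arange(4, dtype=np.float32)  # 0.0, 1.0, 2.0, 3.0
bits = arr_float.view(np.int32)             # same buffer, read as int32

print(bits)                                 # raw IEEE 754 bit patterns, not 0..3
print(np.shares_memory(arr_float, bits))    # True: no data was copied
```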

蒋志强
  • The goal of the question was to actually _convert_ the data in place. After correcting the type in the last line to `int`, this answer would only reinterpret the existing data as a different type, which isn't what I was asking for. – Sven Marnach Aug 05 '19 at 19:49
  • What do you mean? The dtype is just how the data in memory is interpreted, so it really works. However, in np.astype, the parameter 'casting' controls the conversion method (default 'unsafe'). – 蒋志强 Aug 05 '19 at 21:13
  • Yeah, I agree with the first accepted answer. However, arr_.astype(new_dtype, copy=False) still returns a newly allocated array. How can the `dtype`, `order`, and `subok` requirements be satisfied so that no copy is returned? I haven't solved it. – 蒋志强 Aug 05 '19 at 21:20
-5

Use this:

In [105]: a
Out[105]: 
array([[15, 30, 88, 31, 33],
       [53, 38, 54, 47, 56],
       [67,  2, 74, 10, 16],
       [86, 33, 15, 51, 32],
       [32, 47, 76, 15, 81]], dtype=int32)

In [106]: float32(a)
Out[106]: 
array([[ 15.,  30.,  88.,  31.,  33.],
       [ 53.,  38.,  54.,  47.,  56.],
       [ 67.,   2.,  74.,  10.,  16.],
       [ 86.,  33.,  15.,  51.,  32.],
       [ 32.,  47.,  76.,  15.,  81.]], dtype=float32)
-5

a = np.subtract(a, 0., dtype=np.float32)

jmd_dk
MIO
  • 1
    While this code snippet may be the solution, [including an explanation](//meta.stackexchange.com/questions/114762/explaining-entirely-code-based-answers) really helps to improve the quality of your post. Remember that you are answering the question for readers in the future, and those people might not know the reasons for your code suggestion. – Sebastialonso Nov 17 '17 at 19:35
  • Why should this be an **in place** conversion? `numpy.subtract` is returning a copy, isn't it? Only the name `a` is reused for another chunk of data... Please explain if I am wrong about this. – koffein Nov 17 '17 at 20:08
  • Thank you for pointing this out, it seems you are correct - a copy is produced. – MIO Nov 18 '17 at 19:32