Previous posts suggest that the dtype of a recarray can be changed with astype. However, I cannot get this to work on a recarray that has an array in one of its columns.

My recarray comes from a FITS file record:

> f = fits.open('myfile.fits')   
> tbdata = f[1].data
> tbdata
# FITS_rec([ (0.27591679999999996, array([570, 576, 566, ..., 571, 571, 569], dtype=int16)),
#   (0.55175680000000005, array([575, 563, 565, ..., 572, 577, 582], dtype=int16)),
#   ...,
#   (2999.2083967999997, array([574, 570, 575, ..., 560, 551, 555], dtype=int16)),
#   (2999.4842367999995, array([575, 583, 578, ..., 559, 565, 568], dtype=int16))], 
#   dtype=[('TIME', '>f8'), ('AC', '>i4', (2,))])

I need to convert the AC column from int to float, so I tried:

> tbdata = tbdata.astype([('TIME', '>f8'), ('AC', '>f4', (2,))])

and although the dtype appears to have changed

> tbdata.dtype
# dtype([('TIME', '>f8'), ('AC', '>f4', (2,))])

a look at the data in AC shows that the values are still integers. For instance, a sum overflows the int16 range (all the AC column values are positive):

> tbdata['AC'][0:55].sum()
# _VLF(array([31112, 31128, 31164, ..., 31203, 31232, 31262], dtype=int16), dtype=object)
> tbdata['AC'][0:65].sum()
# _VLF(array([-28766, -28759, -28702, ..., -28659, -28638, -28583], dtype=int16), dtype=object)
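The wraparound itself can be reproduced without the FITS file. This is a minimal sketch, assuming the column behaves like a NumPy object array whose elements are int16 arrays; the `rows` array is a hypothetical stand-in with made-up values, not the real data:

```python
import numpy as np

# Hypothetical stand-in for the variable-length column: an object array
# whose elements are small int16 arrays (values resemble the AC data).
rows = np.empty(65, dtype=object)
for i in range(len(rows)):
    rows[i] = np.full(3, 570, dtype=np.int16)

# Summing an object array adds its elements one to another, so every
# addition is int16 + int16 and the running total wraps past 32767.
wrapped = rows.sum()
print(wrapped)  # 65 * 570 = 37050, which wraps to -28486 in int16

# Converting each row to float before summing avoids the wraparound.
rows_float = np.empty(len(rows), dtype=object)
for i, row in enumerate(rows):
    rows_float[i] = row.astype(np.float64)
total = rows_float.sum()
print(total)
```

With 55 rows instead of 65 the total stays below 32767, which matches the change of sign between the two sums above.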

Is there any way to effectively change the array data type?

mtc
  • Not an answer, just curious: According to the dtype, the 'AC' field is an array with shape (2,). Why does the commented output show this field as having many more elements? E.g. `array([570, 576, 566, ..., 571, 571, 569], dtype=int16)` – Warren Weckesser Dec 17 '15 at 18:28
  • I can't reproduce this, but I'm not using your FITS library. A self-contained example that we can copy and run would be helpful. Don't use the FITS data; just create a simple dtype and array "by hand" that can be used to demonstrate the problem. – Warren Weckesser Dec 17 '15 at 18:31
  • @WarrenWeckesser : for the first question, I'm not sure, but I guess this is something related to the fact that the 'AC' field is a FITS variable length array... – mtc Dec 18 '15 at 09:33

2 Answers

Following Warren's advice, I tried with a recarray created "by hand", and things seem to go well:

> import numpy as np
> ra = np.array([ ([30000,10000], 1), ([30000,20000],2),([30000,30000],3) ], dtype=[('x', 'int16',2), ('y', int)])
> ra
# array([([30000, 10000], 1), ([30000, 20000], 2), ([30000, 30000], 3)],
#       dtype=[('x', '<i2', (2,)), ('y', '<i8')])
> ra = ra.astype([('x', '<f4', (2,)), ('y', '<i8')])
> ra
# array([([30000.0, 10000.0], 1), ([30000.0, 20000.0], 2),
#        ([30000.0, 30000.0], 3)], dtype=[('x', '<f4', (2,)), ('y', '<i8')])

So the int16 values are converted to floats.
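The same conversion can also be written field by field, which makes it easy to check that the values really end up as floats. This is only a sketch on a hand-made array like the one above; the names `old`, `new` and `new_dtype` are illustrative:

```python
import numpy as np

# Hand-made structured array with a fixed-size int16 sub-array field.
old = np.array(
    [([30000, 10000], 1), ([30000, 20000], 2), ([30000, 30000], 3)],
    dtype=[('x', '<i2', (2,)), ('y', '<i8')],
)

# Target dtype: same layout, but 'x' holds 32-bit floats.
new_dtype = np.dtype([('x', '<f4', (2,)), ('y', '<i8')])

# Allocate the new array and copy each field; the assignment casts
# the values to the field's new type.
new = np.empty(old.shape, dtype=new_dtype)
for name in old.dtype.names:
    new[name] = old[name]

print(new['x'].dtype)        # float32
print(new['x'].sum(axis=0))  # column sums computed in floating point
```

This only covers fixed-size sub-array fields. It says nothing about how a FITS variable-length column behaves.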

However, after calling astype on my tbdata recarray, the numbers do not seem to change at all (nor does the dtype of the inner arrays):

> tbdata.dtype
# dtype([('TIME', '>f8'), ('AC', '>f4', (2,))])
> tbdata
# FITS_rec([ (0.27591679999999996, array([570, 576, 566, ..., 571, 571, 569], dtype=int16)),
#    (0.55175680000000005, array([575, 563, 565, ..., 572, 577, 582], dtype=int16)),
#   ...,
#   (2999.2083967999997, array([574, 570, 575, ..., 560, 551, 555], dtype=int16)),
#   (2999.4842367999995, array([575, 583, 578, ..., 559, 565, 568], dtype=int16))], 
#    dtype=[('TIME', '>f8'), ('ADC', '<f4', (2,))])

My conclusion is that this is probably a problem with the Astropy interface to FITS files. In addition, the negative numbers that I get from sum() are in fact not related to the datatype: they are already present in the middle of the integer array in tbdata, because of the way FITS stores numbers greater than 32768, using the TZERO keyword as the offset for unsigned integers. CFITSIO and the usual FITS viewers convert these numbers back transparently for the user, so I was not aware of the negative values. Thanks a lot for the help and suggestions.
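To make the offset convention concrete, here is a small NumPy-only sketch of how unsigned 16-bit values can be carried in a signed 16-bit column with an offset of 32768. The sample numbers are made up, and the variable names are illustrative rather than anything from Astropy or CFITSIO:

```python
import numpy as np

TZERO = 32768  # offset conventionally used for unsigned 16-bit columns

# Unsigned values as an instrument might produce them (made-up numbers).
physical = np.array([0, 570, 32768, 40000, 65535], dtype=np.uint16)

# What ends up in the file: signed 16-bit integers shifted down by TZERO.
stored = (physical.astype(np.int32) - TZERO).astype(np.int16)
print(stored)

# What a reader applies on load: stored value plus the offset.
restored = stored.astype(np.int32) + TZERO
print(restored)
```

A reader that skips the offset sees the shifted signed values, which is one way negative numbers can show up in a column that should only hold positive counts.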

mtc

I can reproduce this issue with a recarray from a FITS file. A workaround is to load the data as an Astropy Table instead, and then turn it into a pandas DataFrame:

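from astropy.table import Table
import pandas as pd

# Read the FITS table into an astropy Table
t = Table.read('file.fits')
# Build a DataFrame from the table rows
df = pd.DataFrame.from_records(t, columns=t.columns)
# Cast the AC column to float
df.AC = df.AC.astype(float)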
VinceP