I've been using pandas a lot lately and have run into a slight impasse.

I have a pandas DataFrame, which is read in from a .fits file:

d = fits.getdata('filename.fits')
df = pd.DataFrame(np.array(d))
df.columns = map(str.lower, df.columns)

containing column names like: 'n_ser_f2mf1_f850lp', 'n_ser_f3mf2_f850lp' , 'mtot_f2mf1_f850lp' , 'mtot_f3mf2_f850lp' , 'othergalaxycharacteristics_f3mf2_f8530lp'

(If you're interested, these columns contain the difference in Sersic index for galaxies in a galaxy cluster that have been imaged by the Hubble Space Telescope (using filter F850LP) in multiple fields. f3mf2 means the galaxy is in both field 3 and field 2, so we compute valueinfield3 - valueinfield2.)

Example of data structure/values:

a_df = pd.DataFrame(df_RXJ,columns=['global_id','mtot_f2mf1_f850lpser','n_ser_f2mf1_f850lp'])
print (a_df[285:290].head())

 global_id  mtot_f2mf1_f850lpser  n_ser_f2mf1_f850lp
 285      286.0              0.812901             -4.5086
 286      287.0              0.850700             -1.4044
 287      288.0                   NaN                 NaN
 288      289.0             -0.598200              2.1634
 289      290.0             -0.017500              0.3278

I want to use the data contained in a column as a numpy array. Usually I do this:

n_ser_residuals = df.n_ser_f2mf1_f850lp.values

Which results in an array:

length(array) = numberofgalaxies
array = [        nan,         nan,         nan, ...,  0.46969998,
    1.48409998,  0.08240002]

However, I am working with column names as strings, looping through different values like this:

 for p in ['f3mf2', 'f2mf1', otheroverlappingfields]:
     col0name = 'n_ser_{}_f850lp'.format(p)
     col1name = 'mtot_{}_f850lp'.format(p)
     # etc.

So to access the values I use:

n_ser_residuals = (df[col0name].values)

Which instead results in an array of length 1 that looks like:

[array([        nan,         nan,         nan, ...,  0.46969998,
    1.48409998,  0.08240002], dtype=float32)]

Why does this method result in a different output? How can I turn this output into a list?

jazz mink

1 Answer

Everything is working fine for me (pandas 0.18.1):

In [28]: col0name = 'n_ser_{}_f850lp'.format('f2mf1')

In [29]: col0name
Out[29]: 'n_ser_f2mf1_f850lp'

In [30]: df[col0name]
Out[30]:
285   -4.5086
286   -1.4044
287       NaN
288    2.1634
289    0.3278
Name: n_ser_f2mf1_f850lp, dtype: float64

In [31]: df[col0name].values
Out[31]: array([-4.5086, -1.4044,     nan,  2.1634,  0.3278])

In [32]: df[col0name].values[1]
Out[32]: -1.4044000000000001

In [33]: df[col0name].values[2]
Out[33]: nan

In [34]: df[col0name].values[1:5]
Out[34]: array([-1.4044,     nan,  2.1634,  0.3278])
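
The check above can be reproduced with a minimal sketch. The frame below is a hypothetical stand-in built from the values shown in the question, not the real FITS data. It compares attribute access with string-key access and uses `Series.tolist()` to get a plain list. The length-1 wrapper at the end is only an assumption about what the unexpected output might look like; what actually causes it in the original data is not established here.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the FITS-derived frame (values copied from the question).
df = pd.DataFrame(
    {
        "n_ser_f2mf1_f850lp": np.array(
            [-4.5086, -1.4044, np.nan, 2.1634, 0.3278], dtype=np.float32
        ),
    },
    index=range(285, 290),
)

p = "f2mf1"
col0name = "n_ser_{}_f850lp".format(p)

# Attribute access and string-key access select the same column.
by_attr = df.n_ser_f2mf1_f850lp.values
by_name = df[col0name].values
same = by_attr.shape == by_name.shape and np.allclose(
    by_attr, by_name, equal_nan=True
)

# A plain Python list of the column's values.
as_list = df[col0name].tolist()

# Assumption: if something upstream wrapped the array in a length-1 container,
# np.asarray(...).ravel() recovers the flat 1-D array.
wrapped = [by_name]
flat = np.asarray(wrapped).ravel()
```

If `same` is true on the real data, the two access styles are not the source of the difference, and the frame's construction (for example, what `np.array(d)` produces for that column) is the next thing to inspect.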
MaxU - stand with Ukraine