
I'm using a NumPy array in Python and would like to know how to add a new column at the end of the array.

I have an array with N rows, and for each row I compute a new value called X. I would like to store this value, for each row, in a new column.

My script is below (the relevant part is at the end):

#!/usr/bin/python
# coding: utf-8

from astropy.io import fits
import numpy as np
#import matplotlib.pyplot as plt
import math


        ######################################
        # File containing the list of fields #
        ######################################


with open("liste_essai.txt", "r") as f :

    fichier_entier = f.read()
    files = fichier_entier.split("\n")

for fichier in files :

    with open(fichier, 'r') :

        reading = fits.open(fichier)          # Open the file with astropy

        tbdata = reading[1].data               # Read the FITS data


        ###################################
        # Filtering on various parameters #
        ###################################

        #mask1 = tbdata['CHI'] < 1.0        # Create a mask for the CHI condition
        #tbdata_temp1 = tbdata[mask1]

        #print "Filtering on CHI done"

        #mask2 = tbdata_temp1['PROB'] > 0.01    # Create a second mask for the PROB condition
        #tbdata_temp2 = tbdata_temp1[mask2]

        #print "Filtering on PROB done"

        #mask3 = tbdata_temp2['SHARP'] > -0.4   # Create a 3rd mask for the SHARP condition (1/2)
        #tbdata_temp3 = tbdata_temp2[mask3]

        #mask4 = tbdata_temp3['SHARP'] < 0.1    # Create a 4th mask for the SHARP condition (2/2)
        #tbdata_final = tbdata_temp3[mask4]

        #print "Creating the new final table"
        #print tbdata_final         # Display the table after all conditions

        #fig = plt.figure()
        #plt.plot(tbdata_final['G'] - tbdata_final['R'], tbdata_final['G'], '.')
        #plt.title('Color-Magnitude Diagram')
        #plt.xlabel('(g-r)')
        #plt.ylabel('g')
        #plt.xlim(-2,2)
        #plt.ylim(15,26)
        #plt.gca().invert_yaxis()
        #plt.show()
        #fig.savefig()

        #print "Creating the diagram"

        #hdu = fits.BinTableHDU(data=tbdata_final)
        #hdu.writeto('{}_{}'.format(fichier,'traité'))      # Write the result to a new FITS file

        #print "Writing the new processed file"

        #############################################
        # Determine the extreme values of the field #
        #############################################

        RA_max = np.max(tbdata['RA'])
        RA_min = np.min(tbdata['RA'])
        #print "RA_max is :     " + str(RA_max)
        #print "RA_min is :     " + str(RA_min)

        DEC_max = np.max(tbdata['DEC'])
        DEC_min = np.min(tbdata['DEC'])
        #print "DEC_max is :   " + str(DEC_max)
        #print "DEC_min is :   " + str(DEC_min)

        ##########################################
        # Compute the central value of the field #
        ##########################################

        RA_central = (RA_max + RA_min)/2.
        DEC_central = (DEC_max + DEC_min)/2.

        #print "RA_central is : " + str(RA_central)
        #print "DEC_central is : " + str(DEC_central)

        print " "
        print " ######################################### "

    ###################
    # Compute X and Y #
    ###################

        i = 0
        N = len(tbdata)

        for i in range(0,N) :

            print "Value of RA at row " + str(i) + " is : " + str(tbdata['RA'][i])
            print "Value of RA_central is : " + str(RA_central)
            print "Value of DEC_central is : " + str(DEC_central)

            X = (tbdata['RA'][i] - RA_central)*math.cos(DEC_central)

            Add_column = np.vstack(tbdata, X) # ==> ????

            print "The value of X is : " + str(X)
            print " "

I tried something, but I'm not sure it works.

I also have a second question, if possible. In the plotting part, I would like to save the plot for each file under that file's name. I think I need to write something like:

plt.savefig('graph',"{}_{}".format(fichier,png))
Cleb
  • Check out [this](http://stackoverflow.com/questions/15815854/how-to-add-column-to-numpy-array) question and the documentation for joining arrays [here](http://docs.scipy.org/doc/numpy/reference/routines.array-manipulation.html#joining-arrays) – bunji Mar 05 '16 at 19:17
  • Sorry, but I'm not English, American, or anything else... –  Mar 05 '16 at 21:28

1 Answer


A NumPy array is stored in a single contiguous block of memory of fixed size. Once it has been created, making it any bigger means NumPy has to allocate a new block and copy the original array into it, so that the added data sits beside the original data in memory.
If you have a rough idea of how many columns you will be adding, you can create the original array with additional columns of zeros. This reserves the memory up front, and you can then "add" columns by overwriting the left-most remaining column of zeros.
If you have the memory to spare, you can always overestimate the number of columns you will need and then remove the extra columns of zeros later on. As far as I know, this is the only way to avoid copying when adding new columns to a NumPy array.
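The copy described above can be observed directly. This is only a minimal sketch: the array names and values are made up for illustration, and `np.shares_memory` is used to check whether the enlarged array still overlaps the original one.

```python
import numpy as np

original = np.arange(6.0).reshape(3, 2)   # 3 rows, 2 columns (illustrative data)
new_col = np.array([10.0, 20.0, 30.0])    # one new value per row

# column_stack allocates a new (3, 3) array and copies both inputs into it
grown = np.column_stack((original, new_col))

# If the data had been extended in place, the two arrays would share memory
copied = not np.shares_memory(original, grown)
print(grown.shape, copied)
```

For a single extra column on a small array this copy is usually harmless; it starts to matter when columns are appended repeatedly inside a loop.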

For example:

import numpy as np

my_array = np.random.rand(200, 3)  # the original array
zeros = np.zeros((200, 400))       # anticipates 400 additional columns

# join my_array with the array of zeros (only this step makes a copy)
my_array = np.hstack((my_array, zeros))

current_column = 3  # keeps track of the left-most column of zeros

new_columns = []  # put the list of new columns to add here

for col in new_columns:
    my_array[:, current_column] = col  # overwrite a reserved column in place
    current_column += 1
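To tie this back to the question, here is a hedged sketch of the same idea on a small table. The sizes, the variable names (`table`, `used`, `x`) and the simplified formula are assumptions for illustration only, and the data is a plain float array rather than the FITS record array from the question. The new value is computed for all rows at once, written into a reserved column, and the unused columns are hidden with a slice.

```python
import numpy as np

n_rows, n_cols, n_spare = 5, 3, 4          # illustrative sizes, not from the question

# Allocate once, with the spare columns already reserved
table = np.zeros((n_rows, n_cols + n_spare))
table[:, :n_cols] = np.arange(n_rows * n_cols).reshape(n_rows, n_cols)

used = n_cols                              # index of the first unused column

# Compute the new value for every row at once instead of looping over rows
ra = table[:, 0]
ra_central = (ra.max() + ra.min()) / 2.0
x = ra - ra_central                        # simplified stand-in for the question's X

table[:, used] = x                         # fills a reserved column in place
used += 1

trimmed = table[:, :used]                  # a view that hides the unused columns
print(trimmed.shape)
```

Note that `trimmed` is a view onto the same memory, so the spare columns are hidden rather than freed; calling `trimmed.copy()` would produce a compact array at the cost of one copy.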
bunji