
I am trying to add rows to a numpy.array within a loop, but it is not working, although I don't get any error. My overall aim is to compare two files and create a third file summarizing the comparison.

ipython

import numpy as np

my arrays

aList1=np.array([['A','we'],['A','we'],['B','we'],['C','de']])
aList2=np.array([['A'],['B'],['D']])
aResult=np.array(['row1','occurence'])

my function

def coverageA(array,file1,name1,colum1,file2,name2,colum2):
     x=file1[1:,colum1]
     y=file2[1:,colum2]
     for f in x:
         if f in y:
             array=np.vstack((array,np.array([f,'shared'])))
         else:
             array=np.vstack((array,np.array([f,name1])))
     for f in y:
         if f not in x:
             array=np.vstack((array,np.array([f,name2])))
     return

and use it this way

coverageA(aResult,aList1,'list1', 0,aList2,'list2',0)

but aResult didn't change

print(aResult) output:(['row1','occurence'])

wanted

([['row1','occurence'],['A', 'shared'],['B', 'shared'],['C','list1'],['D','list2']])

ayhan
Sins

1 Answer


repaired:

import numpy as np

#my arrays

aList1=np.array([['A','we'],['A','we'],['B','we'],['C','de']])
aList2=np.array([['A'],['B'],['D']])
aResult=np.array(['row1','occurence'])

#my function

def coverageA(array,file1,name1,colum1,file2,name2,colum2):
    x=file1[1:,colum1]
    y=file2[1:,colum2]
    for f in x:
        if f in y:
            array=np.vstack((array,np.array([f,'shared'])))
        else:
            array=np.vstack((array,np.array([f,name1])))
    for f in y:
        if f not in x:
            array=np.vstack((array,np.array([f,name2])))
    print(array)
    return array

#and use it this way

aResult=coverageA(aResult,aList1,'list1', 0,aList2,'list2',0)
#aResult is now rebound to the array returned by the function

print(aResult)
#output with these sample arrays (the [1:] slices skip the first row of each input):
#[['row1' 'occurence']
# ['A' 'list1']
# ['B' 'shared']
# ['C' 'list1']
# ['D' 'list2']]
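With the sample arrays, the [1:] slices drop the first row of each input and duplicate entries are kept, so the repaired function does not reproduce the "wanted" rows from the question exactly. The following is only a sketch of one way to get them, assuming the inputs have no header row and duplicates should be collapsed. The name coverage_unique and the use of np.unique are my own choices for illustration, not part of the original code. It also collects the rows in a plain list and builds the array once instead of calling np.vstack in every iteration.

```python
import numpy as np

def coverage_unique(file1, name1, colum1, file2, name2, colum2):
    # Hypothetical variant: keep every row (no [1:] slice) and drop duplicates.
    x = np.unique(file1[:, colum1])
    y = np.unique(file2[:, colum2])
    rows = [['row1', 'occurence']]
    for f in x:
        # Label values found in both inputs as 'shared', otherwise with name1.
        rows.append([f, 'shared' if f in y else name1])
    for f in y:
        if f not in x:
            rows.append([f, name2])
    # Build the result array once, after all rows are known.
    return np.array(rows)

aList1 = np.array([['A', 'we'], ['A', 'we'], ['B', 'we'], ['C', 'de']])
aList2 = np.array([['A'], ['B'], ['D']])
result = coverage_unique(aList1, 'list1', 0, aList2, 'list2', 0)
print(result)
```

If the real files do start with a header row, the [1:] slices from the original function would need to be kept.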

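The explanation is that in Python, arguments are passed by assignment, which is explained nicely here. In the line array=np.vstack((array,np.array([f,'shared']))) a new NumPy array is created at a new position in memory and the local name array is rebound to it, while aResult still refers to the original, unchanged array. The original function also ended with a bare return, so nothing was handed back to the caller; the repaired version returns array and the call assigns the result to aResult. You can check the memory addresses with print(id(array)).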
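The rebinding can be made visible with a small sketch. The helper name rebinding is made up for this illustration; it only records id() before and after the np.vstack call.

```python
import numpy as np

def rebinding(array):
    # On entry, the local name 'array' refers to the caller's object.
    before = id(array)
    # np.vstack builds a new array; the local name is rebound to it.
    array = np.vstack((array, np.array(['A', 'shared'])))
    after = id(array)
    return before, after, array

aResult = np.array(['row1', 'occurence'])
before, after, stacked = rebinding(aResult)

print(before == id(aResult))  # True: the function received the same object
print(before == after)        # False: vstack produced a different object
print(aResult.shape, stacked.shape)  # (2,) (2, 2): the caller's array is unchanged
```

Only the returned object carries the new row, which is why the result has to be assigned back with aResult=coverageA(...).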

Markus Dutschke