
I need to build two arrays in which every row is a 365-element calendar (split into individual 30/31-element month arrays). The first array should contain the number of airplane flights on each day, and the second the number of delayed flights on each day; every row in both arrays is supposed to hold this data for ONE specific travel agency.

I declared the arrays using the following code:

year = np.array([[0]*31,[0]*29,[0]*31,[0]*30,[0]*31,[0]*30,[0]*31,[0]*31,[0]*30,[0]*31,[0]*30,[0]*31,])
flights_1998_Carriers = np.array([year,]*14)
delays_1998_Carriers = np.array([year,]*14)

But whenever I increase the value of ONE specific day for a specific agency in one of the two arrays, the same day for EVERY agency in BOTH arrays is increased as well. For example:

print(flights_1998_Carriers[0][0][1])
print(delays_1998_Carriers[0][0][1])
flights_1998_Carriers[0][0][1] = flights_1998_Carriers[0][0][1] + 1
print(flights_1998_Carriers[0][0][1])
print(flights_1998_Carriers[1][0][1])
print(delays_1998_Carriers[0][0][1])

This prints:

0
0
1
1
1

I have tried everything I could think of but I can't understand WHY all the rows in both arrays are linked in such a way. Does anyone know where I'm messing this up? Thanks.

Tigrerojo
  • First I want to say, mutating either of the arrays this way is not just altering the other, it's actually mutating the `year` array. Maybe [this resource](https://docs.python.org/3/library/copy.html) will help, but I'm not actually super good with Python so my apologies if not. – Andrew Nov 13 '19 at 19:41
  • Also! Maybe [this question](https://stackoverflow.com/questions/35910577/why-do-python-numpy-mutate-the-original-array) is relevant? – Andrew Nov 13 '19 at 19:52
  • Replacing "np.array([year,]*14)" with "np.array([copy.deepcopy(year),]*14)" DOES make it so modifying one array doesn't affect the other, but all the rows in the first array are STILL affected; I'm gonna keep trying to find a solution using copying functions. Thanks! – Tigrerojo Nov 13 '19 at 20:05

1 Answer

In [548]: year.shape                                                            
Out[548]: (12,)
In [549]: year.dtype                                                            
Out[549]: dtype('O')
In [550]: flights_1998_Carriers.shape                                           
Out[550]: (14, 12)

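Also print year. You'll see it is an object dtype array of lists: because the month lists have different lengths, NumPy cannot build a 2-D numeric array and instead stores 12 references to Python lists.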

Because of the [year,]*14 construction, all elements in flights_1998_Carriers[:,0] reference the same list, which is also year[0]. If you modify an element of year[0], you'll see that change in every row of flights_1998_Carriers, and in delays_1998_Carriers as well, since it was built from the same year.

An object dtype array like year is essentially a list of lists, or more precisely a list of references to lists stored elsewhere in memory. When you make a new array with that list-multiplication syntax, you get the same aliasing problem that you would get with plain lists: the references are repeated, not the lists they point to.
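
The aliasing can be reproduced without NumPy at all. The following is a minimal sketch with plain lists; the names month, shared and independent are made up for illustration and do not appear in the question.

```python
# Plain-list illustration of the aliasing; all names here are hypothetical.
month = [0] * 3                  # stand-in for one month of day counters
shared = [month] * 4             # four references to the SAME list object
shared[0][1] += 1                # increment "day 1" for "agency 0"

print(shared)                    # [[0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 1, 0]]
print(all(row is month for row in shared))   # True

# A comprehension evaluates [0] * 3 once per iteration, so each row is distinct.
independent = [[0] * 3 for _ in range(4)]
independent[0][1] += 1
print(independent)               # [[0, 1, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0]]
```

The `is` check is the quickest way to tell whether two rows are the same object rather than merely equal.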

np.array([copy.deepcopy(year),]*14)

separates this array from year, but the single copy is still replicated 14 times, so all 14 rows share the same lists. You need to make 14 separate copies:

np.array([copy.deepcopy(year) for _ in range(14)])
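
To check the difference between the two constructions, here is a small sketch that compares them by object identity. It assumes year is rebuilt with an explicit dtype=object, since as far as I know recent NumPy releases refuse to create ragged arrays without it; the names month_lengths, shared and independent are mine.

```python
import copy
import numpy as np

# Month lengths as in the question; dtype=object is passed explicitly
# (assumption: needed on recent NumPy versions for ragged input).
month_lengths = [31, 29, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
year = np.array([[0] * n for n in month_lengths], dtype=object)

# One deepcopy, repeated 14 times: rows still share their month lists.
shared = np.array([copy.deepcopy(year)] * 14)
# One deepcopy per carrier: every row gets its own month lists.
independent = np.array([copy.deepcopy(year) for _ in range(14)])

print(shared.shape, independent.shape)          # (14, 12) (14, 12)
print(shared[0, 0] is shared[1, 0])             # True
print(independent[0, 0] is independent[1, 0])   # False

# Incrementing one day for one carrier no longer leaks anywhere else.
independent[0, 0][1] += 1
print(independent[0, 0][1], independent[1, 0][1], year[0][1])  # 1 0 0
```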

Working with object arrays like this doesn't give much of an advantage over plain lists. Operations might even be slower, and they certainly aren't as fast as on numeric NumPy arrays.
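
If a numeric array is acceptable, one possible alternative (not something the question asked for) is a flat carriers-by-days array plus a small lookup for where each month starts. This is only a sketch: day_index is a hypothetical helper, and the month lengths assume a non-leap year such as 1998.

```python
import numpy as np

n_carriers = 14
n_days = 365  # assumption: 1998, which was not a leap year

# np.zeros allocates fresh storage, so the two arrays share nothing.
flights = np.zeros((n_carriers, n_days), dtype=int)
delays = np.zeros((n_carriers, n_days), dtype=int)

# Offset of the first day of each month within the flat year.
month_lengths = np.array([31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31])
month_start = np.concatenate(([0], np.cumsum(month_lengths)[:-1]))

def day_index(month, day):
    """Map a zero-based (month, day) pair to a zero-based day of the year."""
    return month_start[month] + day

flights[0, day_index(0, 1)] += 1
print(flights[0, 1], flights[1, 1], delays[0, 1])        # 1 0 0

# All carriers' February counts, as an ordinary numeric slice.
print(flights[:, month_start[1]:month_start[2]].shape)   # (14, 28)
```

With this layout, whole-array operations such as flights.sum(axis=1) run on numeric data instead of on Python lists.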

hpaulj
  • Thanks! That worked like a charm. I guess I need to learn more about how python arrays are initialized and linked... – Tigrerojo Nov 13 '19 at 22:06