I have a NumPy array whose rows are in sequential order and grouped by a label stored in the last column. As a small example, it looks like this:
import numpy as np

arr = np.array([[1, 2, 3, 1],
                [2, 3, 4, 1],
                [3, 4, 5, 1],
                [4, 5, 6, 2],
                [5, 6, 7, 2],
                [7, 8, 9, 2],
                [9, 10, 11, 3]])
I would like to split the array into groups, using the label column as the group-by key, so the array above would produce 3 arrays:
arrA = [[1, 2, 3, 1],
        [2, 3, 4, 1],
        [3, 4, 5, 1]]
arrB = [[4, 5, 6, 2],
        [5, 6, 7, 2],
        [7, 8, 9, 2]]
arrC = [[9, 10, 11, 3]]
I currently have this for loop, which stores each group array in a list called wins:
wins = []
for w in range(1, arr[-1, 3] + 1):
    # Select all rows whose label (column 3) equals w
    wins.append(arr[arr[:, 3] == w, :])
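The loop above assumes the labels are exactly the integers 1 to the last label, and it scans the whole array once per group. A minimal sketch of a more general variant, assuming labels may be arbitrary integers and rows with the same label may not be contiguous: stable-sort the rows by label, then split at the first position of each unique label. The input array here is hypothetical, made up for illustration.

```python
import numpy as np

# Hypothetical input: same layout as in the question, but rows sharing a
# label are not contiguous and the labels do not start at 1.
arr = np.array([[4, 5, 6, 7],
                [1, 2, 3, 5],
                [5, 6, 7, 7],
                [9, 10, 11, 9],
                [2, 3, 4, 5]])

labels = arr[:, -1]
# A stable sort keeps the original row order within each group.
order = np.argsort(labels, kind="stable")
sorted_arr = arr[order]  # fancy indexing: this makes a copy

# return_index gives the first position of each unique label in the sorted
# labels; the first entry is always 0, so drop it to get the split points.
uniq, starts = np.unique(sorted_arr[:, -1], return_index=True)
groups = dict(zip(uniq.tolist(), np.split(sorted_arr, starts[1:])))
```

Here groups maps each label to its sub-array, e.g. groups[5] holds the two rows labelled 5 in their original relative order.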
This works, but I have several large datasets to process. Is there a vectorized way of doing this, perhaps using NumPy's diff() or where()?
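A minimal sketch of the diff()-based idea, assuming (as in the example) that rows with the same label are already contiguous: np.diff on the label column is nonzero exactly where the label changes, and np.split cuts the array at those row indices. The boundary search is a single vectorized pass; np.split still returns a Python list of sub-arrays, one per group.

```python
import numpy as np

arr = np.array([[1, 2, 3, 1],
                [2, 3, 4, 1],
                [3, 4, 5, 1],
                [4, 5, 6, 2],
                [5, 6, 7, 2],
                [7, 8, 9, 2],
                [9, 10, 11, 3]])

# np.diff gives label[i+1] - label[i]; a nonzero value marks a group boundary,
# and +1 turns it into the index of the first row of the next group.
split_idx = np.flatnonzero(np.diff(arr[:, -1])) + 1
# Equivalent using where(): np.where(np.diff(arr[:, -1]) != 0)[0] + 1

# Cut the array into sub-arrays at those row indices (the pieces are views).
wins = np.split(arr, split_idx)

for w in wins:
    print(w)
```

Unlike the loop, this does not require the labels to be 1, 2, 3, ...; it only requires each group to occupy a contiguous run of rows.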