0

I have a matrix, Xmatrix, with 12584 rows and 784 columns. I want to extract each row based on the corresponding value in another matrix, Tmatrix (12584 rows, 1 column), and append its values to the NumPy array X1 or X2. Even with a smaller row count of 1500, this takes over 10 minutes. I am sure there is a better and more efficient way to extract an entire row and append it to an array.

import numpy as np
import time
start_time = time.time()

Row = 12584
#Row = 1500
Col = 784
Xmatrix = np.random.rand(Row,Col)

Tmatrix = np.random.randint(1,3,(Row,1))
X1 = np.array([])
X2 = np.array([])

for i in range(Row):
    if Tmatrix[i] == 1:
        for y in range(Col):
            print ('Current row and col are --', i, y, Xmatrix[i][y])
            X1 = np.append(X1, Xmatrix[i][y])
    else:
        for y in range(Col):
            X2 = np.append(X2, Xmatrix[i][y])

print (X1)
print("--- %s seconds ---" % (time.time() - start_time))
oneday
  • `alist.append(Xmatrix[i,y])` should be faster. But either way, iterating over rows and columns is slow. Even if you iterate over `Row` and do the test, you don't need to iterate over `Col`; `alist.extend(Xmatrix[i])` puts the whole row in the list at once. – hpaulj Sep 07 '19 at 20:00
  • @hpaulj - your suggestion of using extend with a list is working out. If you could post it as an answer, I can go ahead and select it. – oneday Sep 07 '19 at 23:14

3 Answers

2

You can drop the inner loop over the columns (for y in range(Col):). In NumPy you can retrieve a whole row with:

Xmatrix[i, :]

and then append it by

X1=np.append(X1, [Xmatrix[i, :]], axis=0)

or alternatively:

X1=np.vstack([X1, Xmatrix[i, :]])

EDIT

To make appending work, you first need to create X1 and X2 with the proper shape: zero rows and Col columns. In this case:

X1=np.empty(shape=(0, Col))
X2=np.empty(shape=(0, Col))
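
Putting the pieces of this answer together, a minimal sketch of the row-wise append might look like the following. The small Row and Col values and the fixed random seed are assumptions chosen only to keep the example quick; they are not from the question.

```python
import numpy as np

# Small illustrative sizes (assumption; the question uses Row = 12584, Col = 784).
Row, Col = 200, 8
rng = np.random.default_rng(0)
Xmatrix = rng.random((Row, Col))
Tmatrix = rng.integers(1, 3, size=(Row, 1))  # values are 1 or 2

# Start from 2-D arrays with zero rows and Col columns so the shapes line up.
X1 = np.empty(shape=(0, Col))
X2 = np.empty(shape=(0, Col))

for i in range(Row):
    row = Xmatrix[i, :][np.newaxis, :]  # shape (1, Col), ready to stack as a row
    if Tmatrix[i, 0] == 1:
        X1 = np.append(X1, row, axis=0)
    else:
        X2 = np.append(X2, row, axis=0)

print(X1.shape, X2.shape)
```

Note that X1 and X2 are 2-D here, with one row per selected row of Xmatrix; X1.reshape(-1) gives the flat layout that the question's loop builds. Also, np.append copies the whole array on every call, so this removes the inner loop but is still likely to be slower than a single indexing operation.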
Grzegorz Skibinski
  • I get an error with append: "ValueError: all the input arrays must have same number of dimensions, but the array at index 0 has 1 dimension(s) and the array at index 1 has 2 dimension(s)". With vstack I get: "ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 1, the array at index 0 has size 0 and the array at index 1 has size 784" – oneday Sep 07 '19 at 21:31
  • Small tweak - you need to create X1 and X2 with a predefined shape - see the EDIT in my answer. – Grzegorz Skibinski Sep 07 '19 at 21:49
  • I still get the error below: "ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 1, the array at index 0 has size 12584 and the array at index 1 has size 784" – oneday Sep 07 '19 at 23:02
  • Pardon! Now it should work - I used the number of rows to create the empty array, instead of the number of columns... – Grzegorz Skibinski Sep 08 '19 at 07:39
2

Try this:

import numpy as np
import time
start_time = time.time()

Row = 12584
#Row = 1500
Col = 784
Xmatrix = np.random.rand(Row,Col)

Tmatrix = np.random.randint(1,3,(Row,1))

X1 = Xmatrix[(Tmatrix==1).reshape(-1)]
X2 = Xmatrix[(Tmatrix==2).reshape(-1)]

print(X1.reshape(-1))

print(time.time() - start_time)

On my computer the program runs in 0.34 seconds. When using NumPy, it is good to avoid loops by using indexing and slicing instead: http://codeinpython.com/tutorials/numpy-array-indexing-slicing/

  • Can you please explain what "X1 = Xmatrix[(Tmatrix==1).reshape(-1)]" does? It is too Pythonic for me, I guess. – oneday Sep 07 '19 at 21:39
  • I will explain "X1 = Xmatrix[(Tmatrix==1).reshape(-1)]": "reshape(-1)" flattens the array into a 1-D array, and "Xmatrix[Bool_Array]" returns the rows where Bool_Array is True. See https://stackoverflow.com/questions/7994394/efficient-thresholding-filter-of-an-array-with-numpy and https://docs.scipy.org/doc/numpy/user/basics.indexing.html#boolean-or-mask-index-arra – Reiner Czerwinski Sep 08 '19 at 03:03
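
The explanation in the comment above can be traced step by step on a tiny array. This is only an illustrative sketch: the 4x3 Xmatrix and the hand-written Tmatrix are made-up values, not data from the question.

```python
import numpy as np

# Tiny made-up data: 4 rows, 3 columns, with labels 1 or 2 in a column vector.
Xmatrix = np.arange(12).reshape(4, 3)
Tmatrix = np.array([[1], [2], [2], [1]])

mask_2d = (Tmatrix == 1)      # boolean array of shape (4, 1)
mask = mask_2d.reshape(-1)    # flattened to shape (4,): one flag per row

X1 = Xmatrix[mask]            # keeps the rows whose flag is True (rows 0 and 3)
X2 = Xmatrix[~mask]           # the remaining rows (rows 1 and 2)

print(mask)
print(X1)
print(X2)
```

Using ~mask for X2 is equivalent to (Tmatrix==2).reshape(-1) only because Tmatrix contains nothing but 1 and 2; with more label values, a separate comparison per label would be needed.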
1

With lists, this should be fairly efficient:

X1 = []
X2 = []
for i in range(Row):
    if Tmatrix[i] == 1:
        X1.extend(Xmatrix[i])
    else:
        X2.extend(Xmatrix[i])

You can convert the result with np.array(X1) afterwards if needed.
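
As a self-contained sketch of this list-based approach, the following builds the lists, converts them to arrays, and checks the result against boolean-mask indexing. The small Row and Col values, the seed, and the names X1_list, X2_list and X1_masked are assumptions introduced only for illustration.

```python
import numpy as np

# Small illustrative sizes (assumption; the question uses Row = 12584, Col = 784).
Row, Col = 1000, 16
rng = np.random.default_rng(0)
Xmatrix = rng.random((Row, Col))
Tmatrix = rng.integers(1, 3, size=(Row, 1))  # values are 1 or 2

X1_list, X2_list = [], []
for i in range(Row):
    if Tmatrix[i, 0] == 1:
        X1_list.extend(Xmatrix[i])   # adds all Col values of the row at once
    else:
        X2_list.extend(Xmatrix[i])

# Convert once at the end instead of growing an array inside the loop.
X1 = np.array(X1_list)
X2 = np.array(X2_list)

# Cross-check against boolean-mask indexing, flattened to the same 1-D layout.
X1_masked = Xmatrix[Tmatrix[:, 0] == 1].reshape(-1)
print(np.array_equal(X1, X1_masked))
```

Appending to a Python list is cheap because the list grows in place, whereas np.append allocates a new array each time; that is the main reason this version should scale much better than the original loop.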

hpaulj
  • @hpaulj - your solution was the most intuitive to me, but I see the selected answer as the best way to do it. Thanks for the help :) – oneday Sep 08 '19 at 14:26