10

My dataset has close to 200 rows, but for a minimal working example, let's assume the following array:

import numpy as np

arr = np.array([[1,2,3,4], [5,6,7,8], 
               [9,10,11,12], [13,14,15,16], 
               [17,18,19,20], [21,22,23,24]])

I can take a random sample of 3 of the rows as follows:

indexes = np.random.choice(np.arange(arr.shape[0]), int(arr.shape[0]/2), replace=False)

Using these indexes, I can select my test cases as follows:

testing = arr[indexes]

I want to delete the rows at these indexes so that I can use the remaining rows as my training set.

From the post here, it seems that training = np.delete(arr, indexes) ought to do it, but I get a 1-D array instead.

I also tried the suggestion here using training = arr[indexes.astype(np.bool)], but it did not give a clean separation: the row [5,6,7,8] ends up in both the training and testing sets.

training = arr[indexes.astype(np.bool)]

testing
Out[101]: 
array([[13, 14, 15, 16],
       [ 5,  6,  7,  8],
       [17, 18, 19, 20]])

training
Out[102]: 
array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])

Any idea what I am doing wrong? Thanks.

Community
sedeh
  • Don't forget to [read the docs](http://docs.scipy.org/doc/numpy/reference/generated/numpy.delete.html); the documentation has the answer to your question. – user2357112 May 20 '15 at 05:10

2 Answers

14

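To delete the indexed rows from a NumPy array, pass axis=0 to np.delete (without axis, it operates on the flattened array, which is why you get a 1-D result):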

arr = np.delete(arr, indexes, axis=0)
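As a minimal sketch of the difference, here is the question's array with a fixed set of indices. The values [3, 1, 4] are an assumption chosen to reproduce the testing rows shown in the question; in practice they would come from np.random.choice.

```python
import numpy as np

# Same 6x4 array as in the question.
arr = np.arange(1, 25).reshape(6, 4)

# Fixed indices instead of a random draw (assumption, for reproducibility).
indexes = np.array([3, 1, 4])

# Test set: the selected rows, in the order given by indexes.
testing = arr[indexes]

# Training set: every row except those at indexes.
training = np.delete(arr, indexes, axis=0)

# Without axis, np.delete works on the flattened array and
# removes 3 single elements, so the result is 1-D.
flattened = np.delete(arr, indexes)

print(training)
print(flattened.shape)
```

Note that np.delete returns a new array and leaves arr untouched, so testing and training can both be taken from the same original.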
unutbu
farhawa
3

One approach would be to get the remaining row indices with np.setdiff1d and then use those row indices to get the desired output -

out = arr[np.setdiff1d(np.arange(arr.shape[0]), indexes)]

Or use np.in1d to leverage boolean indexing -

out = arr[~np.in1d(np.arange(arr.shape[0]), indexes)]
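For the boolean-indexing route, a rough sketch of how a mask can be built explicitly is below. The mask needs one entry per row of arr, which is why indexes.astype(bool) from the question does not work as a mask: it only converts the three index values themselves to True/False. The fixed indices [3, 1, 4] are an assumption for reproducibility, and np.isin is used here as the replacement that newer NumPy releases recommend over np.in1d.

```python
import numpy as np

# Same 6x4 array as in the question.
arr = np.arange(1, 25).reshape(6, 4)

# Fixed indices instead of a random draw (assumption, for reproducibility).
indexes = np.array([3, 1, 4])

# One boolean per row: True for rows that go into the test set.
mask = np.zeros(arr.shape[0], dtype=bool)
mask[indexes] = True

# Boolean indexing returns rows in their original order,
# not in the order listed in indexes.
testing = arr[mask]
training = arr[~mask]

# The same training rows via np.isin on the row numbers.
training_alt = arr[~np.isin(np.arange(arr.shape[0]), indexes)]

print(testing)
print(training)
```

The mask and its negation split the rows into two disjoint sets, so no row can end up in both testing and training.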
Divakar
  • I have no idea, but this worked for me over the np.delete(). I had a matrix (1000,17) and wanted to work with all but one row at each iteration through rows. The delete command sometimes gave me 998 and 999 as my output length rather than 999 every time. I thought it was a rounding error - and that the comparison wasn't happening well enough, but that wasn't it. I don't know enough about python to know why delete didn't work unfortunately. – ashley Feb 20 '17 at 15:57