# split into inputs and outputs

X, y = data[:, :-1], data[:, -1]

print(X.shape, y.shape)

Can someone explain the second line of code with reference to specific documentation? I know it's slicing, but I couldn't find any reference for the notation ":-1" anywhere. Please point me to the specific part of the documentation.

Thank you

It performs slicing, most likely with NumPy, on data of shape (610, 14).
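A minimal sketch of what that line does, assuming a hypothetical stand-in for `data` (random numbers of the same shape, since the real dataset isn't shown):

```python
import numpy as np

# Hypothetical stand-in for the question's `data`: 610 rows, 14 columns
rng = np.random.default_rng(0)
data = rng.random((610, 14))

# All rows, every column except the last -> inputs (features)
X = data[:, :-1]
# All rows, only the last column -> outputs (target)
y = data[:, -1]

print(X.shape, y.shape)  # (610, 13) (610,)
```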

Ian Thompson
  • In any slicing reference, -n refers to the nth last term. So -1 is the last term, -2 is second last term and so on – Galo do Leste Feb 07 '23 at 05:04
  • This is typical ML data. The last column is the labels,`y` 1d. The rest is the 2d data, `X`, samples by features (all columns except the last) – hpaulj Feb 07 '23 at 05:43
  • Standard python slice notation, `start:stop:step`; `-n` counts from end. So `:-1` is 'all but last'. – hpaulj Feb 07 '23 at 06:06
  • Does this answer your question? [Understanding slicing](https://stackoverflow.com/questions/509211/understanding-slicing) – Pranav Hosangadi Feb 07 '23 at 23:45
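The points in these comments can be checked on a plain Python list. A small sketch (the list contents are arbitrary); `slice` is Python's built-in slice type, which is what the `start:stop:step` syntax creates:

```python
values = [10, 20, 30, 40, 50]

# Negative indices count from the end: -1 is the last item
print(values[-1])    # 50
# `:-1` means "start at the beginning, stop before the last item"
print(values[:-1])   # [10, 20, 30, 40]

# The same slice written out with the built-in slice object
all_but_last = slice(None, -1)
print(values[all_but_last])  # [10, 20, 30, 40]

# slice.indices() resolves the negative stop against a concrete length
print(all_but_last.indices(len(values)))  # (0, 4, 1)
```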

2 Answers


Per the docs:

Indexing on ndarrays

ndarrays can be indexed using the standard Python x[obj] syntax, where x is the array and obj the selection. There are different kinds of indexing available depending on obj: basic indexing, advanced indexing and field access.
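To make the distinction in that excerpt concrete, a minimal sketch (array values are arbitrary): a slice as `obj` is basic indexing, while a list of integers is advanced indexing. They can select the same elements, but per the NumPy docs basic slicing returns a view and advanced indexing returns a copy:

```python
import numpy as np

x = np.arange(6)  # [0 1 2 3 4 5]

basic = x[1:4]           # basic indexing: obj is a slice
advanced = x[[1, 2, 3]]  # advanced indexing: obj is a list of integers

# Both select the same elements...
print(np.array_equal(basic, advanced))  # True
# ...but the slice is a view into x, while the list index makes a copy
print(np.shares_memory(basic, x))       # True
print(np.shares_memory(advanced, x))    # False
```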

1D array

Slicing a 1-dimensional array is much like slicing a list.

import numpy as np


np.random.seed(0)
array_1d = np.random.random((5,))

print(len(array_1d.shape))  # 1

NOTE: The length of the array's shape tuple is the number of dimensions (NumPy also exposes this directly as the ndim attribute).
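A quick sketch checking that `len(arr.shape)` and `arr.ndim` agree for 0-, 1- and 2-dimensional arrays (the arrays here are arbitrary placeholders):

```python
import numpy as np

a0 = np.float64(3.0)   # NumPy scalar: shape ()
a1 = np.zeros(5)       # shape (5,)
a2 = np.zeros((5, 1))  # shape (5, 1)

for arr in (a0, a1, a2):
    # len(shape) and the ndim attribute report the same number of dimensions
    print(arr.shape, len(arr.shape), arr.ndim)
```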

We can use standard Python list slicing on the 1D array.

# get the last element
print(array_1d[-1])  # 0.4236547993389047
# get everything up to but excluding the last element
print(array_1d[:-1])
# [0.5488135  0.71518937 0.60276338 0.54488318]
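To see that the list and the 1D array really behave alike, a small sketch applying the same slice expressions to both (the values are arbitrary):

```python
import numpy as np

as_list = [0.1, 0.2, 0.3, 0.4, 0.5]
as_array = np.array(as_list)

# The same slice expressions work on both and pick the same elements
print(as_list[:-1], as_array[:-1])
print(as_list[-1], as_array[-1])
print(as_list[::-1], as_array[::-1])  # a step of -1 reverses the order
```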

2D array

array_2d = np.random.random((5, 1))

print(len(array_2d.shape))  # 2

Think of a 2-dimensional array like a data frame. It has rows (the 0th axis) and columns (the 1st axis). NumPy lets us slice these axes independently by separating them with a comma (,).
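Under the hood, the comma builds a tuple with one entry per axis, so `data[:, :-1]` is the same as indexing with a tuple of `slice` objects. A minimal sketch on an arbitrary 3x4 array:

```python
import numpy as np

data = np.arange(12).reshape(3, 4)

# data[:, :-1] is shorthand for indexing with a tuple of slice objects,
# one per axis: rows first (axis 0), then columns (axis 1)
explicit = data[(slice(None), slice(None, -1))]
shorthand = data[:, :-1]

print(np.array_equal(explicit, shorthand))  # True
print(shorthand.shape)                      # (3, 3)
```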

# the 0th row and all columns
print(array_2d[0, :])  # [0.79172504]
# rows 1 onward (the 1st row and everything after), all columns
print(array_2d[1:, :])
# [[0.52889492]
#  [0.56804456]
#  [0.92559664]
#  [0.07103606]]
# rows 1 through the second-to-last row, last column only
print(array_2d[1:-1, -1])  # [0.52889492 0.56804456 0.92559664]

Your Example

# split into inputs and outputs

X, y = data[:, :-1], data[:, -1]

print(X.shape, y.shape)

Note that data must have at least 2 dimensions, i.e. len(data.shape) >= 2 (otherwise you'd get an IndexError).

This means data[:, :-1] is keeping all "rows" and slicing up to, but not including, the last "column". Likewise, data[:, -1] is keeping all "rows" and selecting only the last "column".
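A worked example on a tiny hypothetical dataset (three samples, three feature columns plus one label column), showing exactly which values land in X and y:

```python
import numpy as np

# Hypothetical data: columns 0-2 are features, column 3 is the label
data = np.array([
    [1.0, 2.0, 3.0, 0.0],
    [4.0, 5.0, 6.0, 1.0],
    [7.0, 8.0, 9.0, 0.0],
])

X, y = data[:, :-1], data[:, -1]
print(X)                 # the first three columns of every row
print(y)                 # [0. 1. 0.]
print(X.shape, y.shape)  # (3, 3) (3,)
```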

It's important to know that when you slice an ndarray using a colon (:), you get an array with the same number of dimensions.

print(len(array_2d[1:, :-1].shape))  # 2

But if you "select" a single index on an axis (i.e. don't use a colon), that axis is dropped, so the result has fewer dimensions.

print(len(array_2d[1, :-1].shape))  # 1, because I selected a single index value on the 0th axis

print(len(array_2d[1, -1].shape))  # 0, because I selected a single index value on both the 0th and 1st axes
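This is why `y` in the question comes out 1-D. If a 2-D column is wanted instead, a slice of length one keeps the axis. A minimal sketch on an arbitrary 3x4 array:

```python
import numpy as np

data = np.arange(12).reshape(3, 4)

y_1d = data[:, -1]   # integer index on axis 1 drops that axis
y_2d = data[:, -1:]  # a slice keeps the axis, even though it has length 1

print(y_1d.shape)  # (3,)
print(y_2d.shape)  # (3, 1)
```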

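You can, however, select a list of indices on either axis (assuming they exist). A list keeps its axis rather than dropping it, but lists given on both axes are paired element-wise, which is why the first result below is 1-D.

print(len(array_2d[[1], [-1]].shape))  # 1, the lists are paired into a single (row, column) pick

print(len(array_2d[[1, 3], :].shape))  # 2, a list on one axis keeps that axis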
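A small sketch of that element-wise pairing on an arbitrary 3x4 array, contrasted with indexing one axis at a time to get a rectangular block:

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

# Lists on both axes are paired element-wise: picks a[0, 1] and a[2, 3]
print(a[[0, 2], [1, 3]])  # [ 1 11]

# To get the 2x2 block of rows (0, 2) and columns (1, 3), index one axis at a time
print(a[[0, 2]][:, [1, 3]])
```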
Ian Thompson

This slicing notation is explained in the Python tutorial: https://docs.python.org/3/tutorial/introduction.html#strings

-1 means the last element, -2 the second from last, and so on. For example, if a list has 8 elements, index -1 is equivalent to 7 (not 8, because indexing starts at 0).
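A quick check of that rule on an 8-element list (the values are arbitrary): a negative index `i` refers to position `len(lst) + i`.

```python
lst = list(range(10, 90, 10))  # 8 elements: [10, 20, ..., 80]

# A negative index i refers to position len(lst) + i
for i in (-1, -2, -8):
    assert lst[i] == lst[len(lst) + i]

print(lst[-1], lst[7])  # 80 80
```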

Keep in mind that plain Python lists have no multidimensional slicing: on a nested list, chained slices such as [1:3][5:7] both apply to the outer list, so to slice inside the inner lists you have to index or loop over them. NumPy arrays add a comma-separated syntax ([8:10, 12:14]) that slices each axis of a multidimensional array in one step. Either way, -1 means the same thing. Here is the NumPy documentation for slicing: https://numpy.org/doc/stable/user/basics.indexing.html
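To illustrate the difference between nested lists and arrays, a small sketch (the 3x3 values are arbitrary). Chained slices on a list both act on the outer list, while a NumPy array takes one slice per axis:

```python
import numpy as np

nested = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Chained slices on a list both act on the OUTER list:
# nested[1:3] keeps rows 1-2, then [0:1] keeps only the first of those rows
print(nested[1:3][0:1])  # [[4, 5, 6]]

# Slicing inside each row needs an explicit comprehension
print([row[:-1] for row in nested[1:3]])  # [[4, 5], [7, 8]]

# A NumPy array slices both axes in one expression
arr = np.array(nested)
print(arr[1:3, :-1].tolist())  # [[4, 5], [7, 8]]
```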

stunlocked