
I just started learning ML in Python on my own. I didn't understand a passage in some code and would be happy if you could explain what it is doing. In particular, I don't know what `[:, -1]` and `[:, :-1]` do:

```python
inputs = training_data[:, :-1]
outputs = training_data[:, -1]
```
Ruslan Osmanov
afek banyas
  • https://stackoverflow.com/a/509295/1011724, or just google "slicing in numpy". – Dan Sep 10 '19 at 14:19
  • This is not just about slicing; notice the comma... – Óscar López Sep 10 '19 at 14:21
  • The linked duplicate doesn't address a `,` in the slice, but that's simple enough: `__getitem__` receives a tuple of `slice` objects, rather than a single `slice` object. – chepner Sep 10 '19 at 14:22
  • @ÓscarLópez the comma just means one of the slice parameters is a tuple. – Mark Ransom Sep 10 '19 at 14:22
  • @MarkRansom No, it's a tuple of slices. – chepner Sep 10 '19 at 14:23
  • [This answer](https://stackoverflow.com/a/509377/1126841), at least, to the proposed duplicate addresses commas. – chepner Sep 10 '19 at 14:24
  • @chepner how does that parse? To get a slice you need both the opening and close brackets, don't you? – Mark Ransom Sep 10 '19 at 14:42
  • See https://docs.python.org/3/reference/grammar.html; the relevant non-terminals are `trailer`, `subscriptlist`, `subscript`, and `sliceop`. In a nutshell, a subscript is a comma-separated list of `:` expressions. – chepner Sep 10 '19 at 14:49
  • The trailer `[:, :-1]` generates two `slice` objects, `slice(None,None,None)` and `slice(None, -1, None)`. – chepner Sep 10 '19 at 14:51
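To see what chepner describes, here is a minimal sketch using a throwaway class (`ShowKey` is a hypothetical name made up for this illustration) whose `__getitem__` simply returns whatever key Python hands it. It shows that the comma makes Python pass a single tuple, and that a bare `-1` arrives as a plain integer rather than a slice:

```python
class ShowKey:
    """Hypothetical helper: returns the raw key passed to [] unchanged."""
    def __getitem__(self, key):
        return key

probe = ShowKey()

key_all_but_last = probe[:, :-1]   # the key training_data[:, :-1] would receive
key_last = probe[:, -1]            # the key training_data[:, -1] would receive

print(key_all_but_last)  # (slice(None, None, None), slice(None, -1, None))
print(key_last)          # (slice(None, None, None), -1)
```

NumPy arrays receive exactly these tuples and interpret the first element as the row selection and the second as the column selection.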

2 Answers


`[:, :]` literally means `[all rows, all columns]`.

Indexing in Python starts at 0 when you count from the first element, and at -1 when you count backwards from the last element.
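A quick illustration on a plain Python list (the list contents are arbitrary):

```python
letters = ["a", "b", "c", "d"]

print(letters[0])    # a    (first element, counting from the front)
print(letters[-1])   # d    (last element, counting from the back)
print(letters[-2])   # c    (second to last)
print(letters[:-1])  # ['a', 'b', 'c']  (everything except the last)
```

`letters[-1]` is shorthand for `letters[len(letters) - 1]`, which is why it always picks the last element regardless of the list's length.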

So `[:, -1]` means you are taking all the rows and only the last column; `-1` refers to the last column.
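A small sketch on a made-up 3×3 NumPy array. Note that selecting a single column with an integer index returns a one-dimensional array:

```python
import numpy as np

# Hypothetical 3x3 array, just for illustration
a = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

last_col = a[:, -1]        # all rows, only the last column
print(last_col)            # [3 6 9]
print(last_col.shape)      # (3,)  -> the result is 1-D

# -1 is the same as the explicit index of the last column (here 2)
print(np.array_equal(a[:, -1], a[:, a.shape[1] - 1]))  # True
```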

`[:, :-1]` means you are taking all the rows and all the columns except the last one.
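The same made-up array sliced with `[:, :-1]`: the slice `:-1` stops just before the last column, so the result keeps two dimensions and loses one column:

```python
import numpy as np

# Hypothetical 3x3 array, just for illustration
a = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

all_but_last = a[:, :-1]   # all rows, every column except the last
print(all_but_last)
# [[1 2]
#  [4 5]
#  [7 8]]
print(all_but_last.shape)  # (3, 2)
```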

So in your code, `training_data[:, -1]` takes all the rows of `training_data` but only the last column, and `training_data[:, :-1]` takes all the rows and every column except the last one.
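In ML code this pattern usually splits a dataset into features and labels. The sketch below assumes (the question does not show it) that `training_data` is a 2-D NumPy array where each row is one sample and the last column holds the target value; the numbers are made up:

```python
import numpy as np

# Hypothetical training data: first three columns are features,
# the last column is the label.
training_data = np.array([[0.0, 1.0, 0.0, 1.0],
                          [1.0, 1.0, 0.0, 0.0],
                          [0.0, 0.0, 1.0, 1.0]])

inputs = training_data[:, :-1]   # features, shape (3, 3)
outputs = training_data[:, -1]   # labels,   shape (3,)

print(inputs.shape, outputs.shape)  # (3, 3) (3,)
print(outputs)                      # [1. 0. 1.]
```

Gluing the two parts back together with `np.column_stack([inputs, outputs])` reproduces the original array, which is a handy sanity check that nothing was dropped.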

But:

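If `training_data` is a pandas DataFrame rather than a NumPy array, `training_data[:, -1]` will not work: plain `[]` on a DataFrame selects columns by label, so it does not interpret `[:, -1]` as (rows, columns) and raises an error. To slice a DataFrame by integer position, use `.iloc`, e.g. `training_data.iloc[:, -1]`.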
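A minimal sketch of the `.iloc` version, assuming a small hypothetical DataFrame with two feature columns and a label column as the last column:

```python
import pandas as pd

# Hypothetical DataFrame: two feature columns, label in the last column
df = pd.DataFrame({"x1": [0, 1, 0],
                   "x2": [1, 1, 0],
                   "label": [1, 0, 1]})

inputs = df.iloc[:, :-1]   # all rows, all columns except the last (DataFrame)
outputs = df.iloc[:, -1]   # all rows, only the last column (Series)

print(list(inputs.columns))  # ['x1', 'x2']
print(outputs.tolist())      # [1, 0, 1]
```

`.iloc` accepts the same `[rows, columns]` positional syntax as NumPy, including negative indices and slices.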

The tutorial "How do I select multiple rows and columns from a pandas DataFrame?" explains this clearly; have a look at it.


Yash Zade
some_programmer

Check out the NumPy documentation on basic slicing and indexing: https://docs.scipy.org/doc/numpy-1.10.1/reference/arrays.indexing.html#basic-slicing-and-indexing

```python
>>> import numpy as np
>>> x = np.random.rand(3, 2)
>>> x
array([[0.55424444, 0.86283166],
       [0.11931308, 0.43853805],
       [0.13662337, 0.06383871]])
>>> n = x[:, -1]   # all rows, last column only
>>> n
array([0.86283166, 0.43853805, 0.06383871])
```
PySeeker