18

Very simply put,

For the same training data frame df, when I use X = df.iloc[:, :-1].values, it selects columns only up to the second-to-last column of the data frame instead of up to the last column (which is what I want, but it's strange behavior I've never seen before). I know this because the second-to-last column's value and the last column's value for that row are different.

However, using y = df.iloc[:, -1].values gives me the row vector of the last column's values which is exactly what I want.

Why is the negative 1 for X giving me the second last column's value instead?

[Screenshot: Error]

Fizik26
kwotsin
  • 1
    up until the last column but not including the last column, since Python ranges/slices do not include the end point... I do not understand what you are expecting instead... – Tadhg McDonald-Jensen May 29 '16 at 16:17
  • 1
    like, `data = [1,2,3,4,5]` then a slice up to the last element `data[:-1] -> [1,2,3,4]` would remove the last one because the end point is the last element and slices never include the endpoint... This is exactly the intended behaviour. – Tadhg McDonald-Jensen May 29 '16 at 16:18
  • 3
    `df.iloc[:, 2]` selects the column at position 2 (the third column), but `df.iloc[:, :2]` or explicitly `df.iloc[:, 0:2]` selects the columns up to (excluding) position 2. It's the same as Python's slices. When you use a negative index, nothing changes. If you say `df.iloc[:, -1]` it means the last column, but `df.iloc[:, :-1]` means up to (excluding) the last column. – ayhan May 29 '16 at 16:19
  • Oh yes I see...I misunderstood -1 as always selecting the last column. – kwotsin May 29 '16 at 16:21
  • @leeks50996 The index `-1` **does** always mean "last element", but in slices the endpoint is excluded. The same holds for positive indices, for example: with `data = "abcde"`, index `2` refers to `"c"`, and `data[:2]` is everything up to but not including `"c"`, so `data[:2] -> "ab"` – Tadhg McDonald-Jensen May 29 '16 at 16:24

5 Answers

21

I think you have only two columns in df, because if there are more columns, iloc selects all columns except the last:

import pandas as pd

df = pd.DataFrame({'A':[1,2,3],
                   'B':[4,5,6],
                   'C':[7,8,9],
                   'D':[1,3,5],
                   'E':[5,3,6],
                   'F':[7,4,3]})

print(df)
   A  B  C  D  E  F
0  1  4  7  1  5  7
1  2  5  8  3  3  4
2  3  6  9  5  6  3

print(df.iloc[:, :-1])
   A  B  C  D  E
0  1  4  7  1  5
1  2  5  8  3  3
2  3  6  9  5  6

X = df.iloc[:, :-1].values
print(X)
[[1 4 7 1 5]
 [2 5 8 3 3]
 [3 6 9 5 6]]

print(X.shape)
(3, 5)
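
The same split into features and target can be sketched as follows. This is a minimal illustration, not the asker's actual data: the frame and the column names `f1`, `f2` and `target` are hypothetical.

```python
import pandas as pd

# Hypothetical 3-row frame; the last column "target" plays the role of the label.
df = pd.DataFrame({
    "f1": [1, 2, 3],
    "f2": [4, 5, 6],
    "target": [0, 1, 0],
})

X = df.iloc[:, :-1].values  # slice: every column before the last one
y = df.iloc[:, -1].values   # single index: only the last column

print(X.shape)  # (3, 2)
print(y.shape)  # (3,)
```

The slice `:-1` stops before the last column, while the plain index `-1` picks exactly that column, so together they partition the frame without overlap.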
jezrael
  • That's strange, because my df originally has 24 columns as displayed in df.shape (printed in the screenshot). – kwotsin May 29 '16 at 16:16
  • And what does `print(df.columns)` return? – jezrael May 29 '16 at 16:17
  • It gives me an array of [ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23], as I read df with header = None – kwotsin May 29 '16 at 16:19
  • Hmm, but the screenshot shows `X.shape = (670294, 23)`, which means all columns except the last. – jezrael May 29 '16 at 16:22
  • hmm yes I just realised the -1 doesn't always mean 'the last column'; in the context of slices it actually means 'until before the last column'. – kwotsin May 29 '16 at 16:23
4

Just for clarity

With respect to Python syntax, this question has been answered here.

Python's list slicing syntax says that a:b gets a and everything up to but not including b; a: gets a and everything after it; :b gets everything before b but not b itself. The list index -1 refers to the last element, and :-1 follows the same rule as above: it gets everything before the last element but not the last element. If you want the last element included, use :.
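
Each of the forms described above can be checked on a plain list. This is a small sketch; the list `data` is just an illustrative example.

```python
data = ["a", "b", "c", "d", "e"]

middle = data[1:3]        # a:b -> index 1 up to, but not including, index 3
tail = data[1:]           # a:  -> index 1 and everything after it
head = data[:3]           # :b  -> everything before index 3
last = data[-1]           # -1  -> the last element itself
all_but_last = data[:-1]  # :-1 -> everything before the last element
everything = data[:]      # :   -> the whole list, last element included

print(middle)        # ['b', 'c']
print(tail)          # ['b', 'c', 'd', 'e']
print(head)          # ['a', 'b', 'c']
print(last)          # e
print(all_but_last)  # ['a', 'b', 'c', 'd']
print(everything)    # ['a', 'b', 'c', 'd', 'e']
```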

Community
piRSquared
2

Because the upper bound is exclusive. It's similar to slicing a list:

a=[1,2,3,4]

a[:3]

will result in [1, 2, 3]. It did not take the last element.

Manoj Kumar
1

In case this helps you learn something:

# Single selections using iloc and DataFrame
# Rows:
data.iloc[0] # first row of data frame (Aleshia Tomkiewicz) - Note a Series data type output.
data.iloc[1] # second row of data frame (Evan Zigomalas)
data.iloc[-1] # last row of data frame (Mi Richan)
# Columns:
data.iloc[:,0] # first column of data frame (first_name)
data.iloc[:,1] # second column of data frame (last_name)
data.iloc[:,-1] # last column of data frame (id)
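
A runnable version of these selections might look like the sketch below. It assumes a three-row frame built from the names mentioned in the comments above; the `id` values are made up, and the original dataset is presumably larger.

```python
import pandas as pd

# Hypothetical stand-in for the dataset referenced above.
data = pd.DataFrame({
    "first_name": ["Aleshia", "Evan", "Mi"],
    "last_name": ["Tomkiewicz", "Zigomalas", "Richan"],
    "id": [1, 2, 3],  # made-up values for illustration
})

first_row = data.iloc[0]     # Series indexed by column name
last_row = data.iloc[-1]     # Series for the last row
last_col = data.iloc[:, -1]  # Series for the last column ("id")

print(first_row["first_name"])  # Aleshia
print(last_row["last_name"])    # Richan
print(last_col.tolist())        # [1, 2, 3]
```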
0

Consider list l containing the following elements:

Index 0 1 2 3 4 5

Values a b c d e f

Index -6 -5 -4 -3 -2 -1

If you print:

l[:]--> a b c d e f

But for

l[:-1]-->a b c d e

This happens because in:

l[:]-->l[start : end]--> omitted bounds default to start=0, end=6

l[:-1]-->l[start : end]--> start defaults to 0, end=-1 is given explicitly

With negative indexing, end=-1 refers to index 5, so you get:

l[:-1]-->a b c d e

A slice includes start and excludes end.
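
One way to see how the bounds are resolved is the built-in `slice.indices` method, which turns omitted or negative bounds into concrete positions for a given length. A small sketch using the same six-element list:

```python
l = ["a", "b", "c", "d", "e", "f"]

# slice.indices(length) returns the concrete (start, stop, step) for that length.
full = slice(None, None).indices(len(l))   # corresponds to l[:]
no_last = slice(None, -1).indices(len(l))  # corresponds to l[:-1]

print(full)     # (0, 6, 1)
print(no_last)  # (0, 5, 1)

# So l[:-1] is the same as l[0:5]: index 5 (the last element) is excluded.
print(l[:-1] == l[0:5])  # True
```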

VivekP