<class 'pandas.core.frame.DataFrame'>
RangeIndex: 400 entries, 0 to 399
Data columns (total 11 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   CompPrice    400 non-null    int64 
 1   Income       400 non-null    int64 
 2   Advertising  400 non-null    int64 
 3   Population   400 non-null    int64 
 4   Price        400 non-null    int64 
 5   ShelveLoc    400 non-null    object
 6   Age          400 non-null    int64 
 7   Education    400 non-null    int64 
 8   Urban        400 non-null    object
 9   US           400 non-null    object
 10  HighSales    400 non-null    object
dtypes: int64(7), object(4)
memory usage: 34.5+ KB

As the info() output above shows, my dataset, DF, has 11 columns indexed from 0 to 10. I would like to extract only the first 10 columns (that is, the columns with indices 0 to 9). However, when I run the code below:

DF.iloc[:, 0:9]

It returns only the first 9 columns (that is, from CompPrice to Urban).

In this case, I need to change my code to:

DF.iloc[:, 0:10]

to get what I actually want (that is, from CompPrice to US).

I'm really confused by iloc indices. Why does the slice require '10' instead of '9' as its end when it starts at index '0'? The start and end indices seem inconsistent.

Catherine
  • From the [documentation](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.iloc.html) - "`.iloc[]` is primarily integer position based (from `0` to `length-1` of the axis), but may also be used with a boolean array." – It_is_Chris Jun 01 '23 at 19:32
    It is explicitly explained in the API: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.iloc.html – deponovo Jun 01 '23 at 19:37
    @deponovo Most explicitly under [Selection by position](https://pandas.pydata.org/docs/user_guide/indexing.html#indexing-integer): "When slicing, the start bound is *included*, while the upper bound is *excluded*." – wjandrea Jun 01 '23 at 19:39

3 Answers


The short answer is that this is how Python slicing works. Pandas positional selection with iloc, for rows and columns alike, is consistent with Python slicing. Consider the list:

lst = ['a', 'b', 'c', 'd', 'e', 'f']
  • lst[0:1] returns index 0 to 1-1: ['a']

  • lst[0:2] returns index 0 to 2-1: ['a', 'b']

  • lst[0:3] returns index 0 to 3-1: ['a', 'b', 'c']

  • [0:n] always returns index 0 to n-1.

Pandas behaves the same way.
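To tie this back to the question, here is a minimal sketch. The one-row frame `df` is hypothetical and only reuses the column names from the info() output; it is not the asker's data.

```python
import pandas as pd

# Hypothetical one-row frame that reuses the column names from the question.
columns = ["CompPrice", "Income", "Advertising", "Population", "Price",
           "ShelveLoc", "Age", "Education", "Urban", "US", "HighSales"]
df = pd.DataFrame([list(range(len(columns)))], columns=columns)

first_nine = df.iloc[:, 0:9]   # positions 0..8; stop position 9 is excluded
first_ten = df.iloc[:, 0:10]   # positions 0..9; stop position 10 is excluded

print(first_nine.columns[-1])  # Urban
print(first_ten.columns[-1])   # US
print(first_ten.shape[1])      # 10, i.e. stop - start
```

With a start of 0, the stop value is simply the number of columns you want back.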

wjandrea
Stu Sztukowski

What you are observing is standard, documented pandas behaviour. It is intentional and consistent with how Python lists are sliced. As per the docs:

.iloc is primarily integer position based (from 0 to length-1 of the axis), but may also be used with a boolean array. .iloc will raise IndexError if a requested indexer is out-of-bounds, except slice indexers which allow out-of-bounds indexing (this conforms with Python/NumPy slice semantics).
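The out-of-bounds part of that passage can be checked with a small sketch. The three-column frame below is made up for illustration only.

```python
import pandas as pd

# Hypothetical frame with three columns, used only for illustration.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

# A slice that runs past the last column is clipped rather than rejected.
clipped = df.iloc[:, 0:100]
print(clipped.shape)  # (2, 3)

# A single integer position that does not exist raises IndexError.
try:
    df.iloc[:, 100]
    raised = False
except IndexError:
    raised = True
print(raised)  # True
```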

Celius Stingher

By convention, index ranges are half-open, [start, stop), meaning that the start index is included but the stop index is excluded.

list(range(1, 10))  # [1, 2, 3, 4, 5, 6, 7, 8, 9]; 10 is not included
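One way to see why the half-open convention is convenient, sketched with a plain Python list (the values of `start`, `stop` and `k` are arbitrary choices for illustration):

```python
lst = ['a', 'b', 'c', 'd', 'e', 'f']

# The length of a slice is stop - start, with no +1 correction.
start, stop = 1, 4
chunk = lst[start:stop]
print(chunk)                        # ['b', 'c', 'd']
print(len(chunk) == stop - start)   # True

# Splitting at any position k gives two pieces that neither overlap nor leave a gap.
k = 2
print(lst[:k] + lst[k:] == lst)     # True
```

The same arithmetic applies to iloc slices, which is why 0:10 returns exactly 10 columns.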