3

I create a DataFrame:

import numpy as np
import pandas as pd
data = pd.DataFrame({'a':range(1,11),'b':['m','f','m','m','m','f','m','f','f','f'],'c':np.random.randn(10)})

Which looks like:

    a  b         c
0   1  m  0.495439
1   2  f  1.444694
2   3  m  0.150637
3   4  m -1.078252
4   5  m  0.618045
5   6  f -0.525368
6   7  m  0.188912
7   8  f  0.159014
8   9  f  0.536495
9  10  f  0.874598

When I want to select some rows, I run

data[:2] or data.ix[2]

But when I try:

se = range(2)
data[se]

There's an error:

KeyError: 'No column(s) named: [0 1]'

I know that a DataFrame selects columns by default. What happens when I run data[se]? How does the colon (:) work in Python?

Guillaume Jacquenot
Peng He
  • Providing a list tries to select from the columns (and you do not have columns named 0 or 1), while providing a slice like `:2` slices the rows – joris Dec 17 '15 at 08:44

3 Answers

7

I have never used Pandas, but a good explanation of slicing (the [::] notation) in Python can be found here. Now, from what I read in the manual:

With DataFrame, slicing inside of [] slices the rows. This is provided largely as a convenience since it is such a common operation.

In [32]: df[:3]
Out[32]: 
                   A         B         C         D
2000-01-01 -0.282863  0.469112 -1.509059 -1.135632
2000-01-02 -0.173215  1.212112  0.119209 -1.044236
2000-01-03 -2.104569 -0.861849 -0.494929  1.071804

In [33]: df[::-1]
Out[33]: 
                   A         B         C         D
2000-01-08 -1.157892 -0.370647 -1.344312  0.844885
2000-01-07  0.577046  0.404705 -1.715002 -1.039268
2000-01-06  0.113648 -0.673690 -1.478427  0.524988
2000-01-05  0.567020 -0.424972  0.276232 -1.087401
2000-01-04 -0.706771  0.721555 -1.039575  0.271860
2000-01-03 -2.104569 -0.861849 -0.494929  1.071804
2000-01-02 -0.173215  1.212112  0.119209 -1.044236
2000-01-01 -0.282863  0.469112 -1.509059 -1.135632

In your example, range(2) gives you the list [0, 1], which pandas looks up among the column labels. What I think you need is data[0:2] to slice the DataFrame and get rows 0 and 1, which is the same as data[:2] (omitting the zero). The stop index is excluded, so if you wanted, for example, rows 3, 4 and 5, that would be data[3:6].
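To make the distinction concrete, here is a minimal sketch, assuming a recent pandas version. The frame below only mirrors the column names from the question and is not the asker's exact data. It shows that a slice inside [] selects rows by position, while a list is looked up among the column labels:

```python
import pandas as pd

# Illustrative frame using the column names from the question
data = pd.DataFrame({
    "a": range(1, 11),
    "b": ["m", "f", "m", "m", "m", "f", "m", "f", "f", "f"],
})

# A slice inside [] selects rows by position; the stop index is excluded
first_two = data[0:2]      # rows 0 and 1, same as data[:2]
rows_3_to_5 = data[3:6]    # rows 3, 4 and 5

# A list inside [] is interpreted as a list of column labels
cols = data[["a", "b"]]

# There are no columns labelled 0 or 1, so this lookup raises KeyError
try:
    data[[0, 1]]
    raised = False
except KeyError:
    raised = True
```

In Python 2, which the question appears to use, range(2) was already the list [0, 1], which is why data[se] took the column-lookup path.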

Additionally, looking at some examples in the manual you can use step, so:

  • data[::2] gives you every 2nd row
  • data[::-1] returns all the rows in reverse order
  • Combining ranges and step: data[0:10:2] will result in rows 0, 2, 4, 6 and 8 (the stop index 10 is excluded)
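The step variants listed above can be checked with a short sketch. It assumes a recent pandas version and uses a hypothetical ten-row frame with the default 0..9 index:

```python
import pandas as pd

# Hypothetical ten-row frame; the default index runs from 0 to 9
data = pd.DataFrame({"a": range(1, 11)})

every_second = data[::2]      # every 2nd row, starting at row 0
reversed_rows = data[::-1]    # all rows in reverse order
stepped = data[0:10:2]        # start 0, stop 10 (excluded), step 2

print(every_second.index.tolist())
print(reversed_rows.index.tolist())
print(stepped.index.tolist())
```

Because the stop value is never included, data[0:10:2] and data[::2] select the same rows here.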

Hope it helps

urban
  • Thanks. If I want to select the rows [2,4,3,5,1] and run df[slice([2,4,3,5,1])], there is an error. I just used df.ix[2,4,3,5,1]. Is there a different way to do that? – Peng He Dec 18 '15 at 03:14
  • Do you need them in that order? You can only do a single range with `[]` and cannot change the order other than by skipping rows or reversing (as far as I know). Doing `df[1:6]` should give you the rows that you need, but not in the order you want... – urban Dec 18 '15 at 08:37
  • However, when I do df[1:2] I only get the row indexed 1, and not 1 and 2. What am I missing? (At least in other programming languages, 1:2 means rows (or columns) 1 and 2.) – Emmanuel Goldstein May 29 '21 at 11:44
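Both follow-up questions in the comments can be illustrated with a small sketch, assuming a recent pandas version and a hypothetical frame named df. Rows in an arbitrary order can be selected by passing a list of positions to DataFrame.iloc, and df[1:2] returns a single row because Python slices are half-open (the stop position is excluded):

```python
import pandas as pd

# Hypothetical frame with the default 0..9 index
df = pd.DataFrame({"a": range(1, 11)})

# Rows in an arbitrary order: pass a list of positions to .iloc
picked = df.iloc[[2, 4, 3, 5, 1]]

# start:stop is half-open, so 1:2 covers position 1 only
one_row = df[1:2]
# To get the rows at positions 1 and 2, the stop must be 3
two_rows = df[1:3]
```

The result in picked keeps the requested order, which a single slice cannot express.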
2

The [start:limit:step] syntax is known as slicing. You can easily create an instance of a slice using the slice() function:

class slice(stop)

class slice(start, stop[, step])

Return a slice object representing the set of indices specified by range(start, stop, step). The start and step arguments default to None. Slice objects have read-only data attributes start, stop and step which merely return the argument values (or their default). They have no other explicit functionality; however they are used by Numerical Python and other third party extensions. Slice objects are also generated when extended indexing syntax is used. For example: a[start:stop:step] or a[start:stop, i]. See itertools.islice() for an alternate version that returns an iterator.

In your case, you could write something like this to return the first 2 rows

se = slice(None, 2)
data[se]
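As a rough illustration of what a slice object carries, the sketch below uses a plain Python list instead of a DataFrame, so it runs without pandas. The variable names are only for illustration:

```python
# A slice object simply stores start, stop and step
se = slice(None, 2)
attrs = (se.start, se.stop, se.step)

# indices(length) resolves the defaults for a sequence of that length
resolved = se.indices(10)

items = list(range(1, 11))

# Indexing with the slice object is the same as writing items[:2]
first_two = items[se]

# A slice with an explicit step, equivalent to items[0:10:2]
every_other = items[slice(0, 10, 2)]

print(attrs, resolved, first_two, every_other)
```

The same se object can be passed to data[se], since writing data[:2] creates an equivalent slice behind the scenes.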
birdypme
1
>>> data.ix[range(2)]
   a  b         c
0  1  m -0.323834
1  2  f  0.159787
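Note that .ix was deprecated in pandas 0.20 and removed in pandas 1.0, so the call above fails on current versions. A minimal sketch of the replacements, assuming a recent pandas and an illustrative frame with string labels: .iloc selects by position and .loc selects by label.

```python
import pandas as pd

# Illustrative frame; string labels keep position and label apart
data = pd.DataFrame(
    {"a": [1, 2, 3], "b": ["m", "f", "m"]},
    index=["x", "y", "z"],
)

# Positional selection: the first two rows, whatever their labels are
by_position = data.iloc[list(range(2))]

# Label-based selection: the rows labelled "x" and "y"
by_label = data.loc[["x", "y"]]

print(by_position)
print(by_label)
```

With the default integer index from the question, positions and labels coincide, so either accessor returns the same first two rows.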
Alexander