I am trying to extract rows from a pandas DataFrame using a list of row names, but I can't get it to work. Here is an example:

# df
    alleles  chrom  pos strand  assembly#  center  protLSID  assayLSID  
rs#
TP3      A/C      0    3      +        NaN     NaN       NaN        NaN
TP7      A/T      0    7      +        NaN     NaN       NaN        NaN
TP12     T/A      0   12      +        NaN     NaN       NaN        NaN
TP15     C/A      0   15      +        NaN     NaN       NaN        NaN
TP18     C/T      0   18      +        NaN     NaN       NaN        NaN

test = ['TP3','TP12','TP18']

df.select(test)

This is what I was trying to do with just the elements of the list, and I am getting this error: TypeError: 'Index' object is not callable. What am I doing wrong?

Trenton McKinney
upendra
  • `df.select()` is [deprecated in favor of `df.loc[]` since 0.21](http://pandas.pydata.org/pandas-docs/version/0.21/generated/pandas.DataFrame.select.html#pandas.DataFrame.select), and it was meant for selecting rows (or columns) based on a condition function, not for simple indexing by a list of row or column names. – smci Dec 15 '17 at 19:13
  • Very similar to [How are iloc and loc different?](https://stackoverflow.com/questions/31593201/how-are-iloc-and-loc-different) – Trenton McKinney Aug 01 '21 at 02:27
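Because `select` applied a function to each label, the closest current equivalent is to evaluate that condition per label yourself and hand the resulting booleans to `.loc`. A minimal sketch, assuming a small frame indexed like the question's; the predicate `keep` (even label numbers) is hypothetical and only there for illustration:

```python
import pandas as pd

# A small frame indexed like the question's (most columns omitted).
df = pd.DataFrame(
    {"alleles": ["A/C", "A/T", "T/A", "C/A", "C/T"],
     "pos": [3, 7, 12, 15, 18]},
    index=pd.Index(["TP3", "TP7", "TP12", "TP15", "TP18"], name="rs#"),
)

def keep(label):
    """Hypothetical condition: keep labels whose trailing number is even."""
    return int(label[2:]) % 2 == 0

# Old style (deprecated in 0.21, absent from current pandas): df.select(keep)
# Current style: evaluate the condition for every label and pass the
# resulting boolean list (same length as the index) to .loc.
mask = [keep(label) for label in df.index]
subset = df.loc[mask]
print(subset)
```

For a plain list of names, as in the question, no function is needed at all; `df.loc[test]` is enough.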

3 Answers

You can use df.loc[['TP3','TP12','TP18']], or equivalently df.loc[test].

Here is a small example:

In [26]: df = pd.DataFrame({"a": [1,2,3], "b": [3,4,5], "c": [5,6,7]})

In [27]: df.index = ["x", "y", "z"]

In [28]: df
Out[28]: 
   a  b  c
x  1  3  5
y  2  4  6
z  3  5  7

[3 rows x 3 columns]

In [29]: df.loc[["x", "y"]]
Out[29]: 
   a  b  c
x  1  3  5
y  2  4  6

[2 rows x 3 columns]
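Applied to a frame shaped like the question's (only a few of its columns are reconstructed here), the same call returns the rows in the order of `test`. One caveat worth knowing: in pandas 1.0 and later, `.loc` raises `KeyError` when any label in the list is missing. The sketch below shows two common workarounds, `Index.intersection` and `DataFrame.reindex`; the label `TP99` is hypothetical and stands for a name that is not in the index.

```python
import pandas as pd

# Reconstruction of a few of the question's columns.
df = pd.DataFrame(
    {"alleles": ["A/C", "A/T", "T/A", "C/A", "C/T"],
     "chrom": [0, 0, 0, 0, 0],
     "pos": [3, 7, 12, 15, 18],
     "strand": ["+"] * 5},
    index=pd.Index(["TP3", "TP7", "TP12", "TP15", "TP18"], name="rs#"),
)

test = ["TP3", "TP12", "TP18"]
rows = df.loc[test]  # rows come back in the order listed in `test`
print(rows)

# "TP99" is a hypothetical label that is not in the index.
wanted = ["TP3", "TP99"]
missing_raised = False
try:
    df.loc[wanted]
except KeyError:
    missing_raised = True  # pandas >= 1.0 rejects lists with missing labels

# Option 1: keep only the labels that exist (result follows the index's order).
present = df.loc[df.index.intersection(wanted)]
# Option 2: keep every requested label; rows for missing labels are all NaN.
padded = df.reindex(wanted)
```

Which option fits depends on whether a missing name should be dropped silently or kept visible as an empty row.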
Akavall

There are at least three ways to access elements of a pandas DataFrame.

import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.uniform(size=(10, 10)), columns=list('PQRSTUVWXY'), index=list('ABCDEFGHIJ'))

With a list of labels, df[['P','Q']] can only select columns of the DataFrame. For NumPy-style slicing of rows and columns, use df.loc[] (label-based; "loc" stands for location) or df.iloc[] (integer-location based).
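The column-versus-row distinction can be checked directly. This sketch rebuilds the answer's frame with NumPy's `default_rng` (the seed is an addition, only so reruns give the same values) and shows that a list inside plain `[]` is always read as column labels, while the same kind of list inside `.loc[]` is read as row labels:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded only so reruns give the same values
df = pd.DataFrame(rng.uniform(size=(10, 10)),
                  columns=list("PQRSTUVWXY"),
                  index=list("ABCDEFGHIJ"))

cols = df[["P", "Q"]]       # a list inside [] is read as column labels
rows = df.loc[["A", "B"]]   # a list inside .loc[] is read as row labels

row_labels_rejected = False
try:
    df[["A", "B"]]          # 'A' and 'B' are row labels, not columns
except KeyError:
    row_labels_rejected = True
print(cols.shape, rows.shape, row_labels_rejected)
```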

df.loc[:,['P','Q']]

The above returns the columns labelled 'P' and 'Q'.

df.loc[['A','B'],:]

The above returns the rows labelled 'A' and 'B'.
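Rows and columns can also be chosen by label in a single `.loc` call, and label slices in `.loc` include both endpoints. A small sketch, substituting deterministic values from `np.arange` for the answer's random numbers so the selected cells are easy to verify:

```python
import numpy as np
import pandas as pd

# Values 0..99 laid out row by row (deterministic, unlike the random frame).
df = pd.DataFrame(np.arange(100).reshape(10, 10),
                  columns=list("PQRSTUVWXY"),
                  index=list("ABCDEFGHIJ"))

# Rows 'A', 'B' and columns 'P', 'Q' in one call.
block = df.loc[["A", "B"], ["P", "Q"]]

# Label slices include BOTH endpoints: rows A..C and columns P..R.
sliced = df.loc["A":"C", "P":"R"]
print(block)
print(sliced)
```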

You can also select by integer position with the iloc indexer.

df.iloc[:,[1,2]]

This returns the columns at positions 1 and 2 ('Q' and 'R', since positions start at 0), while

df.iloc[[1,2],:]

returns the rows at positions 1 and 2, i.e. the second and third rows ('B' and 'C'). You can access any single element with

df.iloc[1,2]

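or, by label,

df.loc['A','Q']

Note that these two calls pick different cells: df.iloc[1,2] is row 'B', column 'R', while df.loc['A','Q'] is row 'A', column 'Q'.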
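Because positions are 0-based, it is easy to be off by one with `iloc`. A quick check on a frame filled with `np.arange` (substituted for the random values, so each cell's value equals 10 × row position + column position) makes the mapping visible. `.at` and `.iat` are pandas' scalar-only counterparts of `.loc` and `.iloc`:

```python
import numpy as np
import pandas as pd

# value = 10 * row_position + column_position
df = pd.DataFrame(np.arange(100).reshape(10, 10),
                  columns=list("PQRSTUVWXY"),
                  index=list("ABCDEFGHIJ"))

cols = df.iloc[:, [1, 2]]  # columns at positions 1 and 2 -> 'Q', 'R'
rows = df.iloc[[1, 2], :]  # rows at positions 1 and 2 -> 'B', 'C'

cell_by_position = df.iloc[1, 2]  # row 'B', column 'R'
cell_by_label = df.loc["B", "R"]  # the same cell, addressed by label
cell_iat = df.iat[1, 2]           # scalar-only, position-based
cell_at = df.at["B", "R"]         # scalar-only, label-based
print(cell_by_position, cell_by_label, cell_iat, cell_at)
```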
marc_s
Shrish

You can select the rows by position:

df.iloc[[0,2,4], :]
Joe T. Boka
  • The OP asked about a list of row names, not numerical indices. In general `df.iloc[]` and numeric indices are less preferable than row names. – smci Dec 15 '17 at 19:14
  • @smci Yeah, that's a valid point. The OP did say "by name". You're right. – Joe T. Boka Dec 20 '17 at 07:40
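If you do want `iloc` but start from row names, as in the question, `Index.get_indexer` converts labels to integer positions. It returns -1 for labels it cannot find, and -1 passed to `iloc` silently selects the last row, so this sketch checks for it first. The frame is a cut-down, assumed version of the question's; `get_indexer` also expects a unique index.

```python
import pandas as pd

# Cut-down version of the question's frame.
df = pd.DataFrame(
    {"pos": [3, 7, 12, 15, 18]},
    index=pd.Index(["TP3", "TP7", "TP12", "TP15", "TP18"], name="rs#"),
)
test = ["TP3", "TP12", "TP18"]

positions = df.index.get_indexer(test)  # integer positions of each label
if (positions == -1).any():
    # -1 marks a missing label; iloc would treat it as "last row".
    raise KeyError("some labels in `test` are not in the index")

by_position = df.iloc[positions]
by_label = df.loc[test]
print(positions)
print(by_position.equals(by_label))
```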