Slice Pandas DataFrame by Row

Question

I am working with survey data loaded from an h5-file as hdf = pandas.HDFStore('Survey.h5') through the pandas package. Within this DataFrame, all rows are the results of a single survey, whereas the columns are the answers for all questions within a single survey.

I am aiming to reduce this dataset to a smaller DataFrame including only the rows with a certain depicted answer on a certain question, i.e. with all the same value in this column. I am able to determine the index values of all rows with this condition, but I can't find how to delete this rows or make a new df with these rows only.

score 50 · Accepted Answer · answered Aug 09 '12 at 11:26

50

In [36]: df
Out[36]:
   A  B  C  D
a  0  2  6  0
b  6  1  5  2
c  0  2  6  0
d  9  3  2  2

In [37]: rows
Out[37]: ['a', 'c']

In [38]: df.drop(rows)
Out[38]:
   A  B  C  D
b  6  1  5  2
d  9  3  2  2

In [39]: df[~((df.A == 0) & (df.B == 2) & (df.C == 6) & (df.D == 0))]
Out[39]:
   A  B  C  D
b  6  1  5  2
d  9  3  2  2

In [40]: df.ix[rows]
Out[40]:
   A  B  C  D
a  0  2  6  0
c  0  2  6  0

In [41]: df[((df.A == 0) & (df.B == 2) & (df.C == 6) & (df.D == 0))]
Out[41]:
   A  B  C  D
a  0  2  6  0
c  0  2  6  0

answered Aug 09 '12 at 11:26

Wouter Overmeire

65,766
10
63
43

is it possible to slice the dataframe and say (c = 5 or c =6) like THIS: ---> df[((df.A == 0) & (df.B == 2) & (df.C == 5 or 6) & (df.D == 0))] – yoshiserry Dec 05 '14 at 03:53
df[((df.A == 0) & (df.B == 2) & df.C.isin([5, 6]) & (df.D == 0))] or df[((df.A == 0) & (df.B == 2) & ((df.C == 5) | (df.C == 6)) & (df.D == 0))] – Wouter Overmeire Dec 05 '14 at 07:52

score 26 · Answer 2 · answered Dec 11 '16 at 03:14

26

If you already know the index you can use .loc:

In [12]: df = pd.DataFrame({"a": [1,2,3,4,5], "b": [4,5,6,7,8]})

In [13]: df
Out[13]:
   a  b
0  1  4
1  2  5
2  3  6
3  4  7
4  5  8

In [14]: df.loc[[0,2,4]]
Out[14]:
   a  b
0  1  4
2  3  6
4  5  8

In [15]: df.loc[1:3]
Out[15]:
   a  b
1  2  5
2  3  6
3  4  7

answered Dec 11 '16 at 03:14

Akavall

82,592
51
207
251

14

It's worth a quick note that despite the notational similarity between `df.loc[1:3]` and `some_list[1:3]`, the first uses an inclusive upper index while the second (and most of python) uses an exclusive upper index. – Phoenix Meadowlark Feb 21 '20 at 02:04

score 1 · Answer 3 · answered Mar 24 '22 at 12:07

1

If you just need to get the top rows; you can use df.head(10)

answered Mar 24 '22 at 12:07

Neil

7,482
6
50
56

score 1 · Answer 4 · answered Jun 10 '22 at 20:00

1

Use query to search for specific conditions:

In [3]: df
Out[3]: 
   age family   name
0    1      A   john 
1   36      A  jason 
2   32      A   jane 
3   26      B   jack 
4   30      B  james 

In [4]: df.query('age > 30 & family == "A"')
Out[4]: 
   age family   name
1   36      A  jason 
2   32      A   jane

answered Jun 10 '22 at 20:00

rachwa

1,805
1
14
17

After the query, how would I slice columns? For example, only display age and name. – akalanka Jun 01 '23 at 20:13
1

Hi, if you want to use [query](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.query.html), try `df.query('age > 30 & family == "A"')[['age', 'name']]`. Alternatively, you can use [loc](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.loc.html): `df.loc[(df['age'] > 30) & (df['family'] == 'A'), ['age', 'name']]`. – rachwa Jun 02 '23 at 09:42

Slice Pandas DataFrame by Row

4 Answers4

Linked