Pandas every nth row

Question

DataFrame.resample() works only with time-series data. I cannot find a way to get every nth row from non-time-series data. What is the best method?

score 386 · Accepted Answer · edited Nov 20 '20 at 08:02


I'd use iloc, which takes a row/column slice, both based on integer position and following normal Python slice syntax. If you want every 5th row:

df.iloc[::5, :]
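As a minimal sketch of the slice above, assuming a small made-up DataFrame (the data, the `value` column, and the `r…` labels are illustrative, not from the question), iloc selects by position regardless of what the index labels are:

```python
import pandas as pd

# Hypothetical example data: 12 rows with non-consecutive string labels.
df = pd.DataFrame({"value": range(12)}, index=[f"r{i * 10}" for i in range(12)])

# Every 5th row by position, all columns: positions 0, 5 and 10.
every_fifth = df.iloc[::5, :]
print(every_fifth)
```

Because the slice is positional, the same expression should work for string, datetime, or shuffled integer indexes.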

edited Nov 20 '20 at 08:02

Hawklaz


answered Jul 31 '14 at 11:25

chrisb


95

For those who might want, for example, every fifth row, but starting at the 2nd row it would be `df.iloc[1::5, :]`. – Little Bobby Tables Nov 13 '16 at 17:18
59

You can omit the column part: `df.iloc[::5]` – joctee Dec 28 '18 at 14:24
1

@chrisb how do I specify the starting row? For example, every 5th row, starting from the second row? – FabioSpaghetti Jan 13 '20 at 13:32
2

How do you include it from the back? – WJA Apr 20 '21 at 22:58
1

how do you make it not include 0th row? – Raksha Jun 10 '21 at 21:49
What is this slicing syntax called and where can I read more about it? – topher217 Jul 19 '21 at 10:45
1

This is standard Python slicing. See https://stackoverflow.com/questions/509211/understanding-slice-notation – David Parks Dec 15 '21 at 21:54
For every 3rd row it will be unintuitive `df.iloc[2::3]` – banderlog013 Jan 22 '22 at 14:16
2

@banderlog013 No, that's intuitive - just `df.iloc[::3]` would suffice. What you want ("intuitively") is for the first row in the selection not to be the first row in the dataframe. It's not hard to see that for any given N ("give me every Nth row, starting with the naturally-counted Nth row") the indexing is `df.iloc[(N-1)::N]`. This behavior is rarely needed, however... – Lodinn Feb 15 '22 at 13:07
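Several of the comment questions above (a starting offset, skipping the 0th row, counting from the back) come down to the start and step of the slice. A small sketch, assuming a made-up 12-row frame with a single hypothetical `value` column:

```python
import pandas as pd

df = pd.DataFrame({"value": range(12)})  # hypothetical data, positions 0..11

n = 5
from_second = df.iloc[1::n]       # start at the 2nd row: positions 1, 6, 11
nth_counted = df.iloc[n - 1::n]   # naturally-counted 5th, 10th rows: positions 4, 9
from_back = df.iloc[::-n]         # start at the last row: positions 11, 6, 1 (reversed order)

# Same rows as from_back, restored to their original order.
from_back_in_order = df.iloc[::-n].iloc[::-1]
```

Any start other than 0 also answers "how do I not include the 0th row".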

metastableB · Answer 2 · 2018-06-28T13:18:16.907

53

Though @chrisb's accepted answer does answer the question, I would like to add to it the following.

A simple method I use to select or drop every nth row is the following:

df1 = df[df.index % 3 != 0]  # Excludes every 3rd row, starting from 0
df2 = df[df.index % 3 == 0]  # Selects every 3rd row, starting from 0

This arithmetic-based sampling also allows more complex row selections.

This assumes, of course, that you have an index of ordered, consecutive integers starting at 0.
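To illustrate why that assumption matters, here is a hedged sketch with a made-up frame whose index is neither consecutive nor zero-based. The `np.arange(len(df))` mask is one possible positional alternative, not part of the answer above:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with an irregular integer index.
df = pd.DataFrame({"value": list("abcdefg")}, index=[10, 11, 13, 20, 21, 22, 30])

# Mask computed from the labels: True only for labels 21 and 30.
label_mask = df.index % 3 == 0
# Mask computed from row positions: True for positions 0, 3 and 6.
position_mask = np.arange(len(df)) % 3 == 0

by_label = df[label_mask]
by_position = df[position_mask]
```

Here `by_label` picks rows according to the label values, while `by_position` picks every 3rd row as stored.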

edited Jun 28 '18 at 13:18

answered Sep 10 '17 at 13:22

metastableB


11

This is not a good answer because it makes three assumptions, which are frequently not met: (1) the index is numeric, (2) the index starts at zero, (3) the index values are consecutive ... the last one is especially important since you can't use your suggested method more than once without resetting the index – Constantine Jun 27 '18 at 15:12
3

I take your point. Will edit the answer to make the assumptions _more explicit_. – metastableB Jun 28 '18 at 13:14
2

@Constantine still, wouldn't that be faster than the other solution as you can simply add an index? – Readler May 31 '19 at 08:38

score 13 · Answer 3 · answered Jan 25 '19 at 04:22

There is an even simpler solution than the accepted answer: slicing the DataFrame directly, which invokes df.__getitem__.

df = pd.DataFrame('x', index=range(5), columns=list('abc'))
df

   a  b  c
0  x  x  x
1  x  x  x
2  x  x  x
3  x  x  x
4  x  x  x

For example, to get every 2nd row, you can do

df[::2]

   a  b  c
0  x  x  x
2  x  x  x
4  x  x  x

There's also GroupBy.first/GroupBy.head, where you group on the index:

df.index // 2
# Int64Index([0, 0, 1, 1, 2], dtype='int64')

df.groupby(df.index // 2).first()
# Alternatively (this keeps the original index labels 0, 2, 4):
# df.groupby(df.index // 2).head(1)

   a  b  c
0  x  x  x
1  x  x  x
2  x  x  x

The index is floor-divided by the stride (2, in this case). If the index is non-numeric, instead do

# df.groupby(np.arange(len(df)) // 2).first()
df.groupby(pd.RangeIndex(len(df)) // 2).first()

   a  b  c
0  x  x  x
1  x  x  x
2  x  x  x

score 8 · Answer 4 · answered Jun 16 '21 at 21:05

Adding reset_index() to metastableB's answer removes the assumptions about the index values: the selection then depends only on row position.

df1 = df[df.reset_index().index % 3 != 0]  # Excludes every 3rd row starting from 0
df2 = df[df.reset_index().index % 3 == 0]  # Selects every 3rd row starting from 0

df.reset_index().index is a fresh index that starts at 0 and increments by 1, so the modulo works regardless of the original index.

score 2 · Answer 5 · answered Dec 08 '18 at 05:00


I had a similar requirement, but I wanted the n'th item in a particular group. This is how I solved it.

groups = data.groupby(['group_key'])
selection = groups['index_col'].apply(lambda x: x % 3 == 0)
subset = data[selection]
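If no per-group counter column such as index_col exists, one possible alternative (not the approach above) is GroupBy.cumcount(), which numbers the rows within each group starting from 0. A sketch with made-up data, where the `value` column and the group labels are assumptions for illustration:

```python
import pandas as pd

# Hypothetical data: group "a" has 4 rows, group "b" has 3.
data = pd.DataFrame({
    "group_key": ["a", "a", "a", "a", "b", "b", "b"],
    "value": [1, 2, 3, 4, 5, 6, 7],
})

# 0-based position of each row inside its own group, aligned to data's index.
position_in_group = data.groupby("group_key").cumcount()

# Keep every 3rd row within each group (positions 0, 3, ...).
subset = data[position_in_group % 3 == 0]
```

Because cumcount() returns a Series on the original index, the boolean mask can be used to filter `data` directly.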

answered Dec 08 '18 at 05:00

Steztric


score 0 · Answer 6 · answered Sep 22 '20 at 18:26

A solution I came up with when using the index was not viable (possibly the multi-gigabyte .csv was too large, or I missed some technique that would allow me to reindex without crashing):
walk through the file one row at a time and add every nth row to a new dataframe.

import pandas as pd
from csv import DictReader

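def make_downsampled_df(filename, interval):
    # Stream the CSV and keep only rows 0, interval, 2 * interval, ...
    with open(filename, 'r', newline='') as read_obj:
        csv_dict_reader = DictReader(read_obj)
        column_names = csv_dict_reader.fieldnames
        # DataFrame.append was removed in pandas 2.0, so collect the
        # selected rows in a list and build the dataframe once at the end.
        rows = [row for index, row in enumerate(csv_dict_reader)
                if index % interval == 0]

    # Values stay as strings, because DictReader does no type conversion.
    return pd.DataFrame(rows, columns=column_names)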
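Another option for files too large to load whole, offered here as a sketch rather than as this answer's method, is to let pandas skip lines while parsing: read_csv accepts a callable for skiprows that receives each line number and returns True for lines to skip. The helper name read_every_nth_row and the sample CSV text are made up for illustration:

```python
import io

import pandas as pd


def read_every_nth_row(source, interval):
    # Line 0 is the header, so data row k sits on file line k + 1.
    # Keep the header and every data row whose position is a multiple of interval.
    return pd.read_csv(
        source,
        skiprows=lambda line: line > 0 and (line - 1) % interval != 0,
    )


# Hypothetical CSV: a header plus ten data rows, x = 0..9 and y = x squared.
csv_text = "x,y\n" + "\n".join(f"{i},{i * i}" for i in range(10))
df = read_every_nth_row(io.StringIO(csv_text), 3)
```

Unlike the DictReader version, this lets read_csv infer column dtypes, and a file path can be passed in place of the StringIO object.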

score 0 · Answer 7 · answered Jan 17 '21 at 21:11


df.drop(labels=df[df.index % 3 != 0].index, axis=0)  # keeps every 3rd row (index % 3 == 0)

answered Jan 17 '21 at 21:11

bitbang


4

While this code may answer the question, [including an explanation](https://meta.stackoverflow.com/questions/392712/explaining-entirely-code-based-answers) of how or why this solves the problem would really help to improve the quality of your post. Remember that you are answering the question for readers in the future, not just the person asking now. Please [edit] your answer to add explanations and give an indication of what limitations and assumptions apply. – ppwater Jan 18 '21 at 00:13
