`DataFrame.resample()` works only with timeseries data. I cannot find a way of getting every nth row from non-timeseries data. What is the best method?
7 Answers
I'd use `iloc`, which takes a row/column slice, both based on integer position and following normal Python syntax. If you want every 5th row:
df.iloc[::5, :]
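As a small sketch of the slicing variants (the frame and its `value` column are made up for illustration), the `start::step` form covers a different starting row, skipping the first row, and counting from the end:

```python
import pandas as pd

# Illustrative frame: 12 rows, string labels, values equal to row positions.
df = pd.DataFrame({"value": range(12)}, index=list("abcdefghijkl"))

every_5th = df.iloc[::5]      # positions 0, 5, 10
from_second = df.iloc[1::5]   # positions 1, 6, 11
skip_first = df.iloc[5::5]    # positions 5, 10 (row 0 is not included)
from_back = df.iloc[::-5]     # positions 11, 6, 1, in reverse order
# Reverse again to restore the original row order.
from_back_in_order = df.iloc[::-5].iloc[::-1]  # positions 1, 6, 11
```

Because `iloc` is purely positional, none of this depends on what the index labels are.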
- For those who might want, for example, every fifth row, but starting at the 2nd row, it would be `df.iloc[1::5, :]`. – Little Bobby Tables Nov 13 '16 at 17:18
- You can omit the column part: `df.iloc[::5]` – joctee Dec 28 '18 at 14:24
- @chrisb how do I specify the starting row? Like every 5th row, starting from the second row? – FabioSpaghetti Jan 13 '20 at 13:32
- How do you include it from the back? – WJA Apr 20 '21 at 22:58
- How do you make it not include the 0th row? – Raksha Jun 10 '21 at 21:49
- What is this slicing syntax called and where can I read more about it? – topher217 Jul 19 '21 at 10:45
- This is standard Python slicing. See https://stackoverflow.com/questions/509211/understanding-slice-notation – David Parks Dec 15 '21 at 21:54
- For every 3rd row it will be unintuitive `df.iloc[2::3]` – banderlog013 Jan 22 '22 at 14:16
- @banderlog013 No, that's intuitive - just `df.iloc[::3]` would suffice. What you want ("intuitively") is for the first row in the selection to not be the first row in the dataframe. It's not hard to see that for any given N ("give me N rows starting with the naturally-counted Nth row") the indexing is `df.iloc[(N-1)::N]`. This behavior is rarely needed, however... – Lodinn Feb 15 '22 at 13:07
Though @chrisb's accepted answer does answer the question, I would like to add to it the following.
A simple method I use to select every nth row or drop every nth row is the following:
df1 = df[df.index % 3 != 0] # Excludes every 3rd row starting from 0
df2 = df[df.index % 3 == 0] # Selects every 3rd row starting from 0
This arithmetic based sampling has the ability to enable even more complex row-selections.
This assumes, of course, that you have an `index` column of ordered, consecutive integers starting at 0.
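To make that assumption concrete, here is a minimal sketch (the frame and its labels are invented for illustration) comparing the label-based modulo with a position-based one built from `np.arange(len(df))`. The two agree only when the index is 0, 1, 2, ...

```python
import numpy as np
import pandas as pd

# Illustrative frame whose integer labels are neither consecutive nor 0-based.
df = pd.DataFrame(
    {"value": [10, 20, 30, 40, 50, 60]},
    index=[3, 7, 8, 12, 20, 21],
)

# Label-based: keeps rows whose *label* is divisible by 3 (labels 3, 12, 21).
by_label = df[df.index % 3 == 0]

# Position-based: keeps rows at positions 0 and 3, regardless of labels.
positions = np.arange(len(df))
by_position = df[positions % 3 == 0]
dropped_every_3rd = df[positions % 3 != 0]
```

The positional mask also works when the index is non-numeric, since it never looks at the labels.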

- This is not a good answer because it makes three assumptions, which are frequently not met: (1) the index is numeric, (2) the index starts at zero, (3) the index values are consecutive ... the last one is especially important since you can't use your suggested method more than once without resetting the index – Constantine Jun 27 '18 at 15:12
- I take your point. Will edit the answer to make the assumptions _more explicit_. – metastableB Jun 28 '18 at 13:14
- @Constantine still, wouldn't that be faster than the other solution as you can simply add an index? – Readler May 31 '19 at 08:38
There is an even simpler alternative to the accepted answer that involves directly invoking `df.__getitem__`.
df = pd.DataFrame('x', index=range(5), columns=list('abc'))
df
   a  b  c
0  x  x  x
1  x  x  x
2  x  x  x
3  x  x  x
4  x  x  x
For example, to get every 2nd row, you can do
df[::2]
   a  b  c
0  x  x  x
2  x  x  x
4  x  x  x
There's also `GroupBy.first`/`GroupBy.head`, where you group on the index:
df.index // 2
# Int64Index([0, 0, 1, 1, 2], dtype='int64')
df.groupby(df.index // 2).first()
# Alternatively,
# df.groupby(df.index // 2).head(1)
   a  b  c
0  x  x  x
1  x  x  x
2  x  x  x
The index is floor-divided by the stride (2, in this case). If the index is non-numeric, instead do
# df.groupby(np.arange(len(df)) // 2).first()
df.groupby(pd.RangeIndex(len(df)) // 2).first()
   a  b  c
0  x  x  x
1  x  x  x
2  x  x  x
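One difference between the two is worth a quick sketch (the frame below is invented for illustration). By default `GroupBy.first` returns the first non-null value of each column within a group and relabels the result by group number, whereas `GroupBy.head(1)` returns the first row of each group unchanged, with its original index label:

```python
import numpy as np
import pandas as pd

# Illustrative frame with a missing value in the very first row.
df = pd.DataFrame({"a": [np.nan, 1.0, 2.0, 3.0], "b": [10, 11, 12, 13]})

# Positional group ids: rows 0-1 form group 0, rows 2-3 form group 1.
groups = np.arange(len(df)) // 2

first = df.groupby(groups).first()  # per column, first non-null value in each group
head = df.groupby(groups).head(1)   # first row of each group, original labels kept
```

Here `first` reports `a == 1.0` for group 0, taken from the second row, while `head` keeps the `NaN` from the first row. If the goal is strictly "every nth row", `head(1)` or plain slicing is the closer match.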

Adding `reset_index()` to metastableB's answer allows you to only need to assume that the rows are ordered and consecutive.
df1 = df[df.reset_index().index % 3 != 0] # Excludes every 3rd row starting from 0
df2 = df[df.reset_index().index % 3 == 0] # Selects every 3rd row starting from 0
`df.reset_index().index` will create an index that starts at 0 and increments by 1, allowing you to use the modulo easily.

I had a similar requirement, but I wanted the n'th item in a particular group. This is how I solved it.
groups = data.groupby(['group_key'])
selection = groups['index_col'].apply(lambda x: x % 3 == 0)
subset = data[selection]
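If there is no ready-made per-group counter column, one possible alternative is `GroupBy.cumcount()`, which numbers the rows within each group from 0. This is a sketch rather than the approach above; the `group_key` and `value` columns are placeholder names:

```python
import pandas as pd

# Illustrative data: group "a" has 4 rows, group "b" has 3.
data = pd.DataFrame(
    {
        "group_key": ["a", "a", "a", "a", "b", "b", "b"],
        "value": [1, 2, 3, 4, 5, 6, 7],
    }
)

# 0-based position of each row inside its own group, aligned to data's index.
position_in_group = data.groupby("group_key").cumcount()

# Keep every 3rd row within each group (positions 0, 3, ... per group).
subset = data[position_in_group % 3 == 0]
```

Since `cumcount()` returns a Series aligned to the original index, it can be used directly as a boolean mask without any reindexing.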

A solution I came up with when using the index was not viable (possibly the multi-gigabyte .csv was too large, or I missed some technique that would allow me to reindex without crashing).
Walk through one row at a time and add the nth row to a new dataframe.
import pandas as pd
from csv import DictReader

def make_downsampled_df(filename, interval):
    # Stream the file row by row so the full CSV is never held in memory.
    with open(filename, 'r', newline='') as read_obj:
        csv_dict_reader = DictReader(read_obj)
        column_names = csv_dict_reader.fieldnames
        # DataFrame.append was removed in pandas 2.0, so collect the kept
        # rows in a list and build the DataFrame once at the end.
        rows = []
        for index, row in enumerate(csv_dict_reader):
            if index % interval == 0:
                print(str(row))
                rows.append(row)
    return pd.DataFrame(rows, columns=column_names)
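A possibly simpler route, sketched here under the assumption that the file has a single header line and no blank lines or multi-line quoted fields, is to let `pd.read_csv` do the filtering through a callable `skiprows`. The helper name `read_every_nth` is made up for this example:

```python
import io
import pandas as pd

def read_every_nth(source, interval):
    """Hypothetical helper: load only every `interval`-th data row of a CSV."""
    # skiprows is called with 0-based file line numbers. Line 0 is the header,
    # so data row k sits on file line k + 1; skip it unless k % interval == 0.
    return pd.read_csv(
        source,
        skiprows=lambda line: line > 0 and (line - 1) % interval != 0,
    )

# Small in-memory stand-in for a large file on disk.
csv_text = "x,y\n0,a\n1,b\n2,c\n3,d\n4,e\n5,f\n6,g\n"
sampled = read_every_nth(io.StringIO(csv_text), 3)
```

Unlike the `DictReader` loop, this keeps pandas' usual type inference, so numeric columns are not read as strings.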

df.drop(labels=df[df.index % 3 != 0].index, axis=0) # keeps every 3rd row (index label mod 3 == 0)

- While this code may answer the question, [including an explanation](https://meta.stackoverflow.com/questions/392712/explaining-entirely-code-based-answers) of how or why this solves the problem would really help to improve the quality of your post. Remember that you are answering the question for readers in the future, not just the person asking now. Please [edit] your answer to add explanations and give an indication of what limitations and assumptions apply. – ppwater Jan 18 '21 at 00:13