Sampling a dataframe by selecting rows where the location modulo P = Q

Question

Let's say I have a dataframe with N rows. I want to pick the rows where the row location modulo P gives Q. So for concreteness, let's say P = 7 and Q = 5.

Row 0: 0 mod 7 = 0 (not satisfied)
Row 1: 1 mod 7 = 1 (not satisfied)
...
Row 5: 5 mod 7 = 5 (satisfied)
...
Row 12: 12 mod 7 = 5 (satisfied)

So the selected rows will be 5, 12, 19, 26, ...

If Q=0, you can use the slicing method df.iloc[::P]. How does one do it for mod P = Q?

related: https://stackoverflow.com/questions/509211/understanding-slice-notation which totally applies to pandas as well. — Tadhg McDonald-Jensen, Aug 08 '20 at 19:02

score 5 · Accepted Answer · answered Aug 08 '20 at 18:28

Use df.iloc[Q::P]: this starts at row Q and then steps in increments of P.

When the first argument is omitted, as in .iloc[::P], it defaults to 0 (and the middle one, the stop, defaults to the end of the data frame). To start somewhere other than 0, just give the start explicitly.
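As a minimal sketch of the slice described above, the following builds a small frame and checks that df.iloc[Q::P] picks exactly the positions where position % P == Q. The 25-row frame and its column names are made up for illustration, and the sketch assumes 0 <= Q < P (with Q >= P the slice still starts at Q, but the modulo condition could never be true).

```python
import numpy as np
import pandas as pd

P, Q = 7, 5  # values from the question

# Hypothetical 25-row frame, used only for illustration
df = pd.DataFrame(np.arange(100).reshape(25, 4), columns=["A", "B", "C", "D"])

# Start at position Q, run to the end of the frame, step by P
sampled = df.iloc[Q::P]

# Positions that satisfy the modulo condition, computed independently
expected = [i for i in range(len(df)) if i % P == Q]

print(sampled)
print(expected)
```

Because this frame uses the default 0..N-1 index, the index labels of `sampled` coincide with the row positions in `expected`.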

score 0 · Answer 2 · answered Aug 08 '20 at 18:31

Using the numpy package:

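    import numpy as np

    # label each row by whether its position satisfies the modulus condition
    # (np.arange(len(df)) gives row positions, so this also works when
    # df.index is not the default 0..N-1 range)
    df["satisfied"] = np.where(np.arange(len(df)) % P == Q, "(satisfied)", "(not satisfied)")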

ldren
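The same modulus condition can also select rows rather than label them. The sketch below is one possible way to do that, not part of the original answer: it builds a boolean mask over row positions and indexes the frame with it. The frame, its "value" column, and the string index labels are hypothetical, chosen to show that the mask depends on position and not on index labels.

```python
import numpy as np
import pandas as pd

P, Q = 7, 5  # values from the question

# Hypothetical frame with a non-numeric index, for illustration only
df = pd.DataFrame({"value": np.arange(25)}, index=[f"row{i}" for i in range(25)])

# Boolean mask over row positions (not index labels)
mask = np.arange(len(df)) % P == Q

# Boolean indexing keeps the rows where the mask is True
selected = df[mask]
print(selected)
```

For 0 <= Q < P this selects the same rows as df.iloc[Q::P]; the mask form is mainly useful when the condition needs to be combined with other row filters.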

score 0 · Answer 3 · answered Aug 08 '20 at 18:36

Code:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(100).reshape(25, 4), columns=['A', 'B', 'C', 'D'])
p = 7
q = 5
a = []

# collect the row positions i for which i % p == q
for i in range(df.shape[0]):
    if i % p == q:
        a.append(i)

# print the rows at those positions
print(df.iloc[a, :])

Output:

================
     A   B   C   D
5   20  21  22  23
12  48  49  50  51
19  76  77  78  79
