
I have a dataframe with over 200k records.

I wish to slim my dataframe down by half by dropping one record for each one that I keep (as shown below).

Keep Row 1,
Drop Row 2,
Keep Row 3,
Drop Row 4,
Keep Row 5,
and so on and so forth...

If this is not possible, then I am more than willing to use pandas' sample functionality in conjunction with a mask.


2 Answers


You can use slicing with the step syntax [start:stop:step]: [1::2] starts at position 1 and steps by 2, so it selects positions 1, 3, 5, .... Since you have a dataframe, you can use df.index[1::2]. That keeps row 1, drops row 2, keeps row 3, drops row 4, and so on.

(Positions start from zero, so if you want to start keeping from position zero instead, use [::2].)

import pandas as pd
import numpy as np

# random dataframe to demonstrate on
df = pd.DataFrame({
    'A': np.random.randint(0, 10, 1000),
    'B': np.random.randint(0, 10, 1000)
})
print(df)

# keep every second row, starting from position 1
# (with the default RangeIndex this is equivalent to df.iloc[1::2])
df = df.iloc[df.index[1::2]]
print(df)

# input random df
     A  B
0    1  8
1    5  4
2    8  4
3    9  0
4    9  5
..  .. ..
995  8  9
996  4  9
997  8  4
998  2  8
999  9  0

[1000 rows x 2 columns]

# result random df
     A  B
1    5  4
3    9  0
5    6  8
7    4  1
9    6  6
..  .. ..
991  1  5
993  6  8
995  8  9
997  8  4
999  9  0

[500 rows x 2 columns]

You can index with a boolean mask like mask = (1 - np.arange(len(df)) % 2).astype(bool) and then select with df[mask] (a full sketch follows the list below).

  • You can remove the 1 - if you're OK with starting the drops at the first record instead of the second.
  • If you have a numerical index, you can replace np.arange(len(df)) with df.index.
  • You can replace ...astype(bool) with ... == 0, ... == 1, or even np.logical_not(...).
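
Here's a minimal sketch of that mask approach end to end (the random dataframe and the column names A and B are placeholders mirroring the example in the other answer):

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'A': np.random.randint(0, 10, 1000),
    'B': np.random.randint(0, 10, 1000)
})

# True at positions 0, 2, 4, ... -> keep row 1, drop row 2, keep row 3, ...
mask = (1 - np.arange(len(df)) % 2).astype(bool)
df = df[mask]
print(df)  # 500 rows x 2 columns

Note the difference from df.sample(frac=0.5), which would give you a random half; the mask keeps the deterministic every-other-row pattern the question asks for.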