How to divide dataframe into 2 equal parts (first half rows and second half rows) - in Python

Question

I have a DataFrame and need to split it into two equal DataFrames.

The first DataFrame should contain the top half of the rows and the second should contain the remaining rows.

How can I achieve this in Python?

It should work for both an even and an odd number of rows (with an odd number of rows, I would drop the last row so that both halves are equal).

Please share sample input data with the expected output for both cases. — Mayank Porwal, Dec 31 '20 at 06:42
Suppose I have the following ```DataFrame```: ``` lst = ["Geeks","For", "Geeks", "is", "portal", "for", "Geeks","Vinay"] df = pd.DataFrame(lst) print(df) ``` I can take its upper half with the ```head()``` method: ``` df.head(len(df)//2) ``` and the lower half with the ```tail()``` method: ``` df.tail(len(df)//2) ``` — Vinay Kumar, Dec 31 '20 at 06:53
@MayankPorwal I have added the sample data and expected outcomes for both cases. Notice that in the odd-rows case, the last row has been excluded. — Kay, Dec 31 '20 at 06:54
@VinayKumarShukla it doesn't account for the odd number of rows case. — Kay, Dec 31 '20 at 06:57
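For reference, the `head()`/`tail()` idea from the comment above can be extended to the odd case by first trimming the frame to an even length. A minimal sketch; `split_halves` is a hypothetical helper name, not part of pandas:

```python
import pandas as pd


def split_halves(df):
    """Return equal top and bottom halves of df, dropping the last row if the count is odd."""
    even_len = len(df) - len(df) % 2   # largest even number <= len(df)
    trimmed = df.head(even_len)         # drops the final row only when len(df) is odd
    half = even_len // 2
    return trimmed.head(half), trimmed.tail(half)


# The list from the comment above (8 rows), plus a 7-row variant for the odd case
lst = ["Geeks", "For", "Geeks", "is", "portal", "for", "Geeks", "Vinay"]
top, bottom = split_halves(pd.DataFrame(lst))
odd_top, odd_bottom = split_halves(pd.DataFrame(lst[:7]))
```

With 7 rows, `odd_top` holds rows 0 to 2, `odd_bottom` holds rows 3 to 5, and row 6 is dropped.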

score 3 · Accepted Answer · answered Dec 31 '20 at 07:11

Consider df:

In [122]: df
Out[122]: 
    id  days  sold  days_lag
0    1     1     1         0
1    1     3     0         2
2    1     3     1         2
3    1     8     1         5
4    1     8     1         5
5    1     8     0         5
6    2     3     0         0
7    2     8     1         5
8    2     8     1         5
9    2     9     2         1
10   2     9     0         1
11   2    12     1         3
12   3     4     5         6

Use numpy.array_split():

In [127]: import numpy as np

In [128]: def split_df(df):
     ...:     if len(df) % 2 != 0:  # odd number of rows: drop the last row
     ...:         df = df.iloc[:-1, :]
     ...:     df1, df2 = np.array_split(df, 2)
     ...:     return df1, df2
     ...: 

In [130]: df1, df2 = split_df(df)

In [131]: df1
Out[131]: 
   id  days  sold  days_lag
0   1     1     1         0
1   1     3     0         2
2   1     3     1         2
3   1     8     1         5
4   1     8     1         5
5   1     8     0         5

In [133]: df2
Out[133]: 
    id  days  sold  days_lag
6    2     3     0         0
7    2     8     1         5
8    2     8     1         5
9    2     9     2         1
10   2     9     0         1
11   2    12     1         3
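The same split can also be done with plain positional slicing, without going through NumPy; on recent pandas versions, passing a DataFrame to `np.array_split` may emit a `FutureWarning`. A minimal sketch that rebuilds the 13-row sample `df` shown above and slices it with `iloc`:

```python
import pandas as pd

# The 13-row sample frame from the answer above (odd row count)
df = pd.DataFrame({
    "id":       [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3],
    "days":     [1, 3, 3, 8, 8, 8, 3, 8, 8, 9, 9, 12, 4],
    "sold":     [1, 0, 1, 1, 1, 0, 0, 1, 1, 2, 0, 1, 5],
    "days_lag": [0, 2, 2, 5, 5, 5, 0, 5, 5, 1, 1, 3, 6],
})

half = len(df) // 2            # 13 // 2 == 6
df1 = df.iloc[:half]           # rows 0..5
df2 = df.iloc[half:2 * half]   # rows 6..11; the stop index leaves out row 12
```

Using `2 * half` as the stop index handles the odd case without a separate trimming step: it equals `len(df)` when the row count is even and `len(df) - 1` when it is odd.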

mukund ghode · Answer 2 · 2020-12-31T07:22:10.903

With a simple example, you can try the following:

import pandas as pd

data = [['Alex', 10], ['Bob', 12], ['Clarke', 13], ['Tom', 20], ['Jerry', 25]]  # odd number of rows
# data = [['Alex', 10], ['Bob', 12], ['Clarke', 13], ['Tom', 20]]  # even number of rows

half = len(data) // 2
data1 = data[:half]
if len(data) % 2 == 0:
    data2 = data[half:]
else:
    data2 = data[half:-1]  # drop the last row so both halves are equal

# Build each frame, then cast only the numeric 'Age' column to float
# (passing dtype=float for the whole frame fails on the 'Name' strings in recent pandas versions)
df1 = pd.DataFrame(data1, columns=['Name', 'Age']).astype({'Age': float})
df2 = pd.DataFrame(data2, columns=['Name', 'Age']).astype({'Age': float})
print("1st half:")
print(df1)
print("2nd half:")
print(df2)

Output:

D:\Python>python temp.py

1st half:
   Name   Age
0  Alex  10.0
1   Bob  12.0
2nd half:
     Name   Age
0  Clarke  13.0
1     Tom  20.0
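The answer above slices the Python list before building the frames. If you already have a DataFrame, the same if/else logic can be applied with `iloc`; note that the second half then keeps its original index labels (2, 3, ...) unless you reset them. A minimal sketch using the answer's data:

```python
import pandas as pd

data = [['Alex', 10], ['Bob', 12], ['Clarke', 13], ['Tom', 20], ['Jerry', 25]]
df = pd.DataFrame(data, columns=['Name', 'Age']).astype({'Age': float})

half = len(df) // 2
df1 = df.iloc[:half]
if len(df) % 2 == 0:
    df2 = df.iloc[half:]
else:
    df2 = df.iloc[half:-1]   # drop the last row so both halves are equal

# Renumber the second half from 0, matching the list-based output above
df2 = df2.reset_index(drop=True)
```

Skip the `reset_index` call if you want each half to keep the row labels from the original frame.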
