
I have a DataFrame and need to split it into two DataFrames of equal length.

The first DataFrame should contain the top half of the rows, and the second should contain the remaining rows.

How can I do this in Python?

It needs to handle both an even and an odd number of rows: when the row count is odd, I need to drop the last row so that the two halves are equal.


Mayank Porwal
Kay
  • Please share a sample data input with expected output for both cases. – Mayank Porwal Dec 31 '20 at 06:42
  • Suppose I have the following DataFrame: `lst = ["Geeks", "For", "Geeks", "is", "portal", "for", "Geeks", "Vinay"]; df = pd.DataFrame(lst); print(df)`. I can take its upper half with `head()`: `df.head(len(df)//2)`. The lower half can be accessed with `tail()`: `df.tail(len(df)//2)`. – Vinay Kumar Dec 31 '20 at 06:53
  • @MayankPorwal I have added the sample data and expected outcomes for both cases. Notice that in the odd-rows case, the last row has been excluded. – Kay Dec 31 '20 at 06:54
  • @Kay Check my comment; I think it will help you. – Vinay Kumar Dec 31 '20 at 06:55
  • @VinayKumarShukla it doesn't account for the odd number of rows case. – Kay Dec 31 '20 at 06:57
  • @Kay Please check my answer. – Mayank Porwal Dec 31 '20 at 07:14
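To make Kay's objection concrete: with an odd row count, `df.head(len(df)//2)` and `df.tail(len(df)//2)` leave out the middle row, not the last one. Below is a minimal sketch of one head/tail-only fix, assuming a hypothetical 7-row variant of Vinay Kumar's sample (made by dropping "Vinay"); `split_head_tail` is a hypothetical helper name, not a pandas function.

```python
import pandas as pd

# Vinay Kumar's sample list; the 7-row frame below is a hypothetical odd-length variant
lst = ["Geeks", "For", "Geeks", "is", "portal", "for", "Geeks", "Vinay"]
df_even = pd.DataFrame(lst)      # 8 rows
df_odd = pd.DataFrame(lst[:7])   # 7 rows

half = len(df_odd) // 2  # 3
# head/tail alone: row 3 ("is", the middle row) is the one left out
print(df_odd.head(half)[0].tolist())  # ['Geeks', 'For', 'Geeks']
print(df_odd.tail(half)[0].tolist())  # ['portal', 'for', 'Geeks']


def split_head_tail(df):
    """Hypothetical helper: split df into equal halves, dropping the last row if the count is odd."""
    half = len(df) // 2
    trimmed = df.head(2 * half)  # removes only the last row, and only when len(df) is odd
    return trimmed.head(half), trimmed.tail(half)


top, bottom = split_head_tail(df_odd)
print(top[0].tolist())     # ['Geeks', 'For', 'Geeks']
print(bottom[0].tolist())  # ['is', 'portal', 'for']
```

Trimming to `2 * half` rows first means the even case is unchanged, because `2 * half == len(df)` there.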

2 Answers


Consider df:

In [122]: df
Out[122]: 
    id  days  sold  days_lag
0    1     1     1         0
1    1     3     0         2
2    1     3     1         2
3    1     8     1         5
4    1     8     1         5
5    1     8     0         5
6    2     3     0         0
7    2     8     1         5
8    2     8     1         5
9    2     9     2         1
10   2     9     0         1
11   2    12     1         3
12   3     4     5         6

Use numpy.array_split():

In [127]: import numpy as np

In [128]: def split_df(df):
     ...:     if len(df) % 2 != 0:  # Handling `df` with `odd` number of rows
     ...:         df = df.iloc[:-1, :]
     ...:     df1, df2 = np.array_split(df, 2)
     ...:     return df1, df2
     ...: 

In [130]: df1, df2 = split_df(df)

In [131]: df1
Out[131]: 
   id  days  sold  days_lag
0   1     1     1         0
1   1     3     0         2
2   1     3     1         2
3   1     8     1         5
4   1     8     1         5
5   1     8     0         5

In [133]: df2
Out[133]: 
    id  days  sold  days_lag
6    2     3     0         0
7    2     8     1         5
8    2     8     1         5
9    2     9     2         1
10   2     9     0         1
11   2    12     1         3
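Two notes on `split_df`, offered as a hedged sketch rather than a replacement. First, on an odd length `np.array_split` produces unequal parts (the first part gets the extra item), which is why the function trims the last row first. Second, on newer pandas releases (2.1 and later) calling `np.array_split` directly on a DataFrame may emit a FutureWarning, because NumPy routes it through `DataFrame.swapaxes`, which pandas deprecated. Since the frame is already trimmed to an even length, plain positional slicing with `iloc` gives the same two halves; `split_df_iloc` is a hypothetical name, and the data is the `df` shown above.

```python
import numpy as np
import pandas as pd

# On an odd length, np.array_split gives unequal parts: the first part gets the extra item
sizes = [len(part) for part in np.array_split(np.arange(7), 2)]
print(sizes)  # [4, 3]


def split_df_iloc(df):
    """Hypothetical helper: return (top, bottom) of equal length; an odd trailing row is dropped."""
    half = len(df) // 2
    # Positional slices keep the original index labels, like df1/df2 above
    return df.iloc[:half], df.iloc[half:2 * half]


# Same 13-row df as in this answer (odd case)
df = pd.DataFrame({
    "id":       [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3],
    "days":     [1, 3, 3, 8, 8, 8, 3, 8, 8, 9, 9, 12, 4],
    "sold":     [1, 0, 1, 1, 1, 0, 0, 1, 1, 2, 0, 1, 5],
    "days_lag": [0, 2, 2, 5, 5, 5, 0, 5, 5, 1, 1, 3, 6],
})
df1, df2 = split_df_iloc(df)
print(df1.index.tolist())  # [0, 1, 2, 3, 4, 5]
print(df2.index.tolist())  # [6, 7, 8, 9, 10, 11]
```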
Mayank Porwal

With a simple example, you can try the following:

import pandas as pd

data = [['Alex', 10], ['Bob', 12], ['Clarke', 13], ['Tom', 20], ['Jerry', 25]]  # odd case: 5 rows
# data = [['Alex', 10], ['Bob', 12], ['Clarke', 13], ['Tom', 20]]              # even case: 4 rows

half = len(data) // 2
data1 = data[:half]
if len(data) % 2 == 0:
    data2 = data[half:]
else:
    data2 = data[half:-1]  # drop the last row so both halves have equal length

# Cast only 'Age' to float; the string 'Name' column cannot be converted to float
df1 = pd.DataFrame(data1, columns=['Name', 'Age']).astype({'Age': float})
df2 = pd.DataFrame(data2, columns=['Name', 'Age']).astype({'Age': float})
print("1st half:")
print(df1)
print("2nd half:")
print(df2)

Output:

D:\Python>python temp.py

1st half:
   Name   Age
0  Alex  10.0
1   Bob  12.0
2nd half:
     Name   Age
0  Clarke  13.0
1     Tom  20.0
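The even/odd branch above can also be folded into a single slice. A minimal sketch, using the same sample list: `2 * half` equals `len(data)` when the length is even and `len(data) - 1` when it is odd, so one slice covers both cases.

```python
# Branch-free variant of the list slicing in this answer
data = [['Alex', 10], ['Bob', 12], ['Clarke', 13], ['Tom', 20], ['Jerry', 25]]

half = len(data) // 2
data1 = data[:half]            # first half
data2 = data[half:2 * half]    # second half; excludes the last item when len(data) is odd

print(data1)  # [['Alex', 10], ['Bob', 12]]
print(data2)  # [['Clarke', 13], ['Tom', 20]]
```

The two lists can then be passed to `pd.DataFrame` exactly as in the code above.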
mukund ghode