How can I split a dataframe

import pandas as pd
import numpy as np
np.random.seed(0)
df = pd.DataFrame({'first':np.random.rand(4),'second':np.random.rand(4)},index=['foo','bar','baz','bat'])
print(df)
        first    second
foo  0.548814  0.423655
bar  0.715189  0.645894
baz  0.602763  0.437587
bat  0.544883  0.891773

into the two following disjoint data frames

        first    second
foo  0.548814  0.423655
bar  0.715189  0.645894

        first    second
baz  0.602763  0.437587
bat  0.544883  0.891773

by using the indices for the first data frame?

I am specifically looking for a method like

subDf1,subDf2 = pd.split(df,['foo','bar'])

where

print(subDf1)
        first    second
foo  0.548814  0.423655
bar  0.715189  0.645894

and

print(subDf2)
        first    second
baz  0.602763  0.437587
bat  0.544883  0.891773
Oblomov
  • I think your expected subDf1 output is incorrect since it is identical to subDf2 – Karan Shishoo Oct 21 '19 at 06:27
  • Possible duplicate of [Pandas Split Dataframe into two Dataframes](https://stackoverflow.com/questions/41624241/pandas-split-dataframe-into-two-dataframes) – gosuto Oct 21 '19 at 06:27

1 Answer

You can select the first DataFrame by label with DataFrame.loc and use Index.isin with boolean indexing for the second DataFrame:

idx = ['foo','bar']

print (df.loc[idx])
        first    second
foo  0.548814  0.423655
bar  0.715189  0.645894

print (df[~df.index.isin(idx)])
        first    second
baz  0.602763  0.437587
bat  0.544883  0.891773
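
pandas has no built-in pd.split, but the two selections above can be wrapped in a small function to get the call shape asked for in the question. This is only a sketch: split_by_labels is a hypothetical name, not a pandas API, and it assumes every label in the list exists in the index (df.loc raises a KeyError otherwise).

```python
import numpy as np
import pandas as pd


def split_by_labels(df, labels):
    """Hypothetical helper (not part of pandas).

    Returns a tuple: (rows whose index label is in `labels`,
    all remaining rows in their original order).
    """
    mask = df.index.isin(labels)      # True for rows named in `labels`
    return df.loc[labels], df[~mask]  # .loc follows the order of `labels`


np.random.seed(0)
df = pd.DataFrame({'first': np.random.rand(4), 'second': np.random.rand(4)},
                  index=['foo', 'bar', 'baz', 'bat'])

subDf1, subDf2 = split_by_labels(df, ['foo', 'bar'])
print(subDf1)
print(subDf2)
```

The first frame comes back in the order of the label list, the second in the order of the original index.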

Or use Index.difference with DataFrame.loc to select by labels (note that Index.difference sorts the result by default, so the row order can change):

print (df.loc[df.index.difference(idx)])
        first    second
bat  0.544883  0.891773
baz  0.602763  0.437587
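
If the original row order matters, the sorting can be avoided. The sketch below compares the default call with two order-preserving alternatives that were not part of the original answer: the sort=False argument of Index.difference and DataFrame.drop. The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame({'first': np.random.rand(4), 'second': np.random.rand(4)},
                  index=['foo', 'bar', 'baz', 'bat'])
idx = ['foo', 'bar']

# Default: the remaining labels are sorted, so 'bat' comes before 'baz'
rest_sorted = df.loc[df.index.difference(idx)]

# sort=False keeps the remaining labels in their original order
rest_ordered = df.loc[df.index.difference(idx, sort=False)]

# DataFrame.drop removes the given labels and also keeps the original order
rest_dropped = df.drop(idx)

print(rest_sorted.index.tolist())
print(rest_ordered.index.tolist())
print(rest_dropped.index.tolist())
```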
jezrael