91

I have pandas DataFrame which I have composed from concat. One row consists of 96 values, I would like to split the DataFrame from the value 72.

So that the first 72 values of a row are stored in Dataframe1, and the next 24 values of a row in Dataframe2.

I create my DF as follows:

temps = DataFrame(myData)
datasX = concat(
[temps.shift(72), temps.shift(71), temps.shift(70), temps.shift(69), temps.shift(68), temps.shift(67),
 temps.shift(66), temps.shift(65), temps.shift(64), temps.shift(63), temps.shift(62), temps.shift(61),
 temps.shift(60), temps.shift(59), temps.shift(58), temps.shift(57), temps.shift(56), temps.shift(55),
 temps.shift(54), temps.shift(53), temps.shift(52), temps.shift(51), temps.shift(50), temps.shift(49),
 temps.shift(48), temps.shift(47), temps.shift(46), temps.shift(45), temps.shift(44), temps.shift(43),
 temps.shift(42), temps.shift(41), temps.shift(40), temps.shift(39), temps.shift(38), temps.shift(37),
 temps.shift(36), temps.shift(35), temps.shift(34), temps.shift(33), temps.shift(32), temps.shift(31),
 temps.shift(30), temps.shift(29), temps.shift(28), temps.shift(27), temps.shift(26), temps.shift(25),
 temps.shift(24), temps.shift(23), temps.shift(22), temps.shift(21), temps.shift(20), temps.shift(19),
 temps.shift(18), temps.shift(17), temps.shift(16), temps.shift(15), temps.shift(14), temps.shift(13),
 temps.shift(12), temps.shift(11), temps.shift(10), temps.shift(9), temps.shift(8), temps.shift(7),
 temps.shift(6), temps.shift(5), temps.shift(4), temps.shift(3), temps.shift(2), temps.shift(1), temps,
 temps.shift(-1), temps.shift(-2), temps.shift(-3), temps.shift(-4), temps.shift(-5), temps.shift(-6),
 temps.shift(-7), temps.shift(-8), temps.shift(-9), temps.shift(-10), temps.shift(-11), temps.shift(-12),
 temps.shift(-13), temps.shift(-14), temps.shift(-15), temps.shift(-16), temps.shift(-17), temps.shift(-18),
 temps.shift(-19), temps.shift(-20), temps.shift(-21), temps.shift(-22), temps.shift(-23)], axis=1)

Question is: How can split them? :)

Mymozaaa
  • 474
  • 4
  • 20
Warry S.
  • 1,203
  • 2
  • 9
  • 9

3 Answers3

125

iloc

df1 = datasX.iloc[:, :72]
df2 = datasX.iloc[:, 72:]

(iloc docs)

zabop
  • 6,750
  • 3
  • 39
  • 84
piRSquared
  • 285,575
  • 57
  • 475
  • 624
  • 3
    @piRSquared, If I have to select rows can i just use `df1 = datasX[0:10]` and `df2 = datasX[10:]` ? So, this will give df1 with 10 rows and df2 with the rest? – i.n.n.m Jul 28 '17 at 15:42
  • 4
    @i.n.n.m yes. However, I personally don't like using that type of slicing because it's confusing. I use `df[['col1', 'col2]]` to slice columns. Just because pandas infers the `0:10` to be index positions, doesn't justify using that syntax... for me. This is totally my opinion and you can do what you need to. But I'd use `df.iloc[:10]` and `df.iloc[10:]` – piRSquared Jul 28 '17 at 15:47
  • @piRSquared I agree on selecting columns would be great to use how you have mentioned. My question was for rows. Does that apply to selecting rows too? – i.n.n.m Jul 28 '17 at 15:52
  • 1
    @i.n.n.m only mentioned columns to highlight that you asked if `datasX[0:10]` would work to slice rows. Turns out it will, **but** this syntax looks too much like column slicing. What you suggested will work fine. But I'd use `iloc` to make it more obvious. So, I'd use `df.iloc[:10]` instead of `df[:10]` – piRSquared Jul 28 '17 at 15:55
  • What if the dataframe has not 144 rows? Then your method does not work. – Soerendip Oct 22 '18 at 23:37
  • @Sören this has nothing to do with 144 rows. Why do you assume that? – piRSquared Oct 23 '18 at 00:15
  • Because you put in 72. – Soerendip Oct 23 '18 at 00:49
  • @Sören this splits at the 72nd column. All columns before the 72nd into one dataframe. Everything from the 72nd column on to the second dataframe – piRSquared Oct 23 '18 at 02:33
  • 3
    Yes, you did everything right. I miss-interpreted the question. The question should be rephrased to "How to split a Dataframe at the 72th column?" – Soerendip Oct 23 '18 at 18:56
  • Note [this comment](https://stackoverflow.com/questions/41624241/pandas-split-dataframe-into-two-dataframes-at-a-specific-row#comment110032448_41624241). If you were trying to split on rows it would just be `df[:72]`. – young_souvlaki Aug 30 '21 at 00:04
  • I don't see any documentation for iloc taking in two params from the pandas docs you linked. How did you figure this functionality? – young_souvlaki Aug 30 '21 at 00:05
82

use np.split(..., axis=1):

Demo:

In [255]: df = pd.DataFrame(np.random.rand(5, 6), columns=list('abcdef'))

In [256]: df
Out[256]:
          a         b         c         d         e         f
0  0.823638  0.767999  0.460358  0.034578  0.592420  0.776803
1  0.344320  0.754412  0.274944  0.545039  0.031752  0.784564
2  0.238826  0.610893  0.861127  0.189441  0.294646  0.557034
3  0.478562  0.571750  0.116209  0.534039  0.869545  0.855520
4  0.130601  0.678583  0.157052  0.899672  0.093976  0.268974

In [257]: dfs = np.split(df, [4], axis=1)

In [258]: dfs[0]
Out[258]:
          a         b         c         d
0  0.823638  0.767999  0.460358  0.034578
1  0.344320  0.754412  0.274944  0.545039
2  0.238826  0.610893  0.861127  0.189441
3  0.478562  0.571750  0.116209  0.534039
4  0.130601  0.678583  0.157052  0.899672

In [259]: dfs[1]
Out[259]:
          e         f
0  0.592420  0.776803
1  0.031752  0.784564
2  0.294646  0.557034
3  0.869545  0.855520
4  0.093976  0.268974

np.split() is pretty flexible - let's split an original DF into 3 DFs at columns with indexes [2,3]:

In [260]: dfs = np.split(df, [2,3], axis=1)

In [261]: dfs[0]
Out[261]:
          a         b
0  0.823638  0.767999
1  0.344320  0.754412
2  0.238826  0.610893
3  0.478562  0.571750
4  0.130601  0.678583

In [262]: dfs[1]
Out[262]:
          c
0  0.460358
1  0.274944
2  0.861127
3  0.116209
4  0.157052

In [263]: dfs[2]
Out[263]:
          d         e         f
0  0.034578  0.592420  0.776803
1  0.545039  0.031752  0.784564
2  0.189441  0.294646  0.557034
3  0.534039  0.869545  0.855520
4  0.899672  0.093976  0.268974
MaxU - stand with Ukraine
  • 205,989
  • 36
  • 386
  • 419
  • 1
    Thanks for your help, i have iloc now and it worked aswell, but for the future its nice to know! :) – Warry S. Jan 12 '17 at 22:49
18

I generally use array split because it's easier simple syntax and scales better with more than 2 partitions.

import numpy as np
partitions = 2
dfs = np.array_split(df, partitions)

np.split(df, [100,200,300], axis=0] wants explicit index numbers which may or may not be desirable.

Leo Ufimtsev
  • 6,240
  • 5
  • 40
  • 48