I am trying to find a better way to select the first half of the columns of a dataframe whose number of columns varies. I have tried both the iloc and ix methods, but in effect I am writing the following for many dataframes. Is there a shorthand for this?

df.iloc[:, [0,1,2,3,4,5,6,7,8,9,10,11]] #df.ix works this way as well

What I'd like to do is below...

df.iloc[:, [0:df.shape[1]/2] #this will allow column number flexibility

Do any of you have an idea of a good workaround for this?

MattR
Tyler Russell
  • I don't understand what the problem is. There is a syntax error there but other than that `df.iloc[:, :df.shape[1]/2]` should be valid? – ayhan Dec 06 '17 at 17:06
  • `df.iloc[:, 0:int(df.shape[1]/2)]` – BENY Dec 06 '17 at 17:21
  • Do not use .ix; it is deprecated. See this [SO post](https://stackoverflow.com/a/46915810/6361531) – Scott Boston Dec 06 '17 at 17:25
  • @Wen, thanks, that seemed to fix it. What made you think to use the int()? Another thing I just learned is that df.iloc[:, [0:int(df.shape[1]/2]] doesn't work either. Ayhan, thanks for noticing the syntax error. – Tyler Russell Dec 07 '17 at 19:25
  • @TylerRussell Dividing an int with / returns a float, and position-based selection requires an int. – BENY Dec 07 '17 at 19:31
  • @Wen, thanks so much! That makes sense. Appreciate your quick replies. It was driving me insane yesterday. – Tyler Russell Dec 07 '17 at 19:32
  • @TylerRussell Yw~ :-) – BENY Dec 07 '17 at 19:33

1 Answer

As Scott mentioned in his comment, do not use ix because it is deprecated. Although ix works for now, it may not in the future, so use iloc instead.

However, try using array_split() from numpy. It is very readable. This will split the dataframe's rows into two halves (array_split accepts a length that does not divide evenly and returns pieces as close to half as possible):

import numpy as np

df_split = np.array_split(df, 2)
# This returns a list of dataframes.
# If you need single dataframes, use df_split[0] for the first half
# and df_split[1] for the other half.
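
To see what "as close to half as possible" means, here is a minimal sketch on a plain five-element array (the array is made up for illustration and is not part of the original data):

```python
import numpy as np

# Hypothetical input: five items cannot be split evenly in two.
items = np.arange(5)

# array_split does not raise on an uneven length;
# the first piece receives the extra element.
parts = np.array_split(items, 2)
sizes = [len(p) for p in parts]  # [3, 2]
```

By contrast, np.split raises an error when the length does not divide evenly, which is why array_split is the safer choice here.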

To split the columns rather than the rows, pass df.columns instead:

df_split = np.array_split(df.columns, 2) # <--- Notice the df.columns in the argument
first_half = df[df_split[0].tolist()]
second_half = df[df_split[1].tolist()]
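
For comparison, the positional slice discussed in the comments can also be written with floor division (//), which keeps the bound an int so no int() call is needed. This is only a sketch: the five-column dataframe and its column names are invented for illustration.

```python
import pandas as pd

# Hypothetical five-column dataframe, for illustration only.
df = pd.DataFrame(
    [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]],
    columns=list("abcde"),
)

# Floor division returns an int, which iloc requires for positions.
half = df.shape[1] // 2

first_half = df.iloc[:, :half]   # columns at positions 0 .. half-1
second_half = df.iloc[:, half:]  # the remaining columns
```

Note that with an odd number of columns the two approaches differ: the // slice puts the extra column in the second half, while array_split puts it in the first.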
MattR