Randomly select a percentage of columns and rows in a dataframe (Pandas, Python 3)

Question

I am trying to randomly select a certain percentage of rows and columns in my dataframe and fit these features into a logistic regression over 10 iterations. My dependent variable is whether a team won (1) or lost (0).

If I have a df that looks something like this (data is made up):

 Won    Field    Injuries   Weather    Fouls    Players
  1       2         3         1          2         8
  0       3         2         0          1         5
  1       4         5         3          2         6
  1       3         2         1          4         5 
  0       2         3         0          1         6
  1       4         2         0          2         8
  ...

For example, let's say I want to select 50% (though this could change). I want to randomly select 50% of the columns (Field, Injuries, Weather, Fouls, Players), or the closest count to 50% when the number of columns is odd, and 50% of the rows in those columns to use in my model.

Here is my current code, which selects all of the columns and rows and fits them in my model. I would like to specify a random percentage instead:

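import numpy as np
import statsmodels.api as sm

z = []
for i in range(10):
    # use every column except the dependent variable 'Won'
    train_cols = df.columns[1:]
    logit = sm.Logit(df['Won'], df[train_cols])
    result = logit.fit()
    # exponentiate the fitted coefficients to get odds ratios
    exp = np.exp(result.params)
    z.append([i, exp])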
Hi Andy Hayden -- I see the overlap in these questions so I appreciate the reference. One quick point: my question also deals with taking random columns and not just the entire data frame — user3682157, Oct 14 '15 at 05:56
Generate indices with np.random.choice(np.arange(m), int(k*m), replace=False), where m is the number of features/samples and k is a number in the range [0, 1], e.g. 0.5 for 50%. — Ibraim Ganiev, Oct 14 '15 at 17:08
Hi Olologin -- thank you for your feedback! I actually figured out the answer which is why I would like to reopen this question and post how to randomly select BOTH rows and columns :) — user3682157, Oct 14 '15 at 17:26
@user3682157, I think the question is open now, you can post your answer. — IanS, Oct 29 '15 at 16:49
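
The index-based idea from the comments can be sketched roughly as follows. This version uses NumPy's newer `Generator.choice` rather than `np.random.choice`, and the helper name `random_indices` is a hypothetical one chosen for illustration. The sizes 6 and 5 assume the six sample rows and five feature columns shown in the question.

```python
import numpy as np

def random_indices(m, k, rng=None):
    """Pick about k*m distinct positions out of range(m).

    m is the number of rows or columns, k is a fraction in [0, 1].
    Hypothetical helper for illustration only.
    """
    rng = np.random.default_rng() if rng is None else rng
    # The size must be an integer; keep at least one index.
    size = max(1, int(round(k * m)))
    # replace=False guarantees no position is drawn twice.
    return rng.choice(m, size=size, replace=False)

rng = np.random.default_rng(0)
row_idx = random_indices(6, 0.5, rng)  # positions among the 6 sample rows
col_idx = random_indices(5, 0.5, rng)  # positions among the 5 feature columns
```

These are positional indices, so they would be applied with something like `df.iloc[row_idx]` for rows and `df.columns[1:][col_idx]` for the feature column labels.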
