
I have a dataframe of shape roughly (3 million, 79). I need to make 1,000 dataframes of 3,000 rows each, where each one is a random subset of the original dataframe, sampled without replacement. That way I cover all of the data, divided randomly into 1,000 dataframes.

Antonio López Ruiz

1 Answer


Once you decide how many parts n you want to split your dataframe into, you can shuffle the rows and split the result:

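import pandas as pd
import numpy as np

# df.sample(frac=1) returns all rows in random order (no replacement);
# np.array_split then cuts the shuffled frame into n nearly equal parts.
dfs = np.array_split(df.sample(frac=1), n)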
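
As a self-contained variant of the same shuffle-then-split idea, here is a minimal sketch that splits shuffled row positions instead of the dataframe itself and then checks that the parts are disjoint and cover every row. The helper name `random_partition`, its `seed` parameter, and the small 12-row dataframe are assumptions made for illustration; they are not part of the answer above.

```python
import numpy as np
import pandas as pd

def random_partition(df, n, seed=None):
    """Split df into n disjoint random parts that together cover every row."""
    rng = np.random.default_rng(seed)
    # Shuffle positional row indices, then cut them into n nearly equal chunks.
    shuffled = rng.permutation(len(df))
    return [df.iloc[idx] for idx in np.array_split(shuffled, n)]

# Small stand-in for the real (3 million, 79) dataframe.
df = pd.DataFrame(np.arange(60).reshape(12, 5), columns=list("abcde"))
parts = random_partition(df, n=4, seed=0)

# Each part has 3 rows, and together the parts contain every row exactly once.
assert [len(p) for p in parts] == [3, 3, 3, 3]
assert sorted(pd.concat(parts).index) == list(df.index)
```

With the dataframe from the question, a call such as `random_partition(df, n=1000)` should give 1,000 parts of about 3,000 rows each. Passing a fixed `seed` makes the split reproducible.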
rpanai
  • Exactly what I needed! Thanks! – Antonio López Ruiz Nov 23 '17 at 16:20
  • Hi @rpanai, how can I split a large dataframe that has many categorical columns, each with multiple labels or classes? For example, I have 1 million rows and 100 columns, 50 of which are categorical with different labels. How do I divide the dataframe into 2 or 3 parts so that every label in every categorical column is present in each subset? Is that possible? If it is, please suggest an approach or answer my question at https://stackoverflow.com/questions/69804680/how-to-split-dataframe-into-two-parts-with-all-labels-in-categorical-columns – swarna Nov 04 '21 at 14:26
  • https://stackoverflow.com/questions/69840955/how-to-divide-a-dataframe-into-several-dataframes – swarna Nov 04 '21 at 14:35