I have a dataframe of shape approximately (3 million, 79). I need to make 1k dataframes of 3,000 rows each, where each one is a random subset of the original dataframe, sampled without replacement. That way I get the totality of the data, divided randomly into 1k dataframes.
Antonio López Ruiz
-
https://stackoverflow.com/questions/38250710/how-to-split-data-into-3-sets-train-validation-and-test – BENY Nov 23 '17 at 16:06
-
What's your specific question? – Jan Trienes Nov 23 '17 at 16:08
-
Sorry, I published it by mistake without finishing the edit. @Wen, n is approx. 10k, so that question isn't helpful; I already tried it, but thanks – Antonio López Ruiz Nov 23 '17 at 16:13
-
@AntonioLópezRuiz think one step further and you will get the result – BENY Nov 23 '17 at 16:14
1 Answer
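Once you have decided how many parts n you want to split your dataframe into, you can just do:
import numpy as np
import pandas as pd

# df is your existing dataframe. sample(frac=1) returns all rows in random
# order (without replacement); array_split then cuts them into n nearly equal parts.
dfs = np.array_split(df.sample(frac=1), n)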
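A possible variation on the same idea, offered as a sketch rather than as the answerer's method: shuffle the integer row positions with NumPy and select rows with `iloc`, so that `np.array_split` only ever sees a plain integer array. The helper name `split_randomly`, its `seed` parameter, and the toy `demo_df` are assumptions made for illustration; none of them appear in the original answer.

```python
import numpy as np
import pandas as pd


def split_randomly(df, n_parts, seed=None):
    """Split df into n_parts disjoint random subsets that together cover every row."""
    rng = np.random.default_rng(seed)
    # Random permutation of row positions: each row appears exactly once,
    # so the split is without replacement.
    shuffled_positions = rng.permutation(len(df))
    # array_split on a plain integer array yields n_parts chunks whose
    # sizes differ by at most one row.
    chunks = np.array_split(shuffled_positions, n_parts)
    return [df.iloc[chunk] for chunk in chunks]


# Toy stand-in for the real data (hypothetical): 3,000 rows instead of 3 million.
demo_df = pd.DataFrame({"a": np.arange(3000), "b": np.arange(3000) % 7})
parts = split_randomly(demo_df, n_parts=10, seed=0)
```

For the dataframe in the question, `split_randomly(df, 1000)` should give 1,000 pieces of about 3,000 rows each, assuming roughly 3 million rows in total.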

rpanai
-
Hi @rpanai, how can I divide a large dataframe that has multiple categorical columns, each with multiple labels (classes)? For example, I have 1 million rows and 100 columns, 50 of which hold categorical data with different labels. How can I divide the dataframe into 2 or 3 parts so that every label in the categorical columns is present in each subset? Is that possible? If so, please suggest an approach or answer my question at https://stackoverflow.com/questions/69804680/how-to-split-dataframe-into-two-parts-with-all-labels-in-categorical-columns – swarna Nov 04 '21 at 14:26
-
https://stackoverflow.com/questions/69840955/how-to-divide-a-dataframe-into-several-dataframes – swarna Nov 04 '21 at 14:35