
I have a pandas data frame and want to create 4 other data frames from it, based on the distinct values in one column (let's say column A in df). The shape of the data frame df is (77572, 36), and column A has 1714 distinct values in total. The end result should be 4 data frames of up to 500 rows each: three with shape (500, 36) and a last one holding the remaining 214 rows, shape (214, 36).

If we use df['A'].unique() or df.groupby('A'), we get close to the desired result, but df['A'].unique() returns only the distinct values of A (a NumPy array), and df.groupby('A') returns a pandas.core.groupby.generic.DataFrameGroupBy object rather than a data frame. That makes it impossible to further slice the result, for example:

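```python
# iloc slices exclude their end position, so each slice starts where the previous one stopped
df1 = df.iloc[:500, :]
df2 = df.iloc[500:1000, :]
df3 = df.iloc[1000:1500, :]
df4 = df.iloc[1500:, :]
```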
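The same four-way split can also be written as a loop, which avoids hard-coding the boundaries. This is a minimal sketch, assuming a fixed block size of 500 rows; `split_into_chunks` and the toy `demo` frame are hypothetical names used only for illustration:

```python
from typing import List

import pandas as pd


def split_into_chunks(frame: pd.DataFrame, chunk_size: int = 500) -> List[pd.DataFrame]:
    """Return consecutive, non-overlapping row blocks of at most chunk_size rows (hypothetical helper)."""
    # iloc end bounds are exclusive, so each block starts exactly where the previous one ended.
    return [frame.iloc[start:start + chunk_size] for start in range(0, len(frame), chunk_size)]


# Hypothetical 1714-row frame standing in for the deduplicated data.
demo = pd.DataFrame({"A": range(1714), "B": 0})
chunks = split_into_chunks(demo)
print([chunk.shape for chunk in chunks])  # [(500, 2), (500, 2), (500, 2), (214, 2)]
```

Because the bounds are end-exclusive, the blocks neither overlap nor skip rows, and the last block simply holds whatever is left over.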

How can I generate a new pandas DataFrame with shape (1714, 36), one row per distinct value in column A, and then make 4 additional data frames from it?
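One way to get the (1714, 36) frame is `DataFrame.drop_duplicates(subset="A")`, which keeps the first row for each distinct value of A and returns an ordinary DataFrame. The sketch below assumes that "one row per distinct value" means keeping the first occurrence; the toy `df` (77572 rows, with a hypothetical extra column `B`) stands in for the real 36-column data:

```python
import numpy as np
import pandas as pd

# Toy stand-in for the question's df: 77572 rows, 1714 distinct values in column "A".
# Column "B" is hypothetical; the real frame has 36 columns.
df = pd.DataFrame({"A": np.arange(77572) % 1714, "B": np.arange(77572)})

# Keep the first row seen for each distinct value of A. The result is a plain DataFrame,
# not a DataFrameGroupBy, so iloc slicing works on it.
unique_df = df.drop_duplicates(subset="A").reset_index(drop=True)
print(unique_df.shape)  # (1714, 2)

# Four consecutive, non-overlapping blocks (iloc end bounds are exclusive).
df1 = unique_df.iloc[:500]
df2 = unique_df.iloc[500:1000]
df3 = unique_df.iloc[1000:1500]
df4 = unique_df.iloc[1500:]
print(df1.shape, df2.shape, df3.shape, df4.shape)  # (500, 2) (500, 2) (500, 2) (214, 2)
```

Passing `keep="last"` to `drop_duplicates` would keep the last occurrence instead; `reset_index(drop=True)` only renumbers the rows 0..1713 so that positions and labels match.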

  • Can you please explain the logic by which you want to split the data frame? Do you wish to create a DataFrame for each unique value of column 'A'? – Ofek Glick Aug 30 '21 at 11:04
  • @OfekGlick thanks for your comment. I have a data frame called df. It has a total of 77572 rows and 1714 distinct values in column A. After I group by column A, I get the result I want as one data frame, which I then want to split into 4 data frames using df.iloc[] as explained above. The issue is that group by gives me the desired result, but as a `pandas.core.groupby.generic.DataFrameGroupBy` object, which is not a data frame. – Igor Ristovski Aug 30 '21 at 12:48
  • I'm not sure `groupby` is what you're looking for, as it is expected to be followed by some aggregation function; you can read more about it [here](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.groupby.html). If you're looking to just create a DataFrame for each unique element in your dataframe, I have posted a possible solution. – Ofek Glick Aug 30 '21 at 13:31
  • Yes, it might be the case @OfekGlick. The final goal is to create 4 additional data frames, using the original data frame df, but for DISTINCT values in column A. That is why my approach was to first get DISTINCT values using `df.groupby('A')` and then from the result, create the desired 4 new data frames. – Igor Ristovski Aug 30 '21 at 14:20
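On the `groupby` point raised in the comments: a `DataFrameGroupBy` becomes a DataFrame once a method such as `head(1)` or an aggregation such as `first()` is applied, and iterating over it yields one sub-DataFrame per distinct value. A minimal sketch on a small hypothetical frame (`first_rows`, `agg` and `per_value` are illustrative names):

```python
import pandas as pd

# Small hypothetical frame; column "A" has 3 distinct values.
df = pd.DataFrame({"A": ["x", "y", "x", "z", "y"], "B": [1, 2, 3, 4, 5]})

# head(1) keeps the first whole row of each group, in original row order.
first_rows = df.groupby("A").head(1)
print(type(first_rows).__name__)  # DataFrame
print(first_rows.index.tolist())  # [0, 1, 3]

# An aggregation also returns a DataFrame: one row per group, sorted by A.
agg = df.groupby("A", as_index=False).first()
print(agg["B"].tolist())  # [1, 2, 4]

# Iterating yields (value, sub-DataFrame) pairs, i.e. one data frame per distinct value.
per_value = {key: group for key, group in df.groupby("A")}
print(sorted(per_value))  # ['x', 'y', 'z']
print(per_value["x"]["B"].tolist())  # [1, 3]
```

`first()` takes the first non-null value of each column separately, so with missing data a result row can mix values from different original rows; `head(1)` and `drop_duplicates` always return whole original rows.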
