
I have a df I'd like to split into 5 dataframes (named df1 - df5) based on the value of one column (origin). I've tried groupby and a few other things (like this and this), with no success.

My df looks like this:

     origin t_id    Group   ids            ...
0    g2     300     group2  23, 54, 24     ...
1    g      300     group2  1, 89          ...
2    g3     300     group10 155, 4, 90     ...
3    g5     300     group11 38, 13, 45.    ...
4    g4     300     group2  2.             ...

Right now I have it broken up into multiple .loc statements, one for each unique value of origin, but there must be a cleaner, more concise way to do this.

LMGagne
  • It's hard to help with no illustration. Have a look at [How to make good reproducible pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) and then provide a sample of your data with the expected output – Alexandre B. Aug 15 '19 at 22:55
  • @AlexandreB. done – LMGagne Aug 15 '19 at 23:12
  • What is your expected output from the table above? – Onyambu Aug 16 '19 at 00:00

1 Answer


This should do it:


a = []

# one sub-DataFrame per unique value of 'origin', in order of first appearance
for value in df['origin'].unique():
    a.append(df[df['origin'] == value])

The list `a` will contain one dataframe for each unique value of origin. Let me know if I misunderstood anything.
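A self-contained run of the loop above on the sample rows from the question, as a sketch. Only the `origin`, `t_id` and `Group` columns are reproduced; `ids` and the elided `...` columns are left out because their exact contents aren't shown. Each element of `a` is then reached by position, e.g. `a[0]`.

```python
import pandas as pd

# Sample rows from the question (ids and the "..." columns omitted)
df = pd.DataFrame({
    "origin": ["g2", "g", "g3", "g5", "g4"],
    "t_id": [300, 300, 300, 300, 300],
    "Group": ["group2", "group2", "group10", "group11", "group2"],
})

a = []
# unique() returns the values in order of first appearance
for value in df["origin"].unique():
    a.append(df[df["origin"] == value])

print(len(a))  # 5
print(a[0])    # the rows where origin == "g2"
```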

Parijat Bhatt
  • This returns `TypeError: 'method' object is not iterable` – LMGagne Aug 16 '19 at 12:57
  • Please try again. `unique` was supposed to be called as a method, i.e. `unique()` – Parijat Bhatt Aug 16 '19 at 16:51
  • that seems to run, but then `a` just prints all the data, not a list of df names. – LMGagne Aug 16 '19 at 17:20
  • To access one particular dataframe, you will have to do a[i] – Parijat Bhatt Aug 16 '19 at 17:25
  • is there a way to automatically name them in the same block of code. For example, df1-df5? – LMGagne Aug 16 '19 at 17:34
  • It would be possible if you create Python objects dynamically, but I don't see the use of it. You can always index them using the array. Another way to do this would be to use a dictionary: you could use 'df1', 'df2' as keys in the dict and generate the keys dynamically, but it's important to keep in mind that the point of using pandas is to leverage parallelization, which you lose if you start using dictionaries. – Parijat Bhatt Aug 16 '19 at 17:41
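The dictionary approach mentioned in the last comment can be sketched as below, assuming the same sample columns as in the question. `groupby("origin", sort=False)` yields one `(value, sub-DataFrame)` pair per origin in order of first appearance, so the generated keys `df1`..`df5` line up with the order `unique()` gives. Each value in the dict is still an ordinary DataFrame. The name `frames` is just an illustrative choice.

```python
import pandas as pd

df = pd.DataFrame({
    "origin": ["g2", "g", "g3", "g5", "g4"],
    "t_id": [300] * 5,
    "Group": ["group2", "group2", "group10", "group11", "group2"],
})

# Generate the keys "df1", "df2", ... instead of creating separate variables.
# sort=False keeps groups in order of first appearance, like unique().
frames = {
    f"df{i}": group
    for i, (_, group) in enumerate(df.groupby("origin", sort=False), start=1)
}

# Alternatively, key each sub-DataFrame by its origin value directly.
by_origin = {origin: group for origin, group in df.groupby("origin")}

print(list(frames))  # ['df1', 'df2', 'df3', 'df4', 'df5']
print(frames["df1"])  # rows where origin == "g2"
```

Keying by the origin value (`by_origin["g3"]`) avoids having to remember which number maps to which origin.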