How to split pandas dataframe into list of dataframes by id?

Question

I have a big pandas dataframe (about 150000 rows). I tried groupby('id'), but it returns (key, group) tuples. I need just a list of dataframes, which I will then convert into NumPy array batches to feed into an autoencoder (like this https://www.datacamp.com/community/tutorials/autoencoder-keras-tutorial but 1D).

So I have a pandas dataframe:

import pandas as pd

data = {'Name': ['Tom', 'Joseph', 'Krish', 'John', 'John', 'John', 'John', 'Krish'], 'Age': [20, 21, 19, 18, 18, 18, 18, 18], 'id': [1, 1, 2, 2, 3, 3, 3, 3]}
# Create DataFrame
df = pd.DataFrame(data)
# Print the output.
df.head(10)

I need the output below (just a list of pandas dataframes). Also, the lists must stay unsorted; this is important because the data is a time series.

data1 = {'Name': ['Tom', 'Joseph'], 'Age': [20, 21], 'id': [1, 1]}
data2 = {'Name': ['Krish', 'John'], 'Age': [19, 18], 'id': [2, 2]}
data3 = {'Name': ['John', 'John', 'John', 'Krish'], 'Age': [18, 18, 18, 18], 'id': [3, 3, 3, 3]}
pd_1 = pd.DataFrame(data1)
pd_2 = pd.DataFrame(data2)
pd_3 = pd.DataFrame(data3)
array_list = [pd_1,pd_2,pd_3]
array_list

How can I split the dataframe?

Nk03 · Answer 1 · 2021-05-12T08:56:40.987

6

You can try:

array_list = df.groupby(df.id.values).agg(list).to_dict('records')

Output:

[{'Name': ['Tom', 'Joseph'], 'Age': [20, 21], 'id': [1, 1]},
 {'Name': ['Krish', 'John'], 'Age': [19, 18], 'id': [2, 2]},
 {'Name': ['John', 'John', 'John', 'Krish'],
  'Age': [18, 18, 18, 18],
  'id': [3, 3, 3, 3]}]

UPDATE:

If you need a dataframe list:

df_list = [g for _,g in df.groupby('id')]
#OR
df_list = [pd.DataFrame(i) for i in df.groupby(df.id.values).agg(list).to_dict('records')]

To reset the index of each dataframe:

df_list = [g.reset_index(drop=True) for _,g in df.groupby('id')]
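Since the question mentions time-series order and NumPy batches, here is a minimal sketch of how the list of dataframes might be built and converted. It assumes that groups should come out in order of first appearance (hence `sort=False`) and that only the numeric `Age` column is used as a feature; `feature_cols` and `batches` are hypothetical names, not something from the question.

```python
import numpy as np
import pandas as pd

data = {
    "Name": ["Tom", "Joseph", "Krish", "John", "John", "John", "John", "Krish"],
    "Age": [20, 21, 19, 18, 18, 18, 18, 18],
    "id": [1, 1, 2, 2, 3, 3, 3, 3],
}
df = pd.DataFrame(data)

# One DataFrame per id. sort=False keeps the groups in order of first
# appearance; rows inside each group keep their original order either way.
df_list = [g.reset_index(drop=True) for _, g in df.groupby("id", sort=False)]

# Assumption: only the numeric Age column is fed to the model.
feature_cols = ["Age"]
batches = [g[feature_cols].to_numpy(dtype=np.float32) for g in df_list]

for batch in batches:
    print(batch.shape)
```

The groups have different lengths here (2, 2 and 4 rows), so the arrays would still need padding or windowing before they could be stacked into a single batch tensor.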

edited May 12 '21 at 08:56

answered May 12 '21 at 08:01

Nk03


`to_dict("records")` and `groupby("id")` – Mark Wang May 12 '21 at 08:07
Sorry, I need a list of pandas dataframes, not a list of dicts. I changed the question. How can I do that? – Семен Немытов May 12 '21 at 08:40

score 3 · Answer 2 · answered May 12 '21 at 07:59


Let us group on id and use to_dict with the 'list' orientation to build one record per id:

[g.to_dict('list') for _, g in df.groupby('id', sort=False)]

[{'Name': ['Tom', 'Joseph'], 'Age': [20, 21], 'id': [1, 1]},
 {'Name': ['Krish', 'John'], 'Age': [19, 18], 'id': [2, 2]},
 {'Name': ['John', 'John', 'John', 'Krish'], 'Age': [18, 18, 18, 18], 'id': [3, 3, 3, 3]}]
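In the sample data the ids already appear in ascending order, so the effect of `sort=False` is not visible. The sketch below uses a hypothetical frame in which id 2 appears before id 1 to show the difference; the frame and the variable names are made up for illustration.

```python
import pandas as pd

# Hypothetical frame: id 2 appears before id 1.
df = pd.DataFrame(
    {
        "Name": ["Krish", "John", "Tom", "Joseph"],
        "Age": [19, 18, 20, 21],
        "id": [2, 2, 1, 1],
    }
)

# Default behaviour: group keys are sorted.
sorted_keys = [int(key) for key, _ in df.groupby("id")]

# sort=False: group keys follow the order of first appearance.
appearance_keys = [int(key) for key, _ in df.groupby("id", sort=False)]

print(sorted_keys)      # [1, 2]
print(appearance_keys)  # [2, 1]

# Records per id, in order of first appearance.
records = [g.to_dict("list") for _, g in df.groupby("id", sort=False)]
```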


Shubham Sharma


Sorry, I need a list of pandas dataframes, not a list of dicts. I changed the question. How can I do that? – Семен Немытов May 12 '21 at 08:41
@СеменНемытов Check `[g.reset_index(drop=True) for _, g in df.groupby('id', sort=False)]` – Shubham Sharma May 12 '21 at 08:43

score 1 · Answer 3 · answered May 12 '21 at 08:02


I am not sure what you need, but does something like this work for you?

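# Index by id without overwriting df, so the original frame stays usable
indexed = df.set_index("id")
# loc[[i]] always returns a DataFrame, even when an id has a single row
[indexed.loc[[i]].to_dict("list") for i in indexed.index.unique()]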

Or, if you want to keep the id column in each result:

[df.query(f"id == {i}").to_dict("list") for i in df.id.unique()]
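If a list of dataframes is wanted rather than dicts, the same per-id selection can be sketched with a boolean mask, simply leaving out `to_dict`. This is only an illustration of the idea on the question's sample data, not necessarily what this answer intended.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["Tom", "Joseph", "Krish", "John", "John", "John", "John", "Krish"],
        "Age": [20, 21, 19, 18, 18, 18, 18, 18],
        "id": [1, 1, 2, 2, 3, 3, 3, 3],
    }
)

# Series.unique() returns the ids in order of first appearance,
# and the mask keeps the rows of each id in their original order.
df_list = [df[df["id"] == i] for i in df["id"].unique()]

print(len(df_list))
```

This scans the whole frame once per id, so for a frame with many distinct ids a single `groupby` pass is likely to be faster.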


ClementWalter


Sorry, I need a list of pandas dataframes, not a list of dicts. I changed the question. How can I do that? – Семен Немытов May 12 '21 at 08:40

Jeanot · Answer 4 · 2021-05-12T08:41:12.357

If you want to create new DataFrames storing the values:

(Previous answers are more relevant if you want to create a list.) This can be solved by iterating over each id in a for loop and creating a new dataframe on every iteration. I refer you to #40498463 and the other answers for the usage of the groupby() function. Please note that I have changed the name of the id column to Id.

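for Id, group in df.groupby("Id"):
    # Build a variable name such as df1, df2, df3 from the id
    new_name = "df" + str(Id)
    # Bind a new DataFrame to that name (the loop variable no longer shadows df)
    exec('{} = pd.DataFrame(group)'.format(new_name))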

Output:

df1
     Name  Age  Id
0     Tom   20   1
1  Joseph   21   1

df2
    Name  Age  Id
2  Krish   19   2
3   John   18   2

df3
    Name  Age  Id
4   John   18   3
5   John   18   3
6   John   18   3
7  Krish   18   3
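As an alternative to creating variables dynamically, the groups can be kept in a dict keyed by the same names. This is a sketch of a different technique than the loop above, assuming the column has been renamed to `Id` as described; the dict name `frames` is hypothetical.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["Tom", "Joseph", "Krish", "John", "John", "John", "John", "Krish"],
        "Age": [20, 21, 19, 18, 18, 18, 18, 18],
        "Id": [1, 1, 2, 2, 3, 3, 3, 3],
    }
)

# Map names such as "df1" to the rows with that Id;
# sort=False keeps the groups in order of first appearance.
frames = {f"df{key}": group for key, group in df.groupby("Id", sort=False)}

print(list(frames))
print(frames["df2"])
```

A dict keeps the frames easy to loop over, and `list(frames.values())` gives the list of dataframes the question asks for.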

Sorry, I need a list of pandas dataframes, not a list of dicts. I changed the question. – Семен Немытов May 12 '21 at 08:40
