How to split a dataframe based on column values, sort or groupby for string values?

Question

Completely new to coding and pandas.

df

   Date        Particulars  Inwards  Code
1  2017-04-01  EFG            12800  01
2  2017-07-22  ABC              100  01
3  2017-09-05  BCD            10000  01
4  2018-03-13  ABC             2000  01

I wanted to output 3 dataframes from this df based on the df['Particulars'] column, i.e.

Output: df1

   Date        Particulars  Inwards  Code
2  2017-07-22  ABC              100  01
4  2018-03-13  ABC             2000  01

df2

   Date        Particulars  Inwards  Code
1  2017-04-01  EFG            12800  01

df3

   Date        Particulars  Inwards  Code
3  2017-09-05  BCD            10000  01

I have a way of doing it through:

 df1 = df1.append(df.loc[df['Particulars'] == 'ABC'], ignore_index=False)

This runs inside a for loop, after I initialise a list of the Particulars values and create a dataframe for each one. But I am wondering whether sort or groupby would be a better option, and how exactly to apply them. I tried groupby and sort but couldn't get the dataframes.

In this case you can just do: `df1 = df[df['Particulars'] == 'ABC']` and so on. — pault, Apr 06 '18 at 16:20
@pault I am trying to avoid making a list of the unique items in 'Particulars' (it's a 1000-row df), setting up empty dataframes for them in a dictionary, and then looping through. :( I was hoping there was a way to split the dataframe based on the 'Particulars' column — Sid, Apr 06 '18 at 16:22

score 3 · Accepted Answer · answered Apr 06 '18 at 17:00


You can create a dictionary of data frames by grouping your df on Particulars.

d = {name: group for name, group in df.groupby('Particulars')}

Now you can access each df using

d['ABC']

    Date        Particulars Inwards Code
2   2017-07-22  ABC         100     1
4   2018-03-13  ABC         2000    1
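For anyone who wants to try this end to end, here is a minimal, self-contained sketch. The sample frame is rebuilt from the question's data; the name `frames` and the choice to store `Code` as strings (so that "01" keeps its leading zero) are assumptions made for illustration, not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question. Code is stored as a string here
# (an assumption) so that the leading zero in "01" is preserved.
df = pd.DataFrame(
    {
        "Date": pd.to_datetime(
            ["2017-04-01", "2017-07-22", "2017-09-05", "2018-03-13"]
        ),
        "Particulars": ["EFG", "ABC", "BCD", "ABC"],
        "Inwards": [12800, 100, 10000, 2000],
        "Code": ["01", "01", "01", "01"],
    },
    index=[1, 2, 3, 4],
)

# Iterating over a groupby yields (key, sub-frame) pairs,
# so a dict comprehension collects one dataframe per key.
frames = {name: group for name, group in df.groupby("Particulars")}

print(sorted(frames))   # the three distinct Particulars values
print(frames["ABC"])    # rows 2 and 4 of the original frame
```

Each sub-frame keeps the original index labels, so `frames["ABC"]` still refers to rows 2 and 4.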


Vaishali


Nice, I forgot about this one. One question, though. Does calling `groupby` automatically mean it's O(n log n) as in the background it creates groups by sorting? I seem to remember this was the case. – jpp Apr 06 '18 at 17:04
@jpp, the complexity of the loop, including the comprehension, is O(n); I'm not sure whether grouping impacts it further. Time to dig further :) – Vaishali Apr 06 '18 at 17:11
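The sketch below does not settle the complexity question raised in the comments. It only illustrates one documented detail: `groupby` has a `sort` parameter (default `True`) that sorts the group keys, and passing `sort=False` keeps the keys in order of first appearance. The small frame and the variable names are illustrative assumptions.

```python
import pandas as pd

# Reduced version of the question's data (illustrative).
df = pd.DataFrame(
    {
        "Particulars": ["EFG", "ABC", "BCD", "ABC"],
        "Inwards": [12800, 100, 10000, 2000],
    }
)

# Default behaviour: group keys come back sorted.
sorted_keys = [key for key, _ in df.groupby("Particulars")]

# sort=False: group keys come back in order of first appearance.
unsorted_keys = [key for key, _ in df.groupby("Particulars", sort=False)]

print(sorted_keys)
print(unsorted_keys)
```

If key order does not matter, `sort=False` skips the key-sorting step; whether that makes a measurable difference on a 1000-row frame would need to be timed.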

score 2 · Answer 2 · answered Apr 06 '18 at 17:00


A dictionary comprehension over the unique values is a clean way to structure your data (note that it filters the column once per unique value):

d = {k: df[df['Particulars'] == k] for k in df['Particulars'].unique()}

Related: How do I create a variable number of variables?
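As a rough check, the sketch below builds the dictionary with the boolean-mask comprehension and compares it against a `groupby`-based dictionary. The names `by_mask`, `by_group` and `same` are hypothetical, and the frame is a reduced copy of the question's data.

```python
import pandas as pd

# Reduced version of the question's data (illustrative).
df = pd.DataFrame(
    {
        "Particulars": ["EFG", "ABC", "BCD", "ABC"],
        "Inwards": [12800, 100, 10000, 2000],
    },
    index=[1, 2, 3, 4],
)

# One boolean filter per unique value; keys follow order of first appearance.
by_mask = {k: df[df["Particulars"] == k] for k in df["Particulars"].unique()}

# groupby-based equivalent, for comparison.
by_group = {k: g for k, g in df.groupby("Particulars")}

# Both approaches should yield the same sub-frames for every key.
same = all(by_mask[k].equals(by_group[k]) for k in by_mask)
print(list(by_mask), same)
```

The two dictionaries hold the same sub-frames; they differ only in key order and in how many passes over the column are made.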


jpp
