I have the following dataframe:

import pandas as pd
df = pd.DataFrame({'ID': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12], 
                   'Info': ['info1', 'info2', 'info3', 'info4', 'info5', 'info6', 
                            'info7', 'info8', 'info9', 'info10', 'info11', 'info12'],
                   'Category': ['157/120/RGB', '112/54/RGB', '14/280/CMYK', '50/100/RGB',
                                '150/88/CMYK', '160/100/G', '200/450/CMYK', '65/90/RGB',
                                '111/111/G', '244/250/RGB', '100/100/CMYK', '144/100/G']})

I need to split it into as many dataframes as there are distinct trailing patterns in the Category strings, that is RGB, CMYK and G. Is there a way, maybe using regular expressions, to pass just this piece of the string to the get_group method in order to create these groups? For instance:

df_RGB = df.groupby('Category').get_group('...RGB')

What should I replace the dots with?

Mayank Porwal
Nate

3 Answers

You can do this with GroupBy.get_group. Extract the trailing piece of each Category string first, then group by it:

g = df['Category'].str.extract(r"/*(\w+)$").squeeze()
keys = g.unique()  # optional: lists all the group keys
grouped = df.groupby(g)

df_RGB = grouped.get_group('RGB')

   ID    Info     Category
0   1   info1  157/120/RGB
1   2   info2   112/54/RGB
3   4   info4   50/100/RGB
7   8   info8    65/90/RGB
9  10  info10  244/250/RGB
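
As a self-contained sketch of the same idea: get_group expects an exact group key rather than a pattern, so the regular expression is applied beforehand to build the grouping key. The shortened sample data and the names suffix and df_rgb are illustrative assumptions; passing expand=False is one way to get a Series directly instead of calling squeeze().

```python
import pandas as pd

# Reduced sample of the question's data (assumption: same column layout)
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6],
    "Info": ["info1", "info2", "info3", "info4", "info5", "info6"],
    "Category": ["157/120/RGB", "112/54/RGB", "14/280/CMYK",
                 "50/100/RGB", "150/88/CMYK", "160/100/G"],
})

# With a single capture group, expand=False returns a Series
suffix = df["Category"].str.extract(r"/(\w+)$", expand=False)

# Group the original rows by the extracted suffix
grouped = df.groupby(suffix)

# get_group takes the exact key, not a regex
df_rgb = grouped.get_group("RGB")

print(sorted(grouped.groups))   # the available keys
print(df_rgb["ID"].tolist())    # IDs of the rows whose Category ends in RGB
```

The original columns are left untouched because the key lives in a separate Series rather than in the dataframe.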
Ch3steR

You can use Series.str.split with df.groupby:

In [3747]: df['actual_category'] = df.Category.str.split('/').str[-1]

In [3765]: d = {k:v.iloc[:, :-1] for k,v in df.groupby('actual_category')}

In [3766]: d
Out[3766]: 
{'CMYK':     ID    Info      Category
 2    3   info3   14/280/CMYK
 4    5   info5   150/88/CMYK
 6    7   info7  200/450/CMYK
 10  11  info11  100/100/CMYK,
 'G':     ID    Info   Category
 5    6   info6  160/100/G
 8    9   info9  111/111/G
 11  12  info12  144/100/G,
 'RGB':    ID    Info     Category
 0   1   info1  157/120/RGB
 1   2   info2   112/54/RGB
 3   4   info4   50/100/RGB
 7   8   info8    65/90/RGB
 9  10  info10  244/250/RGB}

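This gives you a dict whose keys are the category names and whose values are the individual dataframes for each category. The iloc[:, :-1] slice drops the helper column actual_category, which is the last column of each group.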

In [3753]: df_RGB = d['RGB']

In [3754]: df_RGB
Out[3754]: 
   ID    Info     Category
0   1   info1  157/120/RGB
1   2   info2   112/54/RGB
3   4   info4   50/100/RGB
7   8   info8    65/90/RGB
9  10  info10  244/250/RGB
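
The positional slice above relies on the helper column being the last one. A possible variation, sketched here on a reduced sample, drops the helper by name instead so it keeps working if more columns are added later. The name frames is a hypothetical choice for illustration.

```python
import pandas as pd

# Reduced sample of the question's data (assumption: same column layout)
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6],
    "Info": ["info1", "info2", "info3", "info4", "info5", "info6"],
    "Category": ["157/120/RGB", "112/54/RGB", "14/280/CMYK",
                 "50/100/RGB", "150/88/CMYK", "160/100/G"],
})

# Helper column holding the piece after the last "/"
df["actual_category"] = df["Category"].str.split("/").str[-1]

# Drop the helper column by name rather than by position
frames = {
    name: group.drop(columns="actual_category")
    for name, group in df.groupby("actual_category")
}

print(frames["G"])
```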
Mayank Porwal

You can create a dictionary of DataFrames by converting the groupby object to a dict, grouping by the last value after the final /:

d = dict(iter(df.groupby(df['Category'].str.split('/').str[-1])))
print(d)
{'CMYK':     ID    Info      Category
2    3   info3   14/280/CMYK
4    5   info5   150/88/CMYK
6    7   info7  200/450/CMYK
10  11  info11  100/100/CMYK, 'G':     ID    Info   Category
5    6   info6  160/100/G
8    9   info9  111/111/G
11  12  info12  144/100/G, 'RGB':    ID    Info     Category
0   1   info1  157/120/RGB
1   2   info2   112/54/RGB
3   4   info4   50/100/RGB
7   8   info8    65/90/RGB
9  10  info10  244/250/RGB}

print(d['CMYK'])
    ID    Info      Category
2    3   info3   14/280/CMYK
4    5   info5   150/88/CMYK
6    7   info7  200/450/CMYK
10  11  info11  100/100/CMYK

It is not recommended, but it is possible to create DataFrames named after the groups, like this:

for i, g in df.groupby(df['Category'].str.split('/').str[-1]):
    globals()['df_' + str(i)] = g

print(df_CMYK)

    ID    Info      Category
2    3   info3   14/280/CMYK
4    5   info5   150/88/CMYK
6    7   info7  200/450/CMYK
10  11  info11  100/100/CMYK
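
For comparison, a minimal sketch of looking groups up by name through a plain dict instead of globals(). The reduced sample and the names frames, wanted and missing are assumptions made for illustration. A missing name can then be handled with dict.get rather than surfacing as a NameError.

```python
import pandas as pd

# Reduced sample of the question's data (assumption: same column layout)
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6],
    "Info": ["info1", "info2", "info3", "info4", "info5", "info6"],
    "Category": ["157/120/RGB", "112/54/RGB", "14/280/CMYK",
                 "50/100/RGB", "150/88/CMYK", "160/100/G"],
})

# One DataFrame per trailing category, keyed by the category name
frames = dict(iter(df.groupby(df["Category"].str.split("/").str[-1])))

# Loop over all groups without knowing their names in advance
for name, frame in frames.items():
    print(name, len(frame))

# Look a group up by a name held in a variable
wanted = "CMYK"
df_cmyk = frames[wanted]

# An unknown name yields None instead of raising
missing = frames.get("HSV")
```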
jezrael