I have a bunch of items that are associated with different groups, and I ultimately want to create a list for each group, containing all of its associated items.

The catch is that I do not know how many groups there are. How can I dynamically generate the correct number of lists, and how do I refer to them afterwards?

I am looping through item_list and group_list, two different series that align perfectly with each other: the item at item_list[item] has its corresponding group at group_list[item].

Here is some raw data:

item list   group list
   A             1
   B             1
   C             2
   D             1
   E             2
   F             1
   G             2
   H             2
   I             1
   J             2

This is what I have so far:

groups = []

for item in item_list:
    groups.append(group_list[item])

# Get only unique values (instead of having groups 1,1,1,2,2 --> 1,2)
group_set = list(set(groups))

# Number of lists that need to be generated
len(group_set)

What I want to end up with:

[IN]: print list_1:
[OUT]: ['A', 'B', 'D', 'F', 'I']

[IN]: print list_2:
[OUT]: ['C', 'E', 'G', 'H', 'J']

where list_1 and list_2 were generated because len(group_set) in my current code is equal to 2.

I'm just not sure how to dynamically generate that number of lists, and put each item in the appropriate list.

Any advice/guidance is much appreciated...

ploo
  • Have you looked at [groups](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.core.groupby.GroupBy.groups.html#pandas.core.groupby.GroupBy.groups)? Also what are you ultimately trying to do? – EdChum Apr 15 '15 at 20:06
  • In the example if we assume I have groups 1 and 2, I want to be able to create a list containing all items associated in group 1, and another list with items associated in group 2. But how can this be done for n groups? I will investigate this link you provided, thanks. – ploo Apr 15 '15 at 20:10
  • Well you can just do something like `df.groupby(col)[other_col].apply(lambda x: list(x))` – EdChum Apr 15 '15 at 20:12
  • Not sure why this was down-voted, but if I am leaving out some integral information/clarity to the question, I am happy to expand/clarify. – ploo Apr 15 '15 at 20:15
  • Thanks @EdChum I am looking into this now. If you put it in the form of an answer I'm happy to select it if it ends up solving my problem :) – ploo Apr 15 '15 at 20:16
  • I don't know why, but 2 people have voted to close because your question is unclear. Normally it is good practice to provide raw input data, code and desired output, all of which should be runnable. Also, I don't want to post an answer unless this is really what you want – EdChum Apr 15 '15 at 20:20
  • You may want to look at this, which looks related: http://stackoverflow.com/questions/22219004/grouping-rows-in-list-in-pandas-groupby – EdChum Apr 15 '15 at 20:25

2 Answers

You could use a Python dictionary comprehension to build the lists you want. The last two lines of the code block below do the heavy lifting; the rest is just me getting your data into pandas.

import pandas as pd

# get your data into pandas
data = '''
item_list     group_list
A             1
B             1
C             2
D             1
E             2
F             1
G             2
H             2
I             1
J             2'''
from io import StringIO  # on Python 2: from StringIO import StringIO
df = pd.read_csv(StringIO(data), sep=r'\s+', index_col=None, header=0)

# use a dictionary comprehension to compile the collection of lists
lists = {x: df[df['group_list'] == x].item_list.tolist() 
    for x in df['group_list'].unique()}

Which gave me the following in ipython:

In [27]: print(lists)
{1: ['A', 'B', 'D', 'F', 'I'], 2: ['C', 'E', 'G', 'H', 'J']}

In [28]: print(lists[1])
['A', 'B', 'D', 'F', 'I']

In [29]: print(lists[2])
['C', 'E', 'G', 'H', 'J']
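
For comparison, the same dict-of-lists idea can be sketched without pandas. This is only a minimal illustration, assuming item_list and group_list are plain, equally long sequences; the literal values below are stand-ins copied from the question's sample data, not the asker's real series.

```python
from collections import defaultdict

# Hypothetical stand-ins for the two aligned sequences in the question
item_list = ["A", "B", "C", "D", "E", "F", "G", "H", "I", "J"]
group_list = [1, 1, 2, 1, 2, 1, 2, 2, 1, 2]

# Every group key that has not been seen yet starts out as an empty list,
# so the number of groups never needs to be known in advance
lists = defaultdict(list)
for item, group in zip(item_list, group_list):
    lists[group].append(item)

print(lists[1])
print(lists[2])
```

Keying a single dictionary by group label avoids having to create variables such as list_1 and list_2 dynamically; each list is looked up as lists[group] instead.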
Mark Graph
  • Hi Mark, this is a great solution, thank you! `tolist()` and `unique()` are perfect, especially the unique(), much cleaner than converting a list to a set and back to a list. – ploo Apr 16 '15 at 13:07

Or you could use groupby, as @EdChum suggested in the comments above:

In [11]: x = df.groupby('group_list')['item_list'].apply(lambda x: x.tolist())

In [12]: print(x)
group_list
1    [A, B, D, F, I]
2    [C, E, G, H, J]
Name: item_list, dtype: object

In [13]: print(x[1])
['A', 'B', 'D', 'F', 'I']

In [14]: print(x[2])
['C', 'E', 'G', 'H', 'J']
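
If a plain dictionary is more convenient than a Series, the grouped result can be converted with to_dict(). The sketch below is self-contained and rebuilds the sample data with a DataFrame constructor instead of read_csv; that construction is an assumption made for brevity, not part of the original answer.

```python
import pandas as pd

# Sample data from the question, rebuilt directly as a DataFrame
df = pd.DataFrame({
    "item_list": list("ABCDEFGHIJ"),
    "group_list": [1, 1, 2, 1, 2, 1, 2, 2, 1, 2],
})

# Collect the items of each group into a list, then turn the resulting
# Series (indexed by group) into a dict keyed by group label
lists = df.groupby("group_list")["item_list"].apply(list).to_dict()

print(lists[1])
print(lists[2])
```

Passing list directly to apply does the same job as the lambda calling tolist() on each group.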
Mark Graph