1

New to pandas so sorry if this is old hat. What I'm trying to accomplish is similar to what is contained in grouping rows in list in pandas groupby, but I have more than two columns and can't figure out how to get all of my columns displayed along with the grouped value. Here's what I'm trying to do.

data = [{'ip': '192.168.1.1', 'make': 'Dell', 'model': 'UltraServ9000'},
{'ip': '192.168.1.3', 'make': 'Dell', 'model': 'MiniServ'},
{'ip': '192.168.1.5', 'make': 'Dell', 'model': 'UltraServ9000'},
{'ip': '192.168.1.6', 'make': 'HP', 'model': 'Thinger3000'},
{'ip': '192.168.1.8', 'make': 'HP', 'model': 'Thinger3000'}]

In [2]: df = pd.DataFrame(data)
In [3]: df
Out[4]:
            ip  make          model
0  192.168.1.1  Dell  UltraServ9000
1  192.168.1.3  Dell       MiniServ
2  192.168.1.5  Dell  UltraServ9000
3  192.168.1.6    HP    Thinger3000
4  192.168.1.8    HP    Thinger3000    

<magic>

Out[?]:    
            ip               make           model
0  192.168.1.1, 192.168.1.5  Dell   UltraServ9000
1  192.168.1.3               Dell        MiniServ
3  192.168.1.6, 192.168.1.8  HP       Thinger3000

Thanks in advance :)

liam
  • 13
  • 1
  • 5
  • Sorry but this really is a dupe of this: http://stackoverflow.com/questions/22219004/grouping-rows-in-list-in-pandas-groupby, basically all you want is `In [107]: df.groupby(['a','c'])['b'].apply(list) Out[107]: a c A foo [1, 2] B foo [5, 5, 4] C foo [6] Name: b, dtype: object` – EdChum Jan 25 '17 at 16:12
  • Welcome to stack overflow. In the future, please create the raw data so that users can experiment with your data - you'll get better/faster answers that way. Also, 'a' and 'A' aren't great to use as both column names and data values if you're building examples. – flyingmeatball Jan 25 '17 at 16:14
  • Hi - it's very nearly a duplicate, but I'm running into issues as I have 20-some-odd columns, so what I'm trying to do is a groupby on "A", list in "B", and then just show the values from all the other columns, which are identical for each row. – liam Jan 25 '17 at 17:58

1 Answers1

3

groupby takes a parameter, by, through which you can specify a list of variables you want to operate your groupby over. So the answer of that question is modified as follows:

df.groupby(by = ["a", "c"])["b"].apply(list).reset_index()

EDIT: Looking at your comment: since all columns other than a have the same values, you can list them easily in the by parameter because they won't affect the result. To save you time and prevent you to actually type all the names you could do something like this:

df.groupby(by = list(set(df.columns) - set(["b"])))["b"].apply(list).reset_index()

Alternatively, you could exploit the agg function by passing a dictionary which for all columns will take the max and for b will return the list:

aggregate_functions = {x: max for x in df.columns if x != "a" and x != "b"}
aggregate_functions["b"] = lambda x: list(x)
df.groupby(by = "a").agg(aggregate_functions)

Which you prefer is up to you, probably the latter is more readable.

Mikk
  • 804
  • 8
  • 23
  • Thanks! This is close to what I'm looking for and very similar to the almost-duplicate I linked above. I should have specified that I don't just have three columns (A, B, C), I have 20. I'm trying to group by one, list in the other, then just display the rest of the columns as is as they're identical for each row. – liam Jan 25 '17 at 18:05
  • Completely my fault. I'm still trying to wrap my head around pandas and am clearly losing that battle. I've updated my example above to hopefully be more useful. Sorry for the confusion! – liam Jan 25 '17 at 19:33
  • Sorry for the delayed response and thank you so much! Your edit did it for me! – liam Feb 16 '17 at 15:35