I'm quite new to pandas, so I may be doing something silly, but I'm trying to aggregate data from a NumPy array with pandas: group values that lie close together and collect their original positions. Below is my incomplete attempt (Python 3.8).

import numpy as np
import pandas as pd

values = np.array([20, 40, 48, 42, 25]) # unsorted 1-dimensional array (named `values` to avoid shadowing the builtin `input`)

dataframe = pd.DataFrame({"v": values}).sort_values("v")
"""
dataframe is:
    v
0  20
4  25
1  40
3  42
2  48
"""
# start a new group whenever the gap to the previous sorted value exceeds 5
dataframe["group"] = dataframe["v"].diff().gt(5).cumsum()
"""
dataframe is:
    v  group
0  20      0
4  25      0
1  40      1
3  42      1
2  48      2
"""
result = dataframe.???????

What I want to get as the result is something like either of these:

{0: [0, 4], 1: [1, 3], 2: [2]}
[[0, 4], [1, 3], [2]]

A solution that does the equivalent without pandas would of course also be welcome.
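
For the pandas-free variant asked about here, one possible sketch uses NumPy alone. It assumes the same rule as the attempt above (a new group starts when the gap between neighbouring sorted values is strictly greater than 5); the names `values`, `order`, `gaps`, `breaks`, `groups` and `as_dict` are illustrative and not from the original post.

```python
import numpy as np

values = np.array([20, 40, 48, 42, 25])  # same sample data as in the question

# Original positions, listed in ascending order of their values.
order = np.argsort(values, kind="stable")

# Differences between neighbouring values once sorted.
gaps = np.diff(values[order])

# Positions within `order` where a new group begins (gap strictly greater than 5).
breaks = np.flatnonzero(gaps > 5) + 1

# Cut `order` at those positions; each chunk holds the original indices of one group.
groups = [chunk.tolist() for chunk in np.split(order, breaks)]
as_dict = dict(enumerate(groups))

print(groups)
print(as_dict)
```

`np.split` does the grouping here, so no explicit group labels are needed; `dict(enumerate(...))` only adds the 0, 1, 2 keys for the dictionary form.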

  • `df.reset_index().groupby('group')['index'].agg(list).to_dict()` – mozway Aug 21 '23 at 08:11
  • @mozway, great answer. In the context of the question, you meant this: `dataframe.reset_index().groupby('group')['index'].agg(list).to_dict()`... – D.L Aug 21 '23 at 08:23
  • @D.L Thanks, it worked beautifully! Do you know where I can find the source that the leftmost column can be referred to with `index` (while it looks obvious...) and why you can still point to the old index after `reset_index()`? – broccoli forest Aug 22 '23 at 03:19
  • @broccoliforest, the user guide and API reference are here (https://pandas.pydata.org/docs). And the `reset_index` part is here: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.reset_index.html?highlight=reset_index – D.L Aug 22 '23 at 05:33
  • @D.L Ah, so `reset_index()` generates the `index` column. I was thinking the wrong way. Thank you! – broccoli forest Aug 23 '23 at 04:57
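
Putting the comments above together, the one-liner can be sketched as a complete script. This is a minimal sketch using the sample data from the question; the variable names `values`, `df`, `grouped`, `as_dict` and `as_list` are illustrative. The step discussed in the comments is `reset_index()`: because the index is unnamed, its labels are moved into a regular column called `index`, which can then be selected like any other column.

```python
import numpy as np
import pandas as pd

values = np.array([20, 40, 48, 42, 25])  # sample data from the question

# Sort by value; the original positions survive as the index labels.
df = pd.DataFrame({"v": values}).sort_values("v")

# Start a new group whenever the gap to the previous sorted value exceeds 5.
df["group"] = df["v"].diff().gt(5).cumsum()

# reset_index() moves the (unnamed) index into a regular column named "index",
# so the original positions can be collected per group.
grouped = df.reset_index().groupby("group")["index"].agg(list)

as_dict = grouped.to_dict()  # dictionary form: group label -> original positions
as_list = grouped.tolist()   # list-of-lists form

print(as_dict)
print(as_list)
```

`agg(list)` keeps the rows in their sorted order within each group, which is why both requested output shapes come from the same `grouped` Series.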
